Welcome to the Book Recommendation System! This project is designed to provide personalized book recommendations to users based on their reading preferences, behavior, and ratings. The system uses advanced algorithms like collaborative filtering and content-based filtering to suggest books that align with each user's unique tastes. This README will guide you through the project structure, dataset details, and how to run the system locally.
- Personalized Recommendations: Offers book suggestions tailored to individual users.
- Multiple Algorithms: Uses collaborative filtering, content-based filtering, and hybrid approaches.
- Scalable: Designed to handle large datasets efficiently.
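As a rough illustration of the collaborative filtering idea listed above, the sketch below ranks books a user has not rated by a similarity-weighted average of other users' ratings. The data layout (a dict mapping each user to a `{book: rating}` dict), the function names, and the toy ratings are assumptions made for this example; they are not taken from the project's notebook.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (book -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[book] * b[book] for book in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def recommend_for_user(user, ratings, top_n=3):
    """Rank books the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for book, rating in other_ratings.items():
            if book in ratings[user]:
                continue  # skip books the user already rated
            scores[book] = scores.get(book, 0.0) + sim * rating
            weights[book] = weights.get(book, 0.0) + sim
    ranked = sorted(scores, key=lambda b: scores[b] / weights[b], reverse=True)
    return ranked[:top_n]

# Hypothetical toy data, for illustration only
ratings = {
    "alice": {"Dune": 5, "Emma": 1},
    "bob": {"Dune": 5, "Emma": 1, "Neuromancer": 5, "Persuasion": 1},
    "carol": {"Dune": 1, "Emma": 5, "Neuromancer": 1, "Persuasion": 5},
}
print(recommend_for_user("alice", ratings))
```

Because alice's ratings resemble bob's more than carol's, bob's favourite ends up ranked first. A content-based or hybrid approach would combine a score like this with similarity between book attributes.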
The dataset used in this project consists of four primary components:
- Books Dataset: Contains details about the books such as title, author, genre, publication year, and book cover image.
- Users Dataset: Stores user information including user ID, demographics, and reading history.
- Ratings Dataset: Consists of user ratings for different books, which are crucial for generating recommendations.
- Author Ratings: Includes ratings based on the author's overall contribution, calculated by considering the number of books they have written and their average ratings.
Author Rating is an additional metric introduced in this system to evaluate an author's influence and consistency in producing quality content. This rating is derived by analyzing the number of books an author has written and the average ratings those books have received. This feature helps in providing more balanced recommendations by considering not only individual book ratings but also the overall credibility of the author.
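The exact formula for Author Rating is not given here. One plausible way to combine the number of books with their average ratings is a shrinkage (Bayesian-style) average, sketched below. The function name, the `prior_weight` constant, and the sample values are assumptions for illustration only.

```python
def author_rating(book_ratings, global_mean, prior_weight=5):
    """Shrink an author's mean book rating toward the global mean.

    book_ratings: average rating of each book by the author.
    global_mean: mean rating across all books in the dataset.
    prior_weight: hypothetical tuning constant; it behaves like this many
        "virtual" books rated at the global mean.
    """
    n = len(book_ratings)
    if n == 0:
        return global_mean  # no evidence: fall back to the global mean
    author_mean = sum(book_ratings) / n
    return (n * author_mean + prior_weight * global_mean) / (n + prior_weight)

# Hypothetical authors: one with a single 5-star book, one with twenty 4.5-star books
one_hit = author_rating([5.0], global_mean=3.5)
prolific = author_rating([4.5] * 20, global_mean=3.5)
print(round(one_hit, 2), round(prolific, 2))
```

With this weighting, an author with a single highly rated book scores lower than a prolific author with consistently good ratings, which matches the stated goal of rewarding consistency.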
The dataset was collected from Hugging Face and includes:
- Books.csv: Contains columns like BookID, Title, Author, Genre, Year, and CoverImage.
Before using the dataset, it underwent several preprocessing steps:
- Data Cleaning: Removed duplicates, handled missing values, and standardized formats.
- Normalization: Ratings were normalized to a consistent scale.
- Feature Engineering: Additional features such as AverageRating and GenreFrequency were derived for enhanced recommendations.
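The cleaning and normalization steps above could look roughly like the following sketch. It assumes ratings on a 1 to 5 scale and rows with `UserID`, `Title`, and `Rating` fields; those field names and the scale are assumptions for this example, not details taken from the dataset.

```python
def preprocess(rows, min_rating=1.0, max_rating=5.0):
    """Deduplicate rows, drop incomplete ones, and min-max scale ratings to [0, 1]."""
    seen = set()
    cleaned = []
    for row in rows:
        title = (row.get("Title") or "").strip()  # standardize whitespace
        rating = row.get("Rating")
        if not title or rating is None:
            continue  # handle missing values by dropping the row
        key = (row.get("UserID"), title.lower())
        if key in seen:
            continue  # remove duplicate (user, title) pairs
        seen.add(key)
        scaled = (float(rating) - min_rating) / (max_rating - min_rating)
        cleaned.append({"UserID": row.get("UserID"), "Title": title, "Rating": scaled})
    return cleaned

# Hypothetical raw rows, for illustration only
rows = [
    {"UserID": 1, "Title": " Dune ", "Rating": 5},
    {"UserID": 1, "Title": "dune", "Rating": 5},     # duplicate of the row above
    {"UserID": 2, "Title": "Emma", "Rating": None},  # missing rating
    {"UserID": 2, "Title": "Dune", "Rating": 3},
]
cleaned = preprocess(rows)
print(cleaned)
```

Dropping incomplete rows is only one way to handle missing values; imputing a default rating would be an equally valid choice depending on the data.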
Make sure you have the following installed on your machine:
- Python 3.8 or higher
- pip (Python package installer)
- git (version control system)
- Clone the Repository: Clone the project repository to your local machine using git:
git clone https://github.com/Vaibhav-kesarwani/Books_Recommendation_System.git
cd Books_Recommendation_System
- Create a Virtual Environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install Dependencies: Install the required Python packages using pip:
pip install -r requirements.txt
- Set Up the Dataset: Place the dataset file books_data.csv in the root directory.
To run the Book Recommendation System, follow these steps:
- Prepare the Dataset: Ensure that your books_data.csv file is in the root directory. This file should contain the text data and corresponding book info labels, separated by a colon (:).
- Run the Script: Execute the main notebook to load the data and generate book recommendations:
jupyter notebook main.ipynb
- Output: The script will print the first few rows of the dataset to the console, showing the text samples and their associated book info labels.
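The loading and preview step described above can be approximated with the standard library alone. The sketch below assumes colon-separated rows, as stated in the steps; the helper names and the in-memory sample are hypothetical, and the notebook itself may load the data differently (for example with pandas).

```python
import csv
import io

def load_rows(handle, delimiter=":"):
    """Read delimited rows from an open text file, skipping blank lines."""
    return [row for row in csv.reader(handle, delimiter=delimiter) if row]

def preview(rows, n=5):
    """Return the first n rows, similar in spirit to a DataFrame head()."""
    return rows[:n]

# Hypothetical in-memory sample standing in for books_data.csv
sample = io.StringIO("Dune:Frank Herbert\nEmma:Jane Austen\n")
for row in preview(load_rows(sample)):
    print(row)
```

To read the real file, pass `open("books_data.csv", newline="", encoding="utf-8")` in place of the in-memory sample.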
Model training is performed within the main.ipynb notebook, which processes the text data, tokenizes it, trains the model, and plots it using Plotly. You can modify the model architecture, training parameters, or the data processing steps within this notebook.
def recommend_books(book_title, cosine_sim=cosine_sim):
    # `data` (assumed to be a pandas DataFrame) and `cosine_sim` are defined earlier in main.ipynb
    matches = data[data['title'] == book_title]
    if matches.empty:
        raise ValueError(f"Book not found: {book_title}")
    # Get the index of the book that matches the title
    idx = matches.index[0]
    # Get the cosine similarity scores for all the books with this book
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort the books based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Get the top 10 most similar books (excluding the input book)
    sim_scores = sim_scores[1:11]
    # Get the book indices
    book_indices = [i[0] for i in sim_scores]
    # Return the top 10 most recommended books
    return data['title'].iloc[book_indices]
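The function above relies on a precomputed `cosine_sim` matrix whose construction is not shown in this README. As a minimal sketch, assuming each book is represented by a short text description, such a matrix could be built from plain word counts as follows. The notebook may well use a library vectorizer such as TF-IDF instead; the function names and sample descriptions here are hypothetical.

```python
from collections import Counter
from math import sqrt

def build_cosine_sim(texts):
    """Return an n x n list of lists holding cosine similarities between texts."""
    vectors = [Counter(text.lower().split()) for text in texts]
    norms = [sqrt(sum(c * c for c in vec.values())) for vec in vectors]
    matrix = []
    for i, a in enumerate(vectors):
        row = []
        for j, b in enumerate(vectors):
            # Counter returns 0 for tokens missing from b
            dot = sum(count * b[token] for token, count in a.items())
            denom = norms[i] * norms[j]
            row.append(dot / denom if denom else 0.0)
        matrix.append(row)
    return matrix

def most_similar(index, sim_matrix, top_n=10):
    """Indices of the top_n most similar items, excluding the item itself."""
    scores = [(j, s) for j, s in enumerate(sim_matrix[index]) if j != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:top_n]]

# Hypothetical book descriptions, for illustration only
descriptions = [
    "space desert empire",
    "space empire war",
    "romance english village",
]
sim = build_cosine_sim(descriptions)
print(most_similar(0, sim, top_n=2))
```

Filtering out the query index explicitly, rather than slicing off the first sorted entry, avoids dropping the wrong item when another book ties with the input at a similarity of 1.0.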
After training the model, you can use it to generate book recommendations from new text inputs. Implement the prediction logic in a separate script, or extend main.ipynb to include a prediction function.
Here is an overview of the project directory structure:
Books_Recommendation_System/
│
├── venv # Virtual environment holding the installed Python packages
├── .gitignore # Lists files excluded from version control (venv, etc.)
├── book_data.csv # Dataset file containing book names, ratings, and author names
├── main.ipynb # Jupyter notebook for data exploration and analysis
├── requirements.txt # List of dependencies
├── LICENSE # License for the project
└── README.md # Project documentation (this file)
Contributions are welcome! If you'd like to contribute to this project, please follow these steps:
- Fork and star the repository
- Create a new branch (git checkout -b feature)
- Make your changes
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature)
- Create a new Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you have any questions or suggestions, feel free to reach out to me at :