This project implements a semantic search system for a movie database using MongoDB Atlas and sentence transformers. It allows for both vector-based semantic search and traditional text-based search on movie plots and titles.
- Connect to MongoDB Atlas cluster
- Generate embeddings for movie plots using the 'all-MiniLM-L6-v2' model
- Create a vector search index in MongoDB
- Perform vector-based semantic search on movie plots
- Perform text-based search on movie titles and plots
- Compare results from both search methods
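To make the difference between the two search modes concrete, the sketch below builds the query each one might send to MongoDB. It rests on assumptions: the index name `vector_index`, the field names `plot_embedding`, `plot`, and `title`, and both helper functions are illustrative and not confirmed by the script. The text query uses a case-insensitive `$regex` match, which is only one possible way to implement a text search.

```python
import re


def build_vector_pipeline(query_vector, limit=5,
                          index_name="vector_index",   # assumed index name
                          path="plot_embedding"):      # assumed embedding field
    """Return an aggregation pipeline that uses Atlas `$vectorSearch`."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Scan more candidates than we return to improve recall.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                # Similarity score computed by the vector search stage.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def build_text_filter(query):
    """Return a filter matching the query text in the title or the plot."""
    pattern = re.escape(query)  # treat the query as literal text
    return {
        "$or": [
            {"title": {"$regex": pattern, "$options": "i"}},
            {"plot": {"$regex": pattern, "$options": "i"}},
        ]
    }


if __name__ == "__main__":
    # A real query vector would come from the sentence-transformer model;
    # this short dummy vector only shows the pipeline shape.
    pipeline = build_vector_pipeline([0.1, 0.2, 0.3], limit=3)
    print(pipeline[0]["$vectorSearch"])
    print(build_text_filter("alien"))
```

The pipeline would be passed to `collection.aggregate(...)` and the filter to `collection.find(...)`.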
- Python 3.9+
- MongoDB Atlas account with a cluster set up
- `pymongo` library
- `sentence_transformers` library
- Clone this repository:
    git clone <repository-url>
    cd <repository-directory>
- Install the required packages:
    pip install pymongo sentence_transformers
- Set up your MongoDB Atlas cluster and obtain the connection string.
- Replace the value of `MONGO_URI` in the script with your MongoDB Atlas connection string.
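If you would rather not edit the script by hand, one option is to read the connection string from the environment. The following is only a sketch of that idea, not how the script currently works: the `MONGO_URI` environment variable, the placeholder URI, and both helper functions are hypothetical.

```python
import os

# Placeholder only: substitute the connection string shown in the Atlas UI.
DEFAULT_URI = "mongodb+srv://<username>:<password>@<cluster-host>/"


def get_mongo_uri(env=os.environ):
    """Prefer the MONGO_URI environment variable, else the placeholder."""
    return env.get("MONGO_URI", DEFAULT_URI)


def connect(uri):
    """Open a client and confirm that the cluster is reachable."""
    # Imported here so get_mongo_uri() works even without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises if the cluster cannot be reached
    return client
```

Calling `connect(get_mongo_uri())` at startup surfaces a bad connection string immediately instead of at the first query.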
Run the script with:
    python semanticsearch.py
The script will:
- Connect to your MongoDB Atlas cluster
- Create a vector search index (if it doesn't exist)
- Perform searches using predefined queries
- Display results from both vector-based and text-based searches
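The comparison in the last step could be as simple as checking which titles each method returned. The sketch below assumes each result is a dict with a `title` key; the `compare_results` helper and the sample documents are illustrative and do not come from the script.

```python
def compare_results(vector_results, text_results, key="title"):
    """Split results into titles found by both, only vector, or only text."""
    vector_titles = [doc[key] for doc in vector_results]
    text_titles = [doc[key] for doc in text_results]
    vector_set = set(vector_titles)
    text_set = set(text_titles)
    return {
        "both": [t for t in vector_titles if t in text_set],
        "vector_only": [t for t in vector_titles if t not in text_set],
        "text_only": [t for t in text_titles if t not in vector_set],
    }


if __name__ == "__main__":
    # Hand-written sample documents, not real query output.
    vector_hits = [{"title": "A"}, {"title": "B"}]
    text_hits = [{"title": "B"}, {"title": "C"}]
    print(compare_results(vector_hits, text_hits))
```

Titles that appear under `vector_only` are typically the interesting ones, since they matched the meaning of the query without sharing its exact words.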
- Modify the `queries` list in the `main()` function to change the search queries.
- Adjust the `limit` parameter in `vector_search_movies()` and `text_search_movies()` to change the number of results returned.
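As a sketch of how these two settings fit together, the loop below runs every query through both search functions with a shared `limit`. It assumes the search functions accept a query string and a `limit` keyword argument; the actual signatures of `vector_search_movies()` and `text_search_movies()` may differ, and `run_queries` is a hypothetical helper.

```python
def run_queries(queries, vector_search, text_search, limit=5):
    """Run each query through both search functions with the same limit."""
    results = {}
    for query in queries:
        results[query] = {
            "vector": vector_search(query, limit=limit),
            "text": text_search(query, limit=limit),
        }
    return results


if __name__ == "__main__":
    # Stand-in search functions so the loop can run without a database.
    def fake_search(query, limit=5):
        return [f"{query} result {i}" for i in range(limit)]

    queries = ["example query"]  # edit this list to change the searches
    print(run_queries(queries, fake_search, fake_search, limit=2))
```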
- The script assumes that movie plot embeddings have already been generated and stored in the database. If you need to generate embeddings, uncomment the `add_embeddings_to_movies()` function call in `main()`.
- The vector search index creation is commented out by default. Uncomment the `create_vector_index()` function call in `main()` if you need to create the index.
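For orientation, the sketch below shows roughly what these two steps involve. It is not the script's actual code: the field name `plot_embedding`, the index name `vector_index`, the cosine similarity setting, and all three helper functions are assumptions. The 384 dimensions match the output size of the 'all-MiniLM-L6-v2' model.

```python
EMBEDDING_DIMENSIONS = 384  # output size of all-MiniLM-L6-v2


def vector_index_definition(path="plot_embedding",  # assumed field name
                            dimensions=EMBEDDING_DIMENSIONS,
                            similarity="cosine"):   # assumed similarity metric
    """Return an Atlas Vector Search index definition."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


def embed_plots(movies, encode):
    """Return copies of the movies that have a plot, with an embedding added.

    `encode` is any callable mapping text to a sequence of numbers, for
    example `SentenceTransformer("all-MiniLM-L6-v2").encode`.
    """
    updated = []
    for movie in movies:
        plot = movie.get("plot")
        if not plot:
            continue  # nothing to embed
        vector = [float(x) for x in encode(plot)]  # plain floats for BSON
        updated.append({**movie, "plot_embedding": vector})
    return updated


def create_index(collection, name="vector_index"):  # assumed index name
    """Create the index via PyMongo; needs a release that accepts `type`."""
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=vector_index_definition(),
        name=name,
        type="vectorSearch",
    )
    return collection.create_search_index(model)
```

Passing the encoder in as a callable keeps `embed_plots` testable without downloading the model.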
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.