This project is a video search engine application with a graphical user interface (GUI) using MongoDB, Neo4j, and MySQL for data storage and retrieval. The application includes features such as video indexing, relationship management, user authentication, sentiment analysis for user comments, and personalized video recommendations.
| Directory/File | Description |
|---|---|
| hate_speech_model.joblib | Joblib file for the hate speech detection model. |
| individual.json | JSON file containing individual video data. |
| main.py | Main script for the video search engine application. |
| preprocessed_data | Directory containing preprocessed video data in JSON format. |
| relations.json | JSON file containing relationship data. |
| sentiment_analysis_pipeline.joblib | Joblib file for the sentiment analysis model pipeline. |
| sentiment_analysis_training.py | Script for training the sentiment analysis model. |
| users.json | JSON file containing user data. |
| users_relation.json | JSON file containing user relationship data. |
| video_data | Directory containing raw video data. |

| Directory/File | Description |
|---|---|
| api.py | Flask Blueprint for sentiment analysis API routes. |
| auth.py | Flask Blueprint for user authentication routes. |
| creators.py | Flask Blueprint for user analytics routes. |
| mongodb_models.py | Class for MongoDB interactions. |
| mysql_models.py | Data models for SQLAlchemy and MySQL. |
| neo4j_models.py | Class for Neo4j interactions and search recommendations. |
| preprocessing.py | Script for video data preprocessing. |
| static | Directory for static assets in the Flask application. |
| templates | Directory for HTML templates in the Flask application. |
| upload_youtube_videos.py | Class for uploading YouTube video data. |
| views.py | Flask Blueprint for general web application routes. |

- Backend and landing page of the video search engine.
- Sign-up and login pages for user authentication.
- Home screen with a search bar for video queries.
- Video page presenting the selected video, related suggestions, and interactive options such as liking, subscribing, and adding comments.
- Functionality for users to upload favorite YouTube videos and for creators to check sentiment analysis for their videos.

- MongoDB:
  - Indexing Video Files: Efficient indexing of video files for fast retrieval.
  - Search Functionality: MongoDB is used for search based on common words and relevant keywords. A Jaro similarity-based scoring system ranks the search results.
- Neo4j:
  - Managing Relationships: Nodes for 'USER', 'CHANNEL', and 'VIDEO'. Relationships are established based on commonalities such as subscribed channels, liked videos, and similar content.
- MySQL:
  - Relational Information Storage: Stores click-through data and relational information, which are used to improve search results and personalized recommendations.
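
To make the Jaro-based ranking concrete, here is a minimal sketch in plain Python. It is not the project's actual scoring code: the function names `jaro_similarity` and `rank_titles` are hypothetical, and comparing a query against whole video titles is an assumption made for illustration.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Return the Jaro similarity of two strings, in the range [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters count as a match only within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2

    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that line up in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def rank_titles(query: str, titles: list[str]) -> list[tuple[str, float]]:
    """Hypothetical helper: score each title against the query, best first."""
    q = query.lower()
    scored = [(title, jaro_similarity(q, title.lower())) for title in titles]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a setup like the one described, the candidate titles would first be fetched from MongoDB by keyword, and a function like `rank_titles` would then order them before display.
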
- Components:
  - Search Query Panel (SQP): Allows users to input video search queries.
  - Search Button (SB): Initiates the search based on the query.
  - Search Result Panel (SRP): Displays a list of relevant videos.
  - Current Video Panel (CVP): Shows detailed information about the selected video.
- Workflow:
  - Users input search queries in SQP and click SB to view relevant videos in SRP.
  - Clicking on a video in SRP updates CVP with video details and refreshes SRP with related videos from Neo4j.
  - Click-through information is stored in MySQL, capturing details about user interactions and preferences.
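
The click-through step of the workflow above can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server, and the `click_through` table, its columns, and the function names are assumptions rather than the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone


def init_clicks(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) click-through table if it is missing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS click_through (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            query TEXT NOT NULL,
            video_id TEXT NOT NULL,
            clicked_at TEXT NOT NULL
        )
        """
    )


def record_click(conn: sqlite3.Connection, user_id: str, query: str, video_id: str) -> None:
    """Store one click: which user chose which video for which query."""
    conn.execute(
        "INSERT INTO click_through (user_id, query, video_id, clicked_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, query, video_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def top_videos_for_query(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Return (video_id, clicks) pairs for a query, most clicked first."""
    return conn.execute(
        "SELECT video_id, COUNT(*) AS clicks FROM click_through "
        "WHERE query = ? GROUP BY video_id "
        "ORDER BY clicks DESC, video_id ASC LIMIT ?",
        (query, limit),
    ).fetchall()
```

The same statements carry over to MySQL with minor changes; for example, MySQL drivers such as PyMySQL use `%s` placeholders instead of `?`, and `AUTOINCREMENT` is spelled `AUTO_INCREMENT`.
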
- Authentication:
  - User login, logout, and signup functionality.
  - Secure password hashing.
- Customization:
  - Initial video suggestions based on common words.
  - User behavior tracking for customization as users interact more.
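
As an illustration of the secure password hashing mentioned under Authentication, the sketch below uses salted PBKDF2 from Python's standard library. The source does not say which hashing scheme the project uses, so this is an assumption; the function names and the stored string format are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, iterations: int = 600_000) -> str:
    """Hash a password with a random salt using PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Keep the parameters next to the hash so it can be verified later.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Check a candidate password against a string from hash_password."""
    try:
        scheme, iterations, salt_hex, digest_hex = stored.split("$")
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
        rounds = int(iterations)
    except ValueError:
        return False  # Malformed stored value.
    if scheme != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

Only the string returned by `hash_password` would be stored with the user record; the plain-text password is never saved.
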
- Sentiment analysis on user comments using a machine learning pipeline.
- Trained model utilized in the Flask API to provide sentiment scores for comments.
- Web scraping for extracting video data from YouTube.
- External APIs used for correcting English in search queries.
- User analytics for processing and analyzing user comments using a pre-trained machine learning model for hate speech detection.
- Collaborative filtering using Neo4j, creating relationships based on user interactions.
- Implemented a JSON-based caching mechanism to store user activity and metadata changes.
- Data, such as video metadata and user activity, cached in JSON files for faster retrieval and updating.
- Optimized data fetching from Neo4j, particularly when dealing with slow Python abstractions.
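
The JSON-based caching mentioned above could look roughly like the following sketch. The `JsonCache` class, its methods, and the file name in the usage note are hypothetical; the source does not describe how the project's cache files are structured.

```python
import json
import os
import tempfile
from pathlib import Path


class JsonCache:
    """A small key-value cache kept in memory and mirrored to a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        self._data = self._load()

    def _load(self) -> dict:
        # A missing or unreadable file simply means an empty cache.
        try:
            with self.path.open("r", encoding="utf-8") as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def set(self, key: str, value) -> None:
        self._data[key] = value
        self._flush()

    def _flush(self) -> None:
        # Write to a temporary file, then swap it in, so a crash
        # mid-write does not leave a half-written cache file behind.
        fd, tmp_name = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp_name, self.path)
```

For example, `JsonCache("user_activity.json").set("user42", {"liked": ["video1"]})` would persist one user's activity, and a later `get("user42")` on a fresh instance would read it back from disk.
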

| Dependency | Purpose |
|---|---|
| Python 3.x | Programming language for the project. |
| mongodb | Database for storing video information and enabling fast queries. |
| mysql | Database for managing user-related data, login credentials, and user behavior tracking. |
| neo4j | Database for storing relationships between users, channels, and videos for the recommendation engine. |

- Install dependencies: `pip install -r requirements.txt`.
- Configure database connections in the respective files.
- Run the Flask application: `python main.py`.

- Akriti Gupta: Pre-final Year (B.Tech. Artificial Intelligence & Data Science)
- Sagnik Goswami: Pre-final Year (B.Tech. Artificial Intelligence & Data Science)
- Tanish Pagaria: Pre-final Year (B.Tech. Artificial Intelligence & Data Science)
(IIT Jodhpur Undergraduates)