This project demonstrates how to build a YouTube video summarization pipeline using Indexify. The pipeline downloads a YouTube video, extracts audio, transcribes it, classifies the content, and generates a summary based on the classification.
- YouTube video download
- Audio extraction from video
- Speech-to-text transcription using Faster Whisper
- Content classification using Llama.cpp (e.g., job interview, sales call)
- Summary generation tailored to the conversation type
- Python 3.9+
- Docker and Docker Compose (for containerized setup)
- Clone this repository:
  git clone https://github.com/tensorlakeai/indexify
  cd indexify/examples/video_summarization
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate
- Install the required dependencies:
  pip install -r requirements.txt
- Run the main script:
  python workflow.py --mode in-process-run
- Clone this repository:
  git clone https://github.com/tensorlakeai/indexify
  cd indexify/examples/video_summarization
- Ensure Docker and Docker Compose are installed on your system.
- Build the Docker images, one per graph function:
  indexify-cli build-image workflow.py download_youtube_video
  indexify-cli build-image workflow.py extract_audio_from_video
  indexify-cli build-image workflow.py transcribe_audio
  indexify-cli build-image workflow.py classify_meeting_intent
  indexify-cli build-image workflow.py summarize_job_interview
  indexify-cli build-image workflow.py summarize_sales_call
- Start the services:
  docker-compose up --build
- Deploy the graph:
  python workflow.py --mode remote-deploy
- Run the workflow:
  python workflow.py --mode remote-run
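The three --mode values used in the steps above could be dispatched roughly as follows. This is a minimal sketch, not the actual contents of workflow.py: the names build_parser, describe, and main are hypothetical, and the descriptions are only a plausible reading of what each mode does.

```python
import argparse
from typing import Optional, Sequence

# The mode names come from the commands above; the descriptions are assumptions.
ACTIONS = {
    "in-process-run": "build the graph and run it locally in this process",
    "remote-deploy": "register the graph with the running Indexify server",
    "remote-run": "invoke the already deployed graph on the server",
}


def build_parser() -> argparse.ArgumentParser:
    """Create a parser that only accepts the three known modes."""
    parser = argparse.ArgumentParser(description="Video summarization workflow (sketch)")
    parser.add_argument("--mode", choices=sorted(ACTIONS), required=True,
                        help="how to execute the graph")
    return parser


def describe(mode: str) -> str:
    """Placeholder handler: return a description instead of doing real work."""
    return ACTIONS[mode]


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return describe(args.mode)


# Demo call with an explicit argument list instead of the real command line.
print(main(["--mode", "in-process-run"]))
```

Restricting --mode with choices means a mistyped mode fails immediately with a usage message instead of silently doing nothing.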
- Video Processing:
  - Video Download: Uses pytubefix to download the YouTube video.
  - Audio Extraction: Extracts audio from the video using pydub.
  - Transcription: Converts speech to text using Faster Whisper.
- Content Analysis:
  - Classification: Uses Llama.cpp to classify the content of the transcription.
  - Summarization: Generates a summary based on the classification (job interview or sales call) using Llama.cpp.
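Because a language model replies in free text, a classification step typically has to map the raw reply onto a fixed set of labels before a summarizer can be chosen. The following is a minimal sketch of that mapping. The label strings job-interview and sales-call, the fallback label unknown, and the helper normalize_label are illustrative assumptions and are not taken from workflow.py.

```python
import re

# Assumed label names and keywords; the README only names the two categories in prose.
LABELS = {
    "job-interview": ("job interview", "job-interview", "interview"),
    "sales-call": ("sales call", "sales-call", "sales"),
}


def normalize_label(raw_reply: str) -> str:
    """Map a free-text model reply onto one known label, or 'unknown'."""
    # Lowercase, replace punctuation with spaces, then collapse repeated spaces.
    text = re.sub(r"[^a-z\- ]+", " ", raw_reply.lower())
    text = re.sub(r"\s+", " ", text)
    for label, keywords in LABELS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "unknown"


print(normalize_label("Intent: SALES_CALL"))
```

Returning an explicit fallback label keeps unexpected replies from being routed to the wrong summarizer.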
The project uses the following Indexify graph:
download_youtube_video -> extract_audio_from_video -> transcribe_audio -> classify_meeting_intent -> route_transcription_to_summarizer
route_transcription_to_summarizer -> summarize_job_interview
route_transcription_to_summarizer -> summarize_sales_call
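To make the branch at the end of the graph concrete, the sketch below models the same node order in plain Python with stub functions. It does not use the Indexify API. The stub bodies, the toy classification rule, the run_pipeline helper, and the returned strings are all hypothetical stand-ins for the real download, audio, transcription, and Llama.cpp steps.

```python
from typing import Callable, Dict, List


# Stub nodes: each one stands in for the real function of the same name.
def download_youtube_video(url: str) -> str:
    return f"video({url})"


def extract_audio_from_video(video: str) -> str:
    return f"audio({video})"


def transcribe_audio(audio: str) -> str:
    return f"transcript({audio})"


def classify_meeting_intent(transcript: str) -> Dict[str, str]:
    # Toy rule for illustration only; the real step would query a model.
    label = "job-interview" if "interview" in transcript else "sales-call"
    return {"transcript": transcript, "label": label}


def summarize_job_interview(classified: Dict[str, str]) -> str:
    return "summary of a job interview"


def summarize_sales_call(classified: Dict[str, str]) -> str:
    return "summary of a sales call"


def route_transcription_to_summarizer(
    classified: Dict[str, str],
) -> Callable[[Dict[str, str]], str]:
    """Pick exactly one of the two summarizer nodes based on the label."""
    if classified["label"] == "job-interview":
        return summarize_job_interview
    return summarize_sales_call


# The linear part of the graph, in the order shown above.
LINEAR_STEPS: List[Callable] = [
    download_youtube_video,
    extract_audio_from_video,
    transcribe_audio,
    classify_meeting_intent,
]


def run_pipeline(url: str) -> str:
    value = url
    for step in LINEAR_STEPS:
        value = step(value)  # each node consumes the previous node's output
    summarizer = route_transcription_to_summarizer(value)
    return summarizer(value)


print(run_pipeline("https://example.com/interview-demo"))
```

Only one summarizer runs per video: the router returns a single downstream node instead of fanning out to both.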
- Modify the youtube_url in the run_workflow() function to process different videos.
- Adjust the classification logic in classify_meeting_intent() to handle more content types.
- Fine-tune the prompts in the summarization functions for better results.
- Experiment with different Llama model variants for improved performance.
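One way to support more content types and to tune prompts in one place is to keep the summarization prompts in a single table keyed by label, so that adding a type means adding one entry. This is a sketch under assumptions: the label names, the prompt wording, the team-standup entry, and build_prompt are all hypothetical and not part of the original workflow.

```python
from string import Template

# Hypothetical prompt table; the wording is illustrative, not the project's prompts.
PROMPTS = {
    "job-interview": Template(
        "Summarize this job interview. List the candidate's strengths, "
        "concerns, and next steps.\n\n$transcript"
    ),
    "sales-call": Template(
        "Summarize this sales call. List customer needs, objections, "
        "and agreed follow-ups.\n\n$transcript"
    ),
}

# Used when the classifier returns a label that has no dedicated prompt.
DEFAULT_PROMPT = Template(
    "Summarize the following conversation in five bullet points.\n\n$transcript"
)


def build_prompt(label: str, transcript: str) -> str:
    """Fill the prompt for the given label, falling back to the default."""
    template = PROMPTS.get(label, DEFAULT_PROMPT)
    return template.substitute(transcript=transcript)


# Adding a new content type is a single new entry in the table.
PROMPTS["team-standup"] = Template(
    "Summarize this team standup. List progress, blockers, and owners.\n\n$transcript"
)

print(build_prompt("team-standup", "Alice: the build is green again."))
```

A matching summarizer function and a new branch in the router would still be needed for each added label; the table only centralizes the prompt text.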