This project implements an end-to-end pipeline for detecting voice activity, reducing noise, normalizing audio, and classifying the speaker's gender using pre-trained models.
The pipeline integrates the following tasks:
- Voice Activity Detection (VAD): Uses the Silero VAD model to locate the segments of the audio that contain speech, and therefore to determine whether a file contains any speech at all.
- Noise Reduction: Suppresses background noise to improve audio clarity.
- Audio Normalization: Brings the audio to a consistent volume level so that later stages receive comparable input.
- Gender Classification: Classifies the detected voice as male or female using a pre-trained Wav2Vec2 model fine-tuned for this task.
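The noise-reduction and normalization stages can be sketched in a few lines. This is a minimal sketch, not the project's code: it assumes mono float waveforms, calls `noisereduce.reduce_noise` with its default (non-stationary spectral gating) settings, and uses peak normalization to -1 dBFS because the project does not say which normalization method or target level it uses. The helper names `reduce_noise` and `peak_normalize` are hypothetical.

```python
import numpy as np


def reduce_noise(audio: np.ndarray, sr: int) -> np.ndarray:
    """Spectral-gating noise reduction using the noisereduce package (default settings)."""
    import noisereduce as nr  # lazy import so peak_normalize works without noisereduce installed

    return nr.reduce_noise(y=audio, sr=sr)


def peak_normalize(audio: np.ndarray, target_dbfs: float = -1.0) -> np.ndarray:
    """Scale a mono float waveform so its absolute peak sits at target_dbfs.

    The -1 dBFS default is an assumption, not a value taken from the project.
    """
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        # Silent or empty input: there is nothing to scale.
        return audio
    target_peak = 10.0 ** (target_dbfs / 20.0)  # convert dBFS to linear amplitude
    return audio * (target_peak / peak)
```

Applied in the order the pipeline describes, this would be `clean = peak_normalize(reduce_noise(wav, 16000))`. Peak normalization only guarantees that the loudest sample reaches the target; loudness-based (RMS or LUFS) normalization would be the alternative if perceived volume matters more than headroom.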
- Load an audio file (WAV format).
- Detect speech segments using Silero VAD.
- If speech is detected:
  - Apply noise reduction.
  - Normalize the audio.
  - Classify the gender of the speaker.
- Output the predicted gender along with any detected voice segments.
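The control flow of the workflow above can be sketched with each stage passed in as a function, which keeps the branching logic testable without loading any model. This is an illustrative sketch: `run_pipeline`, `PipelineResult`, and the stage signatures are hypothetical, returning `gender=None` when no speech is found is an assumption (the project does not say what it outputs in that case), and it processes the whole file rather than only the detected segments because the project does not specify which.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

import numpy as np

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class PipelineResult:
    segments: List[Segment] = field(default_factory=list)
    gender: Optional[str] = None  # assumption: None means no speech was detected


def run_pipeline(
    audio: np.ndarray,
    sr: int,
    detect_speech: Callable[[np.ndarray, int], List[Segment]],
    denoise: Callable[[np.ndarray, int], np.ndarray],
    normalize: Callable[[np.ndarray], np.ndarray],
    classify: Callable[[np.ndarray, int], str],
) -> PipelineResult:
    """Run VAD first; only when speech is found, denoise, normalize and classify."""
    segments = detect_speech(audio, sr)
    if not segments:
        # No speech detected: skip the remaining stages entirely.
        return PipelineResult(segments=[], gender=None)
    cleaned = normalize(denoise(audio, sr))
    return PipelineResult(segments=segments, gender=classify(cleaned, sr))
```

Because the stages are plain callables, real implementations (Silero VAD, noisereduce, a Wav2Vec2 classifier) and lightweight stubs for testing can be swapped in without changing the pipeline itself.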
- Silero VAD: A pre-trained model for voice activity detection.
- Wav2Vec2: A pre-trained Wav2Vec2 model fine-tuned for speech-based gender classification.
A link to the pre-trained model is included in the documentation.
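Both models can be wrapped in a few helpers. This is a sketch under stated assumptions: Silero VAD is loaded through `torch.hub` from the `snakers4/silero-vad` repository, whose `get_speech_timestamps` returns a list of `{'start', 'end'}` sample offsets; the Wav2Vec2 classifier is loaded with the Transformers `Auto*` classes, and `model_id` is a placeholder for the checkpoint linked in the documentation, not a real model name. Wav2Vec2 checkpoints are typically trained on 16 kHz audio. All helper names are hypothetical, and the label strings come from whatever `id2label` table the checkpoint ships with.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

import numpy as np

SAMPLE_RATE = 16000  # Silero VAD accepts 8 kHz or 16 kHz; 16 kHz also suits Wav2Vec2


def load_silero_vad():
    """Download Silero VAD via torch.hub; returns (model, get_speech_timestamps, read_audio)."""
    import torch  # lazy import: only needed when the real model is used

    model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
    get_speech_timestamps, _save_audio, read_audio, _vad_iterator, _collect_chunks = utils
    return model, get_speech_timestamps, read_audio


def timestamps_to_seconds(
    timestamps: Sequence[Mapping[str, int]], sr: int = SAMPLE_RATE
) -> List[Tuple[float, float]]:
    """Convert Silero-style {'start', 'end'} sample offsets into (start_s, end_s) pairs."""
    return [(ts["start"] / sr, ts["end"] / sr) for ts in timestamps]


def load_gender_classifier(model_id: str):
    """Load a Wav2Vec2 audio-classification checkpoint (model_id is a placeholder)."""
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModelForAudioClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return extractor, model


def logits_to_label(logits: np.ndarray, id2label: Dict[int, str]) -> str:
    """Pick the highest-scoring class and map it through the model's id2label table."""
    return id2label[int(np.argmax(logits))]


def classify_gender(audio: np.ndarray, extractor, model, sr: int = SAMPLE_RATE) -> str:
    """Run one mono waveform through the classifier and return its predicted label."""
    import torch

    inputs = extractor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].numpy()
    return logits_to_label(logits, model.config.id2label)
```

A typical call sequence would be `model, get_ts, read_audio = load_silero_vad()`, then `wav = read_audio("input.wav", sampling_rate=16000)` and `timestamps_to_seconds(get_ts(wav, model, sampling_rate=16000))`, where `input.wav` is a hypothetical file name.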
The dataset used for training and testing the gender classification model can be downloaded from this link.
- Speech Processing: Use this pipeline for tasks that combine speech detection with speaker analysis.
- Audio Preprocessing: Clean and normalize audio before further analysis or modeling.
- Gender Analytics: Summarize the predicted gender of speakers across audio datasets.
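For dataset-level analytics, per-file predictions can be reduced to a label distribution. This is a minimal sketch: `gender_distribution` is a hypothetical helper, it assumes each file yields one predicted label (or `None` when no speech was detected), and the labels `"male"` and `"female"` used in the usage line are illustrative, since the real names depend on the classifier's label table.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def gender_distribution(predictions: Iterable[Optional[str]]) -> Dict[str, float]:
    """Fraction of files per predicted label; None (no speech) is reported as 'no_speech'."""
    counts = Counter(p if p is not None else "no_speech" for p in predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

For example, `gender_distribution(["male", "female", "male", None])` yields a share of 0.5 for `"male"` and 0.25 each for `"female"` and `"no_speech"`.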
- Python 3.x
- PyTorch (torch)
- Transformers
- torchaudio
- noisereduce
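One quick way to confirm that an environment matches the requirements above is to check that each package's import name resolves. This is an illustrative sketch: `check_requirements` is a hypothetical helper, it only checks that a module can be found (not its version), and it relies on the standard-library `importlib.util.find_spec`.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the required packages (PyTorch is imported as "torch").
REQUIRED_MODULES = ("torch", "transformers", "torchaudio", "noisereduce")


def check_requirements(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it can be imported in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```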