Skip to content

Latest commit

 

History

History
50 lines (29 loc) · 3.46 KB

Readme.md

File metadata and controls

50 lines (29 loc) · 3.46 KB

Arabic Word Audio Classifier

This project is an audio classifier that utilizes a Long Short-Term Memory (LSTM) model to predict isolated Arabic words from 0 to 9. It incorporates audio augmentation techniques to improve the model's performance and robustness. The goal is to accurately identify the spoken Arabic words based on their audio representations.

Dataset

The dataset used for this project consists of a collection of audio recordings of isolated Arabic words ranging from 0 to 9. Each audio sample is labeled with its corresponding word.

Before proceeding with audio augmentation, we perform a data analysis step using the data_analysis.ipynb notebook. This analysis includes:

  • Determining the total number of audio files in the entire folder.
  • Calculating the average number of audio files in the entire folder.

Audio Augmentation

To enhance the model's ability to generalize and handle various audio conditions, audio augmentation techniques are applied to the training dataset. These techniques introduce synthetic variations to the original audio samples, such as pitch shifting and background noise addition. By augmenting the training data, the model becomes more robust and less prone to overfitting.

After the data analysis and audio augmentation steps, we proceed to the next stage.

Preprocessing

In the LSTM.ipynb notebook, we preprocess the augmented audio data for input into the LSTM model. The preprocessing steps may include:

  • Feature extraction: Extracting relevant features from the audio signals, such as Mel-frequency cepstral coefficients (MFCCs).
  • Normalization: Scaling the audio features to a common range to ensure consistent input to the model.

LSTM Model

Once the audio data is preprocessed, we train and evaluate the LSTM model using the LSTM.ipynb notebook too. The model architecture and training process are defined in this notebook. The steps involved in the LSTM model implementation may include:

  1. Model Configuration: Setting up the LSTM model with appropriate hyperparameters, such as the number of LSTM units, learning rate, and regularization techniques.
  2. Training: Training the LSTM model using the preprocessed audio data, including the augmented samples.
  3. Model Evaluation: Evaluating the trained model on a testing set to assess its accuracy and generalization capabilities.

Evaluation Metrics

The performance of the audio classifier is measured using various evaluation metrics, including:

  1. Accuracy: The overall accuracy of the model in correctly classifying the Arabic words.
  2. Precision: The ability of the model to avoid false positives.
  3. Recall: The ability of the model to detect true positives.
  4. F1 Score: The harmonic mean of precision and recall, providing a balanced evaluation metric.

Deployment

Once the LSTM model is trained and evaluated, it can be deployed in various applications to classify spoken Arabic words. This could involve integrating the model into a mobile app, a voice-controlled system, or any other application that requires Arabic word recognition based on audio input.

Conclusion

The audio classifier developed in this project demonstrates the effectiveness of LSTM models in classifying isolated Arabic words. By leveraging audio augmentation techniques and proper preprocessing, the model achieves improved accuracy and robustness. The trained model can be deployed in real-world applications to enable accurate and efficient recognition of Arabic words from 0 to 9 based on audio inputs.