Speech Emotion Recognition

This repository contains a Speech Emotion Recognition project based on the RAVDESS dataset, carried out in summer 2021 under the Brain and Cognitive Science Society.

Clone repo using:

git clone https://github.com/Aaka3021/Speech-Emotion-Recognition--1.git

Abstract:

Speech Emotion Recognition (SER) is the task of recognizing human emotion and the associated affective states from speech. It relies on the fact that the voice often reflects underlying emotion through tone and pitch. Emotion recognition has been a rapidly growing research domain in recent years. Unlike humans, machines lack the ability to perceive and express emotions, but automated emotion recognition can improve human-computer interaction and reduce the need for human intervention.

In this project, basic emotions such as calm, happy, fearful and disgust are recognized from emotional speech signals. We use a Multilayer Perceptron classifier (MLP classifier), which can categorize data into groups that are not linearly separable. We also use a CNN (Convolutional Neural Network) and an RNN-LSTM model. Mel-frequency cepstral coefficients (MFCC), chroma and mel features are extracted from the speech signals and used to train the MLP classifier. To do this, we use Python libraries such as librosa, sklearn, pyaudio, numpy and soundfile to analyze the speech modulations and recognize the emotion.
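The feature extraction described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `summarize` and `extract_features`, the choice of `n_mfcc=40`, and averaging each feature over time are all assumptions made for the example.

```python
import numpy as np


def summarize(feature_matrix):
    """Average a (n_features, n_frames) matrix over time -> (n_features,)."""
    return np.mean(feature_matrix, axis=1)


def extract_features(path, n_mfcc=40):
    """Return one fixed-length vector of MFCC, chroma and mel features.

    n_mfcc=40 is an assumed value, not one taken from the project.
    """
    import librosa  # imported lazily so summarize() works without it

    # sr=None keeps the file's native sampling rate.
    signal, sample_rate = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    chroma = librosa.feature.chroma_stft(y=signal, sr=sample_rate)
    mel = librosa.feature.melspectrogram(y=signal, sr=sample_rate)
    # Each matrix has one row per coefficient and one column per frame;
    # averaging over frames gives a vector whose length does not depend
    # on the clip duration, which is what an MLP needs as input.
    return np.hstack([summarize(mfcc), summarize(chroma), summarize(mel)])
```

With librosa's defaults of 12 chroma bins and 128 mel bands, `extract_features("clip.wav")` (a hypothetical file name) would return a vector of 40 + 12 + 128 = 180 values.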

We use the RAVDESS dataset, which contains around 1500 audio files from 24 different actors (12 male and 12 female) who recorded short clips in 8 different emotions, to train an NLP-based model that can distinguish among the 8 basic emotions as well as the gender of the speaker (male or female voice).
After training, the model can be deployed to predict on live voices.
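Emotion and gender labels can be read from the file names. The sketch below assumes the naming convention published with the RAVDESS dataset (seven hyphen-separated two-digit fields, the third being the emotion code and the seventh the actor number, with odd-numbered actors male and even-numbered actors female). That convention is not stated in this README, and the function name `parse_ravdess_name` is hypothetical.

```python
from pathlib import Path

# Emotion codes as documented for RAVDESS (assumed, not from this README).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def parse_ravdess_name(path):
    """Return (emotion, gender, actor_id) parsed from a RAVDESS file name."""
    parts = Path(path).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {path}")
    emotion = EMOTIONS[parts[2]]       # third field: emotion code
    actor_id = int(parts[6])           # seventh field: actor number
    gender = "male" if actor_id % 2 == 1 else "female"
    return emotion, gender, actor_id


print(parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav"))
```

For the example path, the third field `06` maps to `fearful` and actor 12 is even, so the labels are `("fearful", "female", 12)`.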

Deliverables:

  • Learning the basics of Python, ML/DL, NLP, librosa, sklearn, etc.
  • Literature review
  • Analyzing the dataset and feature extraction
  • Building and training the model on the training data, followed by testing on test data
  • Testing the model on live (unseen) audio input and collecting the results

Schedule:

Week 1:

  • Covering ML/DL basics

Week 2:

  • Plotting waveform and spectrogram

  • Learning audio preprocessing for feature extraction
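As an illustration of the Week 2 material, here is a small sketch that computes a log-magnitude spectrogram with a short-time Fourier transform and plots it under the waveform. It uses plain numpy and matplotlib, not the project's own plotting code; the function names and the `n_fft=512`, `hop=128` settings are assumptions chosen for the example.

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=128):
    """Return a (n_fft // 2 + 1, n_frames) log-magnitude spectrogram in dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)


def plot_waveform_and_spectrogram(signal, sample_rate, n_fft=512, hop=128):
    """Draw the waveform on top and the spectrogram below; return the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    spec = log_spectrogram(signal, n_fft, hop)
    times = np.arange(len(signal)) / sample_rate
    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(8, 6))
    ax_wave.plot(times, signal)
    ax_wave.set(xlabel="Time (s)", ylabel="Amplitude", title="Waveform")
    ax_spec.imshow(spec, origin="lower", aspect="auto",
                   extent=[0, times[-1], 0, sample_rate / 2])
    ax_spec.set(xlabel="Time (s)", ylabel="Frequency (Hz)",
                title="Spectrogram (dB)")
    fig.tight_layout()
    return fig


# Synthetic one-second 440 Hz tone as a stand-in for a speech clip.
sample_rate = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
spec = log_spectrogram(tone)
```

For a real clip, the signal and sampling rate would come from an audio loader such as `librosa.load` or `soundfile.read`, and `plot_waveform_and_spectrogram(signal, sample_rate)` would produce the two-panel figure.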

Week 3:

  • Implementing feature extraction using the librosa library

Week 4:

  • Implementing the MLP model for emotion recognition
  • Evaluating it on the test set
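The Week 4 step of training an MLP and scoring it on a held-out test set might look like the sketch below. It uses scikit-learn's `MLPClassifier` on synthetic data standing in for the extracted audio features. The hidden layer size, the 75/25 split, the iteration limit and the helper name `train_and_evaluate` are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_and_evaluate(features, labels, seed=0):
    """Fit an MLP on a training split and return (model, test accuracy)."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Hyperparameters here are illustrative, not tuned.
    model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500,
                          random_state=seed)
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)
    return model, accuracy_score(y_test, predictions)


# Synthetic stand-in data: 4 classes, 50 samples each, 20 features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 50)
features = rng.normal(size=(200, 20)) + labels[:, None] * 3.0

model, accuracy = train_and_evaluate(features, labels)
print(f"test accuracy: {accuracy:.2f}")
```

In the real pipeline, `features` would be the matrix of per-clip MFCC, chroma and mel vectors and `labels` the emotion names. Stratifying the split keeps the emotion proportions similar in the training and test sets.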

Week 5:

  • Implementing the LSTM model
  • Starting to implement the CNN model

Week 6:

  • Completing the CNN model implementation
  • Evaluating the model on our own voices

Results:

  • CNN model gave an accuracy of 71%
  • LSTM model gave an accuracy of 66%
  • MLP model gave an accuracy of 62%

References: