Music-Genre-Classificator

Different architectures to classify music files by genre from the GTZAN music corpus, namely:

  • Convolutional Neural Network (CNN)
  • Recurrent Neural Network (RNN)
  • Inception V3
  • MobileNet V2

(Implemented with TensorFlow)
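
As a rough illustration of how one of the transfer-learning (TL) models can be set up, here is a minimal sketch assuming a Keras MobileNetV2 backbone and spectrogram inputs resized to 224x224 RGB images; the input shape, frozen backbone, and optimizer are assumptions, not taken from the original code:

```python
# Minimal transfer-learning sketch (assumed setup, not the exact training script).
import tensorflow as tf

NUM_GENRES = 10                 # the 10 GTZAN genres
INPUT_SHAPE = (224, 224, 3)     # assumed spectrogram image size

# Pretrained backbone, frozen for transfer learning
base = tf.keras.applications.MobileNetV2(
    input_shape=INPUT_SHAPE, include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_GENRES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```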

Dataset

The GTZAN music corpus contains 10 genres with 100 songs each (1,000 in total): 80% was used for training (800 songs) and 20% for testing (200 songs). After the split, each 30-second song is divided into 10-second chunks, resulting in 2,400 training and 600 testing samples.

The dataset can be downloaded here: http://marsyas.info/downloads/datasets.html
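
A minimal sketch of the 30-second-to-10-second chunking described above, assuming librosa for loading; the file path and sample rate are illustrative assumptions:

```python
# Sketch of the 10-second chunking (paths and sample rate are assumptions).
import librosa

SAMPLE_RATE = 22050      # librosa's default sample rate
CHUNK_SECONDS = 10

def split_into_chunks(path):
    """Load a 30-second GTZAN track and return its 10-second chunks."""
    y, sr = librosa.load(path, sr=SAMPLE_RATE)
    samples_per_chunk = CHUNK_SECONDS * sr
    n_chunks = len(y) // samples_per_chunk
    return [y[i * samples_per_chunk:(i + 1) * samples_per_chunk]
            for i in range(n_chunks)]

chunks = split_into_chunks("genres/blues/blues.00000.wav")  # hypothetical path
```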

Audio augmentation

To further increase the amount of data, several augmentations were applied to the audio files (see the sketch after this list). For each song chunk, we applied:

  • Light random noise added to the waveform
  • Intense random noise added to the waveform
  • A random pitch increase (2% at most)
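
A minimal sketch of these augmentations, assuming librosa; the exact noise levels are assumptions, since the README only distinguishes "light" and "intense" noise and caps the pitch increase at 2%:

```python
# Sketch of the waveform augmentations (noise levels are assumed values).
import librosa
import numpy as np

def add_noise(y, intensity):
    """Add random Gaussian noise scaled by `intensity` to the waveform."""
    return y + intensity * np.random.normal(size=y.shape)

def random_pitch_up(y, sr, max_increase=0.02):
    """Raise the pitch by a random factor of at most `max_increase` (2%)."""
    factor = 1.0 + np.random.uniform(0.0, max_increase)
    n_steps = 12 * np.log2(factor)           # frequency factor -> semitones
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def augment_chunk(y, sr):
    """Return the three augmented versions of a 10-second chunk."""
    return [
        add_noise(y, 0.005),      # light random noise (assumed level)
        add_noise(y, 0.02),       # intense random noise (assumed level)
        random_pitch_up(y, sr),   # random pitch increase, at most 2%
    ]
```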

Audio features extracted

The librosa library was used to extract the audio features.
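
A minimal sketch of how the Mel-frequency spectrograms can be extracted with librosa; the parameter values (e.g. the number of Mel bands) are assumptions:

```python
# Sketch of Mel-spectrogram extraction (parameter values are assumptions).
import librosa
import numpy as np

def mel_spectrogram(y, sr, n_mels=128):
    """Compute a log-scaled Mel-frequency spectrogram for one audio chunk."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)   # convert power to decibels
```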

(Figure: example Mel-frequency spectrograms)

Results

| Model              | Training accuracy | Test accuracy |
|--------------------|-------------------|---------------|
| MobileNet V2 (TL)  | 77%               | 77%           |
| Inception V3 (TL)  | 99%               | 84%           |
| CNN                | 55%               | 62%           |
| RNN                | 77%               | 66%           |