Music-Genre-Classificator

Different architectures to classify music files by genre from the GTZAN music corpus, namely:

  • Convolutional Neural Network (CNN)
  • Recurrent Neural Network (RNN)
  • Inception V3
  • MobileNet V2

(Implemented with TensorFlow)
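
As a rough illustration of how one of the transfer-learning (TL) models can be set up, here is a minimal sketch assuming a Keras MobileNetV2 backbone and spectrogram inputs resized to 224x224 RGB images; the input shape, frozen backbone, and optimizer are assumptions, not taken from the original code:

```python
# Minimal transfer-learning sketch (assumed setup, not the exact training script).
import tensorflow as tf

NUM_GENRES = 10                 # the 10 GTZAN genres
INPUT_SHAPE = (224, 224, 3)     # assumed spectrogram image size

# Pretrained backbone, frozen for transfer learning
base = tf.keras.applications.MobileNetV2(
    input_shape=INPUT_SHAPE, include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_GENRES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```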

Dataset

The GTZAN music corpus contains 10 genres with 100 songs each (1,000 in total): 80% was used for training (800 songs) and 20% for testing (200 songs). After the split, each 30-second song is divided into 10-second chunks, resulting in 2,400 training and 600 testing samples.

The dataset can be downloaded here: http://marsyas.info/downloads/datasets.html
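
A minimal sketch of the 30-second-to-10-second chunking described above, assuming librosa for loading; the file path and sample rate are illustrative assumptions:

```python
# Sketch of the 10-second chunking (paths and sample rate are assumptions).
import librosa

SAMPLE_RATE = 22050      # librosa's default sample rate
CHUNK_SECONDS = 10

def split_into_chunks(path):
    """Load a 30-second GTZAN track and return its 10-second chunks."""
    y, sr = librosa.load(path, sr=SAMPLE_RATE)
    samples_per_chunk = CHUNK_SECONDS * sr
    n_chunks = len(y) // samples_per_chunk
    return [y[i * samples_per_chunk:(i + 1) * samples_per_chunk]
            for i in range(n_chunks)]

chunks = split_into_chunks("genres/blues/blues.00000.wav")  # hypothetical path
```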

Audio augmentation

To further increase the amount of data, several augmentations were applied to the audio files (see the sketch after this list). For each song chunk, we applied:

  • Light random noise added to the waveform
  • Intense random noise added to the waveform
  • A random pitch increase (2% at most)
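
A minimal sketch of these augmentations, assuming librosa; the exact noise levels are assumptions, since the README only distinguishes "light" and "intense" noise and caps the pitch increase at 2%:

```python
# Sketch of the waveform augmentations (noise levels are assumed values).
import librosa
import numpy as np

def add_noise(y, intensity):
    """Add random Gaussian noise scaled by `intensity` to the waveform."""
    return y + intensity * np.random.normal(size=y.shape)

def random_pitch_up(y, sr, max_increase=0.02):
    """Raise the pitch by a random factor of at most `max_increase` (2%)."""
    factor = 1.0 + np.random.uniform(0.0, max_increase)
    n_steps = 12 * np.log2(factor)           # frequency factor -> semitones
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def augment_chunk(y, sr):
    """Return the three augmented versions of a 10-second chunk."""
    return [
        add_noise(y, 0.005),      # light random noise (assumed level)
        add_noise(y, 0.02),       # intense random noise (assumed level)
        random_pitch_up(y, sr),   # random pitch increase, at most 2%
    ]
```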

Audio features extracted

The librosa library was used to extract the audio features.
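
A minimal sketch of how the Mel-frequency spectrograms can be extracted with librosa; the parameter values (e.g. the number of Mel bands) are assumptions:

```python
# Sketch of Mel-spectrogram extraction (parameter values are assumptions).
import librosa
import numpy as np

def mel_spectrogram(y, sr, n_mels=128):
    """Compute a log-scaled Mel-frequency spectrogram for one audio chunk."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)   # convert power to decibels
```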

(Figure: example Mel-frequency spectrograms)

Results

| Model              | Training accuracy | Test accuracy |
|--------------------|-------------------|---------------|
| MobileNet V2 (TL)  | 77%               | 77%           |
| Inception V3 (TL)  | 99%               | 84%           |
| CNN                | 55%               | 62%           |
| RNN                | 77%               | 66%           |