Model Overfitting #39
Hey there. 20600 spectrograms should be enough to get you decent results. My two cents would be to: 1) check the language distribution in your datasets: is there a language that is unreasonably higher than the rest? 2) Is your test data drawn from the same 20600? If it is not, then it's possible that the test set is too different from your training data and you might need to try some augmentation techniques to accommodate the uniqueness of your test set.
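That first check can be done straight from the generated CSV. A minimal sketch, assuming create_csv.py writes one `path,label` row per spectrogram; the CSV path and the label column index below are guesses to adjust:

```python
# Count how many spectrograms each language label has in the training CSV.
import csv
from collections import Counter

CSV_PATH = "data/training.csv"   # hypothetical path to the CSV produced by create_csv.py
LABEL_COLUMN = 1                 # hypothetical column index of the language label

counts = Counter()
with open(CSV_PATH, newline="") as f:
    for row in csv.reader(f):
        if row:
            counts[row[LABEL_COLUMN]] += 1

total = sum(counts.values())
for label, n in counts.most_common():
    print("{}: {} ({:.1f}%)".format(label, n, 100.0 * n / total))
```

If one language dominates the counts, the high training accuracy may just reflect that imbalance rather than a learned representation.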
The dataset preparation code of this project guarantees an equal number of spectrograms for each language. I do not normalize the audio beforehand, however (the authors don't either). I just run wav_to_spectrogram.py, create_csv.py, and train.py with the default config.yaml and the topcoder_crnn_finetune model. Later I also tried the inceptionv3_crnn.py model. Here is what I tried:
The MTEDX dataset can be found here: https://www.openslr.org/100/

Results:

- Voxforge dataset - 5 languages with 10020 spectrograms - topcoder_crnn_finetune - overfitting
- MTEDX dataset - 4 languages with 21000 - topcoder_crnn_finetune - overfitting
- MTEDX dataset - 3 languages with 43000 - topcoder_crnn_finetune - overfitting
- MTEDX dataset - 3 languages with 43000 - inceptionv3_crnn - just a very poor result

Here is an interesting experiment: I commented out the layer.trainable = False line in the model (see the sketch below). The result is much better, but still bad:

- MTEDX dataset - 3 languages with 43000 - topcoder_crnn_finetune - overfitting

No luck.
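For context, that change amounts to roughly the following. A minimal sketch, assuming the fine-tune model loads pre-trained weights and freezes layers before compiling; the weights file name and the set of frozen layers are assumptions, not the repository's actual code:

```python
from keras.models import load_model

model = load_model("pretrained_crnn.h5")  # hypothetical pre-trained weights file

# Freeze the pre-trained layers so only the remaining head is updated;
# which layers the real model freezes is a guess here.
for layer in model.layers[:-3]:
    layer.trainable = False   # <- the line the experiment commented out

# With the loop commented out, every layer is updated during fine-tuning:
# more capacity to fit the spectrograms, but also easier to overfit a small set.
# Either way, trainable flags only take effect after (re)compiling.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```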
P.S. My model conversion code sample is below, and it is very simple.

Original:

My code for Keras 2 and TF 1.14:
I tried 4 different datasets. The biggest one contains 4 languages, with 20600 PNGs of 10-second spectrograms for each language.
No luck. Train accuracy is 0.97; validation and test accuracy is 0.2 - 0.4. What dataset size must I use?
P.S. I use your default config. I changed the code (a little) to use Keras 2 and TensorFlow 1.14.
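For anyone making the same port, these are the kinds of renames it usually involves. An illustrative sketch of common Keras 1 to Keras 2 API changes, not the actual edits made to this project:

```python
# Typical Keras 1 -> Keras 2 renames (illustrative only).
from keras.layers import Conv2D  # Keras 1 name: Convolution2D

# Keras 1:
#   Convolution2D(32, 3, 3, border_mode="same", subsample=(1, 1))
# Keras 2:
conv = Conv2D(32, (3, 3), padding="same", strides=(1, 1))

# Keras 1:
#   model.fit_generator(gen, samples_per_epoch=20600, nb_epoch=50)
# Keras 2 (steps are counted in batches, not samples):
#   model.fit_generator(gen, steps_per_epoch=20600 // batch_size, epochs=50)
```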