Mismatch between confmatrix dim and labels length #67

Closed
bagustris opened this issue Sep 6, 2023 · 6 comments

Comments

@bagustris
Collaborator

This issue is related to #61.

If more labels are given than actually occur in the data, the confusion matrix is not printed and the reporter logs an error:

ERROR reporter: mismatch between confmatrix dim (11) and labels length (12: ['boredom', 'neutral', 'happy', 'sad', 'angry', 'fear', 'disgust', 'surprise', 'excited', 'pleasure', 'pain', 'disapointed'])

So automatic label detection is needed to avoid this error. Given labels can still be used if the user wants to evaluate specific labels instead of all labels (there must be fewer of them than actual labels).

Possible workaround:

  • Build automatic label detection, as raised in "automatically detect labels and bins" (#61).
  • Check whether more labels are given than actually occur; if so, give a warning but proceed with the actual (automatically detected) labels.
  • Still print/plot the confusion matrix, but warn that the actual (automatically detected) labels were used instead of the given labels.
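
The steps above could be sketched roughly as follows. This is only an illustration of the idea, not nkululeko's implementation: the function name `resolve_labels` and its arguments are hypothetical, and it assumes the labels observed in the data are available as a plain list.

```python
import warnings


def resolve_labels(given_labels, observed_values):
    """Return the labels to report on (hypothetical helper, not nkululeko API)."""
    # Labels detected automatically from the data, in a stable (sorted) order.
    detected = sorted(set(observed_values))
    if not given_labels:
        return detected
    # Given labels that never occur in the data would break the matrix size.
    unseen = [label for label in given_labels if label not in detected]
    if unseen:
        warnings.warn(
            f"labels not found in data: {unseen}; using detected labels {detected}"
        )
        return detected
    # All given labels occur in the data: keep the user's selection and order.
    return list(given_labels)


given = ["angry", "happy", "neutral", "sad"]
observed = ["neutral", "happy", "happy", "angry"]
labels = resolve_labels(given, observed)
```

Here `sad` never occurs in `observed`, so a warning is issued and `labels` falls back to the three detected labels. A subset such as `["happy", "neutral"]` would be returned unchanged.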

An example is the ASVP-ESD dataset (exp.ini) together with TESS.

@felixbur
Owner

couldn't replicate yet

@felixbur
Owner

but TESS performs nicely as training data for emodb

@bagustris
Collaborator Author

This error actually happens if we mix the filter (i.e., labels) with other filters, e.g.:

check_size = 1000
min_duration_of_samples = 2

Will close this issue since no further action is required.

@bagustris
Collaborator Author

bagustris commented Apr 26, 2024

I am reopening this issue.
A simple test to reproduce:

python -m nkululeko.nkululeko --config tests/exp_emodb_audmodel_xgb.ini

Or,

python -m nkululeko.nkululeko --config tests/exp_emodb_os_knn.ini

Output

DEBUG nkululeko: running exp_emodb_audmodel from config tests/exp_emodb_audmodel_xgb.ini, nkululeko version 0.83.0
DEBUG experiment: value for type not found, using default: audformat
DEBUG dataset: emodb: loading from ./data/emodb/emodb
DEBUG dataset: emodb: loading tables: []
DEBUG dataset: value for files_tables not found, using default: ['files']
DEBUG dataset: value for label not found, using default: emotion
DEBUG dataset: value for target_tables not found, using default: ['emotion']
DEBUG dataset: emodb: loaded with 535 samples: got targets: True, got speakers: True (10), got sexes: True
DEBUG datafilter: emodb: limited samples to 50, reduced samples from 535 to 50
DEBUG experiment: target: emotion
DEBUG experiment: Target labels (from config): ['angry', 'happy', 'neutral', 'sad']
DEBUG experiment: loaded databases emodb
DEBUG dataset: splitting database emodb with strategy random
DEBUG dataset: value for test_size not found, using default: 20
DEBUG dataset: emodb: [40/10] samples in train/test
DEBUG dataset: emodb: 10 samples in test and 40 samples in train
DEBUG dataset: emodb: mapped {'anger': 'angry', 'happiness': 'happy', 'sadness': 'sad', 'neutral': 'neutral'}
DEBUG dataset: emodb: mapped {'anger': 'angry', 'happiness': 'happy', 'sadness': 'sad', 'neutral': 'neutral'}
DEBUG experiment: value for type not found, using default: dummy
DEBUG experiment: Categories test (nd.array): ['neutral' 'happy']
DEBUG experiment: Categories train (nd.array): ['neutral' 'angry' 'sad' 'happy']
DEBUG experiment: 4 speakers in test and 7 speakers in train
DEBUG nkululeko: train shape : (25, 8), test shape:(6, 8)
DEBUG featureset: extracting audmodel embeddings, this might take a while...
DEBUG featureset: value for aud.model not found, using default: ./audmodel/
DEBUG featureset: value for device not found, using default: cuda
DEBUG experiment: All features: train shape : (25, 1024), test shape:(6, 1024)
DEBUG experiment: scaler: standard
DEBUG scaler: scaling features based on training set
DEBUG runmanager: run 0
DEBUG modelrunner: run: 0 epoch: 0: result: test: 0.400 UAR
DEBUG modelrunner: plotting confusion matrix to emodb_xgb_audmodel_scale-standard_0_000_cnf
ERROR reporter: mismatch between confmatrix dim (3) and labels length (4: ['angry', 'happy', 'neutral', 'sad'])

See my comment above. Maybe it is related to the sample-limiting config for the database (e.g., db.limit_samples).

@bagustris bagustris reopened this Apr 26, 2024
@bagustris
Collaborator Author

My bad,

This happens when using the same name (output dir) for different experiments (e.g. in test files).

@felixbur
Owner

Sure, it's because there are labels in the train set that are not in the test set.
But I think you have a point, this should not be a problem.
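
A small sketch of how such a mismatch can arise, and one way to avoid it. It assumes the matrix dimension is derived from the labels that occur in the test truth or predictions, which may not be exactly what nkululeko does. The data is made up, and both function names are illustrative only.

```python
def inferred_labels(y_true, y_pred):
    """Labels that occur at least once in the truth or the predictions."""
    return sorted(set(y_true) | set(y_pred))


def confusion_matrix(y_true, y_pred, labels):
    """Count (truth, prediction) pairs in a len(labels) x len(labels) matrix."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        # Pairs involving a label outside `labels` are skipped.
        if truth in index and pred in index:
            matrix[index[truth]][index[pred]] += 1
    return matrix


configured = ["angry", "happy", "neutral", "sad"]
# Hypothetical test split: two classes in the truth, one extra class predicted.
y_true = ["neutral", "happy", "happy", "neutral", "neutral", "happy"]
y_pred = ["neutral", "angry", "happy", "neutral", "happy", "happy"]

auto = inferred_labels(y_true, y_pred)
small = confusion_matrix(y_true, y_pred, auto)       # 3 x 3, from observed labels
full = confusion_matrix(y_true, y_pred, configured)  # 4 x 4, from configured labels
print(len(small), len(full))
```

Building the matrix over the configured label list keeps its dimension equal to the labels length. Classes missing from the test set simply show up as all-zero rows, so plotting with the configured labels no longer fails.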
