I was trying to load `eurlex_train.txt` and `eurlex_test.txt`. As far as I understand, they are in the LibSVM format for multilabel classification. However, loading them with `sklearn.datasets.load_svmlight_file` fails.

I've observed that `eurlex_train.txt` contains 28 rows with no label, i.e. lines that start with a space. If you run the following command
it results in 28 rows with the following line numbers in `eurlex_train.txt` where the labels are missing:

Despite this, training with the Rust CLI (and with the Python wrapper too) works without problems.
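For anyone who wants to reproduce the count of label-less rows from Python, here is an independent minimal sketch (not the command mentioned above). It assumes that a data row looks like `l1,l2 idx:val idx:val ...`, so a row without labels is one whose first whitespace-separated token is already an `idx:val` pair. The function name `find_unlabeled_rows` is hypothetical.

```python
from typing import Iterable, List


def find_unlabeled_rows(lines: Iterable[str]) -> List[int]:
    """Return the 1-based line numbers of rows that carry no labels.

    Hypothetical helper. Assumption: a labeled row starts with a label field
    such as "1,2", while an unlabeled row starts directly with an "idx:val"
    pair (in eurlex_train.txt such lines begin with a space). A possible
    header line of plain integers contains no ':' and is therefore ignored.
    """
    unlabeled: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        # Blank lines have no tokens; labeled rows have no ':' in the first token.
        if tokens and ":" in tokens[0]:
            unlabeled.append(lineno)
    return unlabeled
```

Usage: `with open("eurlex_train.txt", encoding="utf-8") as fh: print(find_unlabeled_rows(fh))`. Passing the open file object streams it line by line instead of reading it into memory.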
I've also noticed that the `parse_xc_repo_data_line` function in `omikuji/src/data.rs` checks whether a line contains any labels.

Since it seems I cannot rely on the otherwise very good `sklearn.datasets.load_svmlight_file`, what label should I assign to those rows? In a first simple implementation I decided to skip the rows with missing labels.
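A minimal sketch of such a loader, written in plain Python and not taken from Omikuji or scikit-learn. It assumes an Extreme Classification Repository style layout: an optional first line of three integers (`<n_rows> <n_features> <n_labels>`), then rows of the form `l1,l2,... idx:val ...` with zero-based feature indices. The hypothetical `skip_unlabeled` flag switches between dropping label-less rows (the approach described above) and keeping them with an empty label set.

```python
from typing import Iterable, List, Tuple


def parse_xc_lines(
    lines: Iterable[str], skip_unlabeled: bool = True
) -> Tuple[List[int], List[int], List[float], List[Tuple[int, ...]], int]:
    """Parse label/feature rows into CSR components plus per-row label tuples.

    Hypothetical helper. Assumed layout:
      optional header line: "<n_rows> <n_features> <n_labels>"
      data line:            "l1,l2,... idx:val idx:val ..."
      unlabeled data line:  " idx:val idx:val ..." (empty label field)
    Returns (indptr, indices, data, labels, n_features).
    """
    indptr: List[int] = [0]
    indices: List[int] = []
    data: List[float] = []
    labels: List[Tuple[int, ...]] = []
    n_features = 0

    for lineno, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        # Treat a first line of exactly three integers as the header.
        if lineno == 0 and len(tokens) == 3 and all(t.isdigit() for t in tokens):
            n_features = int(tokens[1])
            continue

        if ":" in tokens[0]:
            row_labels: Tuple[int, ...] = ()  # empty label field
            features = tokens
        else:
            row_labels = tuple(int(lab) for lab in tokens[0].split(","))
            features = tokens[1:]

        if not row_labels and skip_unlabeled:
            continue  # option 1: drop the row entirely

        for feature in features:
            idx, val = feature.split(":")
            indices.append(int(idx))
            data.append(float(val))
            # Assumes zero-based indices when no header is present.
            n_features = max(n_features, int(idx) + 1)
        indptr.append(len(indices))
        labels.append(row_labels)  # option 2: keep it with an empty label set

    return indptr, indices, data, labels, n_features
```

The returned arrays can be turned into a sparse matrix with `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(labels), n_features))`, and the label tuples into an indicator matrix with `sklearn.preprocessing.MultiLabelBinarizer().fit_transform(labels)`, which maps an empty tuple to an all-zero row. Whether dropping or keeping the unlabeled rows is preferable depends on how the downstream model treats examples with no positive labels.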