An error occurred when using the xgboost as a classifier for hiclass #122
Comments
If you want to use the softmax objective, you have to encode your label to the range [0, num_classes), which you can't do inside hiclass. |
Hi @RamSnoussi, Thank you for the interest in HiClass. As mentioned by @tcsmaster, you need to encode your labels. Here is an example of how to do it, but you have to be careful to call the method |
If not @RamSnoussi, then I would really appreciate the code snippet. |
The snippet I have is not well structured, but the algorithm goes like this:
import numpy as np
from sklearn.preprocessing import LabelEncoder
np_y = np.array(y)  # convert y to a numpy array if it is not one already
flat_y = np.unique(np.append(np_y.flatten(), "hiclass::root"))  # flatten and collect all unique labels in the hierarchy, plus the root
# encode the labels in the hierarchy
label_encoder = LabelEncoder()
label_encoder.fit(flat_y)
y = np.array(
    [label_encoder.transform(row) for row in np_y]
)
Then you can train the hierarchical classifier with the encoded labels and decode the labels after prediction. The code is available in this branch if you want to take a further look |
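For readers who want to try the encode/decode round trip on its own, here is a minimal sketch. The toy labels are made up, the `predictions` variable is only a stand-in for the output of a trained classifier, and the use of `LabelEncoder.inverse_transform` for decoding is an assumption about which method was meant above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy two-level hierarchy; the label names are invented for illustration.
y = [["Animal", "Cat"], ["Animal", "Dog"], ["Plant", "Rose"]]
np_y = np.array(y)

# Fit a single encoder on every label in the hierarchy plus the root placeholder.
all_labels = np.unique(np.append(np_y.flatten(), "hiclass::root"))
label_encoder = LabelEncoder().fit(all_labels)

# Encode row by row so the (n_samples, n_levels) shape is preserved.
encoded_y = np.array([label_encoder.transform(row) for row in np_y])

# Stand-in for the output of classifier.predict(...): reuse the encoded labels.
predictions = encoded_y

# Decode each row back to the original string labels.
decoded = np.array(
    [label_encoder.inverse_transform(row) for row in predictions]
)
```

Because the encoder is fitted on the whole hierarchy, any label that appears only at prediction time would raise an error in `transform`, which is why the set of labels has to be known up front.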
Thank you for the code. Does this also mean that the model needs to encounter all available labels during training? |
Hi @mirand863, |
Hi @tcsmaster, Yes, the model needs to see as many labels as possible during training. Just be careful not to leak data if you have to split the data into training and test sets. We can also discuss this in private. Please, feel free to email me at [email protected] |
Hi @RamSnoussi, Can you please clarify what the issue with the separator is? I was able to execute this code without errors. |
Hi @mirand863, |
Is there a way to use xgboost with hiclass? |
Hi, Sorry for the delay. Yes, the separator needs to be removed in this case. In the branch I sent you it has been removed, but it was not easy to see. Here is a full diff with the changes: main...cuml. If I remember correctly, you just need to remove the multiple occurrences of |
hi @mirand863,
what is the problem here? |
hi @mirand863, when I use an older version of xgboost like 0.90 it works successfully. |
Hi @RamSnoussi , It seems to me that your xgboost classifier expects the classes to start from 0 for each classifier. I guess you would need to use a label encoder for each local classifier, separately. Please see https://stackoverflow.com/questions/71996617/invalid-classes-inferred-from-unique-values-of-y-expected-0-1-2-3-4-5-got for reference. Good to know it works in an older version. Best regards, |
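A rough sketch of what "a label encoder for each local classifier" could look like is a thin scikit-learn wrapper that remaps whatever labels it receives to 0..n-1 before fitting and maps predictions back afterwards. The class name `ZeroBasedLabelClassifier` is hypothetical, `DecisionTreeClassifier` is only a stand-in for the xgboost classifier, and whether HiClass accepts such a wrapper unchanged has not been verified here.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier


class ZeroBasedLabelClassifier(BaseEstimator):
    """Hypothetical wrapper: encodes labels to 0..n-1 before fitting."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y):
        # Each fitted copy gets its own encoder, so labels always start at 0.
        self.encoder_ = LabelEncoder().fit(y)
        self.classes_ = self.encoder_.classes_
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, self.encoder_.transform(y))
        return self

    def predict(self, X):
        # Map the 0-based predictions back to the labels seen in fit.
        codes = np.asarray(self.estimator_.predict(X)).astype(int)
        return self.encoder_.inverse_transform(codes)

    def predict_proba(self, X):
        # Columns follow the order of self.classes_.
        return self.estimator_.predict_proba(X)


# Labels such as 3 and 5 mimic what one local classifier might receive
# after the whole hierarchy has been encoded with a single global encoder.
X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([3, 3, 5, 5])

clf = ZeroBasedLabelClassifier(DecisionTreeClassifier(random_state=0)).fit(X, y)
inner_labels = clf.encoder_.transform(y)  # what the wrapped estimator sees
pred = clf.predict(X)
```

The wrapped estimator only ever sees consecutive integer labels starting at 0, while callers keep working with the original labels.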
Hi,
Below is my example of using the xgboost classifier with hiclass. My question is specifically about the HiClass Python package for hierarchical classification. I would like to model the problem using a hierarchical classification approach, as in the figure below:
How can I correct this error?