EmotionEngineer changed the title from "Low accuracy when using text features" to "[TabularNLPAutoML] Add the ability to pass text features directly to CatBoost" on Nov 9, 2023
I've traced the issue to LightAutoML passing embedding-encoded numeric values to CatBoost instead of the raw text features. In my case, passing the text columns directly through CatBoost's 'text_features' parameter gives better results than using the embeddings or TF-IDF features from LightAutoML.
I suggest that LightAutoML add a 'direct' option for text features, so that raw text columns are handed to CatBoost's 'text_features' parameter and users can rely on CatBoost's built-in text processing for improved performance.
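For reference, here is a minimal sketch of what I mean by the direct path. The helper names `split_feature_roles` and `fit_catboost_direct` are hypothetical and are not part of LightAutoML or CatBoost. Only `Pool(..., text_features=...)` and `CatBoostRegressor` are real CatBoost API. The sketch assumes a pandas DataFrame as input and a CatBoost version that accepts text features for regression.

```python
from typing import Any, Dict, List, Sequence


def split_feature_roles(
    columns: Sequence[str], text_columns: Sequence[str]
) -> Dict[str, List[str]]:
    """Separate text columns from the rest, preserving the original column order.

    Hypothetical helper: it only decides which columns would be passed to
    CatBoost as raw text instead of being embedded or TF-IDF encoded.
    """
    text_set = set(text_columns)
    unknown = text_set - set(columns)
    if unknown:
        raise ValueError(f"text columns not found in data: {sorted(unknown)}")
    return {
        "text": [c for c in columns if c in text_set],
        "other": [c for c in columns if c not in text_set],
    }


def fit_catboost_direct(
    train_df, target: str, text_columns: Sequence[str], **params: Any
):
    """Fit a CatBoostRegressor on raw text columns (no embeddings, no TF-IDF).

    Assumes `train_df` is a pandas DataFrame that contains `target`.
    """
    # Imported lazily so the helper above stays usable without catboost installed.
    from catboost import CatBoostRegressor, Pool

    feature_names = [c for c in train_df.columns if c != target]
    roles = split_feature_roles(feature_names, text_columns)

    # Text columns are passed by name; CatBoost tokenizes them internally.
    pool = Pool(
        data=train_df[feature_names],
        label=train_df[target],
        text_features=roles["text"],
    )

    params.setdefault("loss_function", "RMSE")
    model = CatBoostRegressor(**params)
    model.fit(pool)
    return model
```

With a DataFrame that has a text column named, say, `review` and a target `y` (both names are placeholders), this would be called as `fit_catboost_direct(train_df, target="y", text_columns=["review"], iterations=500, verbose=False)`. A 'direct' option in LightAutoML could do something similar internally instead of encoding the text first.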
🐛 Bug
Comparing two notebooks that use text features (LAMA vs. CatBoost), I get a significantly higher test RMSE with LAMA.
I have tried everything I could think of, including leaving only CatBoost in LAMA and adjusting the CatBoost parameters manually. Maybe something is wrong with my LAMA implementation?
To Reproduce
CatBoost Notebook
LAMA Notebook
Expected behavior
Comparable accuracy to CatBoost when using LightAutoML