[TabularNLPAutoML] Add the ability to pass text features directly to CatBoost #141

EmotionEngineer · 2023-11-06T13:05:21Z

🐛 Bug

Comparing two notebooks that use text features, one with LightAutoML (LAMA) and one with plain CatBoost, I get a significantly higher test RMSE with LAMA.
I have tried everything I could think of: leaving only CatBoost in LAMA and adjusting the CatBoost parameters manually. Maybe something is wrong with my LAMA setup?

To Reproduce

CatBoost Notebook
LAMA Notebook

Expected behavior

Comparable accuracy to CatBoost when using LightAutoML

EmotionEngineer · 2023-11-10T16:43:32Z

I've identified the issue to be related to CatBoost receiving embedding-encoded numeric values from LightAutoML instead of direct text features. In my case, utilizing the 'text_features' directly in CatBoost yields better results compared to using embeddings or TF-IDF from LightAutoML.

I suggest extending the handling of text features for CatBoost in TabularNLPAutoML with a 'direct' option that passes the raw text columns through CatBoost's 'text_features' parameter, so that users can rely on CatBoost's built-in text processing for improved performance.
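For readers who want to see what "direct" means here, below is a minimal sketch of fitting CatBoost on raw text columns through its `text_features` argument. It is not the code from the linked notebooks. The helper names (`resolve_text_columns`, `fit_catboost_on_raw_text`) and the assumption that the data is a pandas DataFrame are made up for illustration, and nothing in it is LightAutoML API.

```python
def resolve_text_columns(columns, text_cols):
    """Return text_cols in frame order, failing early on unknown names.

    Hypothetical helper for this sketch, not part of CatBoost or LightAutoML.
    """
    missing = [c for c in text_cols if c not in columns]
    if missing:
        raise KeyError(f"text columns not in data: {missing}")
    return [c for c in columns if c in text_cols]


def fit_catboost_on_raw_text(train_df, target, text_cols, **params):
    """Fit a CatBoostRegressor on a pandas DataFrame, passing text columns raw.

    Assumes train_df is a pandas DataFrame and that catboost is installed.
    """
    from catboost import CatBoostRegressor  # lazy import: requires catboost

    X = train_df.drop(columns=[target])
    text_cols = resolve_text_columns(list(X.columns), text_cols)
    # To be safe, hand CatBoost plain strings with no missing values.
    X[text_cols] = X[text_cols].fillna("").astype(str)

    # Defaults for this sketch; caller-supplied params override them.
    params = {"loss_function": "RMSE", "verbose": False, **params}
    model = CatBoostRegressor(**params)
    # text_features makes CatBoost apply its own text processing to these
    # columns instead of receiving precomputed embeddings or TF-IDF values.
    model.fit(X, train_df[target], text_features=text_cols)
    return model
```

A call would look like `fit_catboost_on_raw_text(train_df, "target", ["description"])`, where the column names are placeholders. Note that CatBoost's text dictionaries drop rare tokens by default, so very small datasets may need custom text processing settings.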

EmotionEngineer added the bug label Nov 6, 2023

EmotionEngineer changed the title ~~Low accuracy when using text features~~ [TabularNLPAutoML] Add the ability to pass text features directly to CatBoost Nov 9, 2023
