
[QUESTION] What's the correct way to train XGBoost on a multi-source dataset? #2570

Open
giacomoguiduzzi opened this issue Oct 24, 2024 · 0 comments
Labels
question Further information is requested triage Issue waiting for triaging


@giacomoguiduzzi

Greetings,

I am trying to fit an XGBoost model on a dataset made of samples from multiple data sources, each associated with an ID. The best way I found to represent this with TimeSeries instances is to stack the measurements along the sample axis, using a common time axis. Passing the resulting TimeSeries to the XGBoost model as-is works, but I don't want a probabilistic forecast; I want a single-step one.

In my data processing pipeline I created a sliding-window dataset in which each window is associated with a label t timesteps in the future, so I tried creating a separate TimeSeries object for each window. This raises an exception about the size of the TimeSeries objects. I think the cause is that these objects do not contain the label: with a look-back window of 96, for example, each TimeSeries has length 96, and the exception complains about 0-sized arrays.

Looking at the code of the XGBoost model, it seems to work in a sort of autoregressive mode: if I understood correctly, it uses the previous 96 values of the given TimeSeries to forecast what comes immediately after them. Since my forecasting horizon is n=1, does the model need TimeSeries objects of length 97? If so, would passing the model a sequence of such TimeSeries objects train it the same way the sliding-window dataset trains other models? I am not sure the outcome is the same as passing a single dataset object.
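The length question can be illustrated with a minimal NumPy sketch of lag-feature tabularization, the kind of sliding-window transformation that lag-based regression models such as XGBoost typically rely on. It assumes that a model with 96 lags and a one-step horizon builds each training row from 96 past values plus the value immediately after them. The helpers `tabularize` and `tabularize_many` and the two random "sources" are hypothetical; this is not Darts' actual implementation. Under that assumption, 96-step windows yield no training rows at all (consistent with a 0-sized array error), while overlapping 97-step windows yield exactly the same rows as passing each source's full series.

```python
import numpy as np


def tabularize(series: np.ndarray, lags: int, horizon: int = 1):
    """Turn one univariate series into (X, y) pairs.

    Each row of X holds `lags` consecutive values; y holds the value
    `horizon` steps after the last value in that row.
    """
    window = lags + horizon
    if len(series) < window:
        # Too short to produce even one (features, label) pair.
        return np.empty((0, lags)), np.empty((0,))
    windows = np.lib.stride_tricks.sliding_window_view(series, window)
    X = windows[:, :lags]  # the look-back part of each window
    y = windows[:, -1]     # the target, `horizon` steps ahead
    return X, y


def tabularize_many(series_list, lags: int, horizon: int = 1):
    """Stack the (X, y) pairs of several series, e.g. one per source ID."""
    parts = [tabularize(s, lags, horizon) for s in series_list]
    X = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X, y


lags = 96
rng = np.random.default_rng(0)
# Hypothetical data: two sources (IDs), each with its own series.
sources = {"A": rng.normal(size=200), "B": rng.normal(size=150)}

# 1) One full series per source ID, passed as a list.
X_list, y_list = tabularize_many(list(sources.values()), lags)

# 2) Pre-cut overlapping windows of length lags + 1 = 97 (label included).
windows_97 = [s[i:i + lags + 1]
              for s in sources.values() for i in range(len(s) - lags)]
X_win, y_win = tabularize_many(windows_97, lags)

# 3) Pre-cut windows of length lags = 96 (label missing).
windows_96 = [s[i:i + lags]
              for s in sources.values() for i in range(len(s) - lags + 1)]
X_bad, y_bad = tabularize_many(windows_96, lags)

print(X_list.shape, X_win.shape, X_bad.shape)
# (158, 96) (158, 96) (0, 96)
```

If Darts behaves like this sketch, the per-source layout would correspond to something like `XGBModel(lags=96, output_chunk_length=1).fit([series_a, series_b])`, with one univariate `TimeSeries` per ID (`series_a` and `series_b` are hypothetical names), instead of stacking sources on the sample axis or pre-cutting windows. This is an assumption to check against the Darts documentation.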
I tried to summarize the problem to avoid a wall of text, so let me know if you need any further information.
Thank you so much for your support. I look forward to your response.

Best Regards,
Giacomo Guiduzzi

@giacomoguiduzzi giacomoguiduzzi added question Further information is requested triage Issue waiting for triaging labels Oct 24, 2024