[QUESTION] What's the correct way to train XGBoost on a multi-source dataset? #2570

giacomoguiduzzi · 2024-10-24T07:46:29Z

Greetings,

I am trying to fit an XGBoost model on a dataset made up of samples from multiple data sources, each associated with an ID. The best way I have found to represent this with TimeSeries instances is to stack the measurements on the sample axis using a common time axis. Passing the resulting TimeSeries to the XGBoost model as-is works, but I don't want a probabilistic forecast; I want a single-step one.

In my data processing pipeline I created a sliding-window dataset in which each window is associated with a label t timesteps in the future, so I tried creating a separate TimeSeries object for each window. This raises an exception about the size of the TimeSeries objects (arrays being 0-sized). I think the cause is that these objects do not contain the label: with a look-back window of 96, for example, each TimeSeries has length 96.

Looking at the code for the XGBoost model, it seems to work in a sort of auto-regressive mode: if I understand correctly, it uses the previous 96 values of the given TimeSeries to forecast the value that comes immediately after them. Since my forecasting horizon is n=1, does the model need TimeSeries objects of length 97? If so, would passing the model a sequence of TimeSeries objects work for training, as it does for other models that use the sliding-window dataset? I am not sure the outcome is the same as passing a single dataset object.
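To make the size question concrete, here is a minimal sketch in plain Python of how I assume a lagged regression model turns series into training rows. It is not the library's actual implementation: the helpers `make_windows` and `make_windows_multi` and the parameters `lags` and `horizon` are hypothetical names chosen for illustration, and the sketch assumes each series is windowed independently before the rows are pooled.

```python
def make_windows(series, lags, horizon=1):
    """Build (X, y) rows for one series: `lags` past values -> value `horizon` steps ahead."""
    X, y = [], []
    # A window is usable only if `lags` inputs AND the label fit in the series,
    # so the number of rows is len(series) - lags - horizon + 1 (or 0 if negative).
    for start in range(len(series) - lags - horizon + 1):
        X.append(series[start:start + lags])
        y.append(series[start + lags + horizon - 1])
    return X, y


def make_windows_multi(series_list, lags, horizon=1):
    """Window each series on its own, then pool the rows into one training set."""
    X, y = [], []
    for s in series_list:
        Xi, yi = make_windows(s, lags, horizon)
        X.extend(Xi)
        y.extend(yi)
    return X, y


lags = 96
short = list(range(96))   # window only, no label: yields no training rows
exact = list(range(97))   # window plus one label: yields exactly one row
long_ = list(range(200))  # a longer source: yields 200 - 96 - 1 + 1 = 104 rows

X_short, y_short = make_windows(short, lags)
X_exact, y_exact = make_windows(exact, lags)
X_all, y_all = make_windows_multi([exact, long_], lags)

print(len(X_short), len(X_exact), len(X_all))  # prints: 0 1 105
```

Under these assumptions, a series of length 96 produces an empty training set, which would be consistent with the 0-sized array exception, while a series of length 97 produces one row. Pooling rows across several series gives the same rows as windowing each source separately, with no window spanning two sources.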
I tried to summarize the problem to avoid a wall of text, so let me know if you need any further information.
Thank you so much for your support. I'm looking forward to your kind response.

Best Regards,
Giacomo Guiduzzi

giacomoguiduzzi added the question (Further information is requested) and triage (Issue waiting for triaging) labels on Oct 24, 2024
