Experiments setup

This section outlines the experimental design and organization for developing the Nexus model, a forecasting model that builds on foundation models and uses knowledge distillation to produce robust forecasts. The experiments explore model training, tuning, and inference with the goal of optimizing forecasting performance. The section covers the selection of teacher models, student models, calibration strategies, and the baseline models against which the generated forecasts will be compared. It also lists key issues to resolve before experimentation starts, including data selection, handling of missing values, and the incorporation of exogenous oceanic variables for the Santos region.

Issues to be Addressed:

Should we use data only from Praticagem or also from other regions?
Some models do not accept missing values, and some series contain gaps. Should we fill the gaps by interpolation or with zeros?
Should we use SOFS or other data from numerical simulations as endogenous/exogenous input variables for the models?
- Problem: This data has a granularity of 3h. For models that do not support series with different granularities, we would have to either:
  - Interpolate to obtain measurements with a granularity of 1h, or
  - Run the models with a granularity of 3h so that SOFS can be used as a baseline.
In fact, the most widely used foundation models are not well suited to long-term prediction: they specialize in short-term prediction because of the diversity of the data they were trained on. The TimeGPT paper empirically treats predictions of more than 24 points as long-term.
- Problem: With a granularity finer than 1h, predicting one day ahead requires more than 24 points. Of the foundation model implementations mentioned, only TimeGPT offers pre-trained weights for long-term prediction, but its authors note that this model is still incipient and was trained on less varied data than the short-term model (<=24 prediction points).
Should we use estimates of carbon production for training/inference of the models?
- Problem: Learning curve versus time available for project delivery.
Should we use other variables from satellites/images?
- Problem: The chosen foundation models, by definition, do not work with encoded data. The models would have to be altered to handle this data, and the challenge would be retraining them on this data domain. These models reuse weights already computed by other state-of-the-art GenAI models (e.g., T5, Llama, etc.).
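
The gap-filling and 3h-to-1h questions above can be compared on a toy series. The sketch below is only illustrative: it assumes pandas and NumPy are available, and the values and the names `observed` and `sim_3h` are hypothetical stand-ins, not project data.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with one missing measurement.
hours = pd.date_range("2024-01-01 00:00", periods=5, freq="1h")
observed = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], index=hours)

# Option A: linear interpolation between the neighbouring measurements.
filled_interp = observed.interpolate(method="linear")

# Option B: zero filling, which injects an artificial drop to 0.
filled_zero = observed.fillna(0.0)

# Hypothetical 3h series standing in for numerical simulation output.
steps_3h = pd.date_range("2024-01-01 00:00", periods=3, freq="3h")
sim_3h = pd.Series([0.0, 3.0, 6.0], index=steps_3h)

# Upsample to 1h and interpolate linearly between the 3h points.
sim_1h = sim_3h.resample("1h").interpolate(method="linear")

print(filled_interp.tolist())
print(filled_zero.tolist())
print(sim_1h.tolist())
```

Interpolated points are not measurements, so if the simulation output is to serve as a baseline, running the models at 3h avoids comparing against synthetic values.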

Variables

Endogenous

Current: Praticagem and other regions
Sea Surface Height (SSH): Praticagem and other regions
Astronomical tide: Praticagem and other regions

Exogenous

Astronomical tide: Praticagem and other regions

Granularity

I believe the ideal granularity for our testing scenario is 1 hour: a one-day forecast then needs only 24 points, which stays within the short-term range that foundation models handle well.
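
The arithmetic behind this choice can be checked quickly. The snippet below is a minimal sketch: the helper `points_for_horizon` and the 10-minute granularity are hypothetical, and the 24-point threshold is the one this document attributes to the TimeGPT paper.

```python
import math

SHORT_TERM_LIMIT = 24  # predictions above this count are treated as long-term


def points_for_horizon(horizon_hours: float, granularity_minutes: float) -> int:
    """Number of forecast points needed to cover the horizon."""
    return math.ceil(horizon_hours * 60 / granularity_minutes)


# One day ahead at 3h, 1h, and a hypothetical 10-minute granularity.
for granularity in (180, 60, 10):
    n_points = points_for_horizon(24, granularity)
    regime = "short-term" if n_points <= SHORT_TERM_LIMIT else "long-term"
    print(f"{granularity:>3} min -> {n_points:>3} points ({regime})")
```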

Inference Window

Because this involves zero-shot learning strategies, I believe we can investigate an optimal window by relating the window size to the computational cost (tokens) of training/inference. For the proof-of-concept inference with TimeGPT, an incremental windowing strategy with padding = 1 was used: the window advances one point per step and produces N predictions, where N is the number of measurement points in the test series. To illustrate:

1st prediction step:
- Input data: the training series minus its last 23 points.
- Predicted data: 24 points, validated against the last 23 points of the training series plus the 1st measured point of the test series.
2nd prediction step:
- Input data: the training series minus its last 22 points.
- Predicted data: 24 points, validated against the last 22 points of the training series plus the first 2 measured points of the test series.
24th prediction step:
- Input data: the full training series.
- Predicted data: 24 points, validated against the first 24 measured points of the test series.
N-th prediction step:
- Input data: the full training series plus the first N - 24 points of the test series.
- Predicted data: 24 points, validated against the last 24 points of the test series.
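
The steps above can be sketched as a generator that yields, for each prediction step, the input window and the 24 points used for validation. This is a minimal sketch under stated assumptions: the function name `incremental_windows` and the toy integer series are hypothetical, and the actual model call (e.g. a TimeGPT request) is left out.

```python
from typing import Iterator, List, Sequence, Tuple


def incremental_windows(
    train: Sequence[float], test: Sequence[float], horizon: int = 24
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (input window, validation points) for each of the len(test) steps."""
    if len(train) < horizon:
        raise ValueError("training series must have at least `horizon` points")
    series = list(train) + list(test)
    n_train = len(train)
    for step in range(1, len(test) + 1):
        # The cut advances by one point per step (padding = 1).
        cut = n_train - horizon + step
        yield series[:cut], series[cut:cut + horizon]


# Toy data: 100 training points followed by N = 48 test points.
train_series = list(range(100))
test_series = list(range(100, 148))
windows = list(incremental_windows(train_series, test_series))

first_input, first_target = windows[0]
print(len(windows), len(first_input), first_target[-1])
```

Each yielded input window would be passed to the forecasting model, and the 24-point forecast compared with the matching validation points.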

Experiment Organization

The experiments will be divided into 5 stages, each with its own objective:

Teacher models (pre-trained models)
Student model (model to be trained)
Calibration model
Ablation Study
Validation/Optimization (Running complete pipeline)

Proposed Initial Model Diagram

The diagram below is an initial sketch of the proposed model, intended to show the purpose of each element.

Tables

1st Stage of Experiments (Teacher models)

Models	Mode	Exec Env	Regions	Endog Vars	Exog Vars	#Fine-Tuning steps	#Epochs
TimeGPT	Multivariate	Nixtla API	Praticagem	curr, ssh, at	at	N/A	N/A
Chronos	Univariate	Xeon + TitanV	Praticagem	curr	at	N/A	N/A
N-HITS	Multivariate	Xeon + TitanV	Praticagem	curr, ssh, at	at	N/A	200

2nd Stage of Experiments (Student Model) TBD

Models	Mode	Exec Env	Endog Vars	Distillation loss	Exog Vars	#FT steps	#Epochs
Residual MLP	Multivariate	Xeon + TitanV	curr, ssh, at	TimeGPT(ŷ), Chronos(ŷ), N-HITS(ŷ)	at	N/A	200
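
One possible shape for the distillation loss in the table above is sketched here. The exact formulation is not specified in this document, so everything below is an assumption: a weighted sum of the error against the ground truth and the mean error against the teacher forecasts, with `beta` playing the role of the distillation balancer tuned in the 3rd stage. The function names and toy numbers are hypothetical.

```python
from typing import Dict, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student: Sequence[float],
    target: Sequence[float],
    teachers: Dict[str, Sequence[float]],
    beta: float,
) -> float:
    """Assumed form: beta * supervised error + (1 - beta) * mean teacher error."""
    supervised = mse(student, target)
    distill = sum(mse(student, y_hat) for y_hat in teachers.values()) / len(teachers)
    return beta * supervised + (1.0 - beta) * distill


# Toy forecasts over a 3-point horizon (hypothetical values).
target = [1.0, 2.0, 3.0]
student = [1.0, 2.0, 4.0]
teachers = {
    "TimeGPT": [1.0, 2.0, 3.0],
    "Chronos": [1.0, 2.0, 4.0],
    "N-HITS": [1.0, 2.0, 5.0],
}
loss = distillation_loss(student, target, teachers, beta=0.5)
print(round(loss, 4))
```

With `beta = 1` the student ignores the teachers and trains on the measurements alone; with `beta = 0` it only imitates the teachers.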

3rd Stage of Tuning Student Model Calibration (Baseline)

Models	Platform	Exec Env	Main Hyperparameters	Other HP	#Fine-Tuning steps	Parallel?
Residual MLP	Optuna	Xeon + TitanV	Beta (Distil. Balancer)	TBD	TBD	Yes

4th Stage of Experiments (Baseline)

Models	Mode	Exec Env	Regions	Endog Vars	Exog Vars	#Fine-Tuning steps	#Epochs
GNN OMAE	Multivariate	Xeon + TitanV	Praticagem	curr, ssh, at	at	100 (optuna)	200
SARIMAX	Univariate	Xeon + TitanV	Praticagem	curr, ssh, at	at	100 (optuna)	N/A
