We propose a model that combines a state-of-the-art foundation model trained on massive datasets with a simpler model trained on a smaller dataset specific to the domain of the time series to be forecast. This combination aims to bring out the best of both worlds: higher prediction accuracy than either model achieves alone, and lower computational cost, since expensive training and fine-tuning routines are avoided.
The research problem revolves around the high computational cost of the extensive training and fine-tuning routines that foundation models require. These models, trained on massive datasets, offer state-of-the-art performance but demand significant computational resources and time to train and optimize. The challenge lies in finding efficient methods to leverage the benefits of foundation models while mitigating this computational complexity, especially in domains such as time series forecasting, where efficiency is crucial for real-world applications.
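As a concrete illustration of the proposed combination, here is a minimal sketch, assuming a frozen foundation model exposed through a hypothetical `foundation_forecast()` helper (standing in for a zero-shot call to TimeGPT or Chronos, stubbed here with a naive forecast) and a small residual MLP trained only on the domain series. Names, window sizes, and the scikit-learn MLP are illustrative assumptions, not a fixed design.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def foundation_forecast(context: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical stand-in for a zero-shot call to TimeGPT or Chronos
    (no tuning); stubbed here with a naive last-value forecast."""
    return np.repeat(context[-1], horizon)

def build_training_pairs(series: np.ndarray, context_len: int, horizon: int):
    """Slide a window over the domain series and collect
    (context window, residual between ground truth and zero-shot forecast)."""
    X, y = [], []
    for start in range(len(series) - context_len - horizon + 1):
        context = series[start:start + context_len]
        target = series[start + context_len:start + context_len + horizon]
        base = foundation_forecast(context, horizon)
        X.append(context)        # features: the raw context window
        y.append(target - base)  # label: the error the foundation model makes
    return np.array(X), np.array(y)

# Toy domain series; in practice, the small domain-specific dataset.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
context_len, horizon = 48, 12

X, y = build_training_pairs(series, context_len, horizon)
residual_mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
residual_mlp.fit(X, y)

# Hybrid forecast: frozen zero-shot forecast + learned residual correction.
context = series[-context_len:]
hybrid = foundation_forecast(context, horizon) + residual_mlp.predict(context.reshape(1, -1))[0]
```

The hybrid forecast adds the learned residual correction to the frozen zero-shot forecast; this is the pattern the "Distillations" items below refer to, with the foundation model kept untouched.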
- Zhou, K., He, Y., Cai, J., & Han, J. (2023). Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), Main Conference Track.
- Sun, H., Liu, Y., Wang, Z., Jiang, C., & Han, J. (2023). DIME-FM: Distilling Multimodal and Efficient Foundation Models. In International Conference on Computer Vision (ICCV 2023).
- Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the Knowledge in a Neural Network. In Advances in Neural Information Processing Systems (NIPS).
- Liu, J., Yang, B., Wang, C., & Yang, Y. (preprint). Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection.
- Distillations (zero-shot foundation forecast + residual MLP correction, as sketched above):
- TimeGPT without tuning + residualMLP
- Chronos without tuning + residualMLP
- (TimeGPT + Chronos) without tuning + residualMLP
- How to fine-tune the Chronos model?
- Canonical correlation analysis (similarity between the models' internal representations); see the CCA sketch after this list.
- Feature Distillation: create a distillation based on the models' internal representations (cf. Initializing Models with Larger Ones); see the hint-loss sketch after this list.
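For the canonical-correlation item, a hedged sketch using scikit-learn's `CCA` to score similarity between two models' internal representations. The activation matrices `repr_a` and `repr_b` are random stand-ins here; in practice they would be hidden states extracted from the foundation model and the smaller domain model on the same inputs.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Assumed inputs: hidden activations of two models on the same N time windows,
# e.g. repr_a from the foundation model and repr_b from the smaller domain model.
rng = np.random.default_rng(0)
N, d_a, d_b = 400, 128, 32
repr_a = rng.normal(size=(N, d_a))   # stand-in for teacher activations
repr_b = rng.normal(size=(N, d_b))   # stand-in for student activations

n_components = min(d_a, d_b, 10)
cca = CCA(n_components=n_components, max_iter=1000)
a_c, b_c = cca.fit_transform(repr_a, repr_b)

# Per-component canonical correlations; their mean is a simple similarity score.
corrs = [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(n_components)]
similarity = float(np.mean(corrs))
print(f"mean canonical correlation: {similarity:.3f}")
```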
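For the feature-distillation item, a minimal PyTorch sketch of a hint-style feature-matching loss: a small student plus a linear projector is trained to match a frozen teacher's intermediate features. The `teacher_features()` function is a hypothetical stand-in (a fixed random map) for whatever hidden representation the foundation model actually exposes.

```python
import torch
import torch.nn as nn

context_len, teacher_dim, student_dim = 48, 256, 32

# Stand-in for the frozen teacher: a fixed random map from context windows to
# a high-dimensional hidden representation (replace with real hidden states).
teacher_proj = torch.randn(context_len, teacher_dim)

def teacher_features(batch: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return batch @ teacher_proj

# Small student encoder plus a projection head into the teacher's feature
# space, trained with an MSE "hint" loss on intermediate features.
student = nn.Sequential(nn.Linear(context_len, 64), nn.ReLU(), nn.Linear(64, student_dim))
projector = nn.Linear(student_dim, teacher_dim)
optimizer = torch.optim.Adam(
    list(student.parameters()) + list(projector.parameters()), lr=1e-3
)

for step in range(200):
    batch = torch.randn(32, context_len)   # toy context windows
    target = teacher_features(batch)       # frozen teacher features
    hint_loss = nn.functional.mse_loss(projector(student(batch)), target)
    optimizer.zero_grad()
    hint_loss.backward()
    optimizer.step()
```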