Unsatisfactory Fine-tuning Results: input-independent and large deviation from real values #85
Comments
Thanks @simona-0 for the issue and following up with a fix. I'm not sure why this is happening, but the scaling does have something to do with it. By "manually standardising", do you mean globally computing a mean and standard deviation from the training set, and standardizing the training and test sets? And in this case, do you turn off standardization in the model? Also, on a related note: are you using FP32 format for the model/data, or other quantized versions (like FP16/BF16)?
Hi @ashok-arjun, thank you for your speedy reply. By manually standardising I mean subtracting from both the train and test datasets the mean of the train dataset, computed over all trajectories and all time steps, and then dividing the difference by the average of the per-trajectory standard deviations from the train dataset (the test dataset contributes nothing to these statistics, of course). When I did that, no standardisation happened within the model, because I turned it off. The results are quite satisfactory, as shown in the last comment. I also tried simply passing in
BTW, I am working on a thesis on fine-tuning, and eventually extending, lag-llama for multivariate TS forecasting, and I find this scaling issue really interesting. My assumption is that for lag-llama, the global scaling done in data preprocessing tends to suit datasets with large magnitude and small standard deviation better than the window-level scaling built into the model. I would definitely appreciate your thoughts on this.
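For concreteness, a minimal sketch of the global standardisation described above, assuming the trajectories are held as a list of 1-D NumPy arrays; the variable names and toy data are illustrative, not from the original code:

```python
import numpy as np

# Toy trajectories standing in for the real sensor_14 series.
train_trajectories = [np.sin(np.linspace(0, 10, 200)) * 50 + 640,
                      np.cos(np.linspace(0, 10, 180)) * 60 + 642]
test_trajectories = [np.sin(np.linspace(0, 8, 150)) * 55 + 641]

def fit_global_stats(train_trajectories):
    # Global mean over all training trajectories and all time steps,
    # plus the average of the per-trajectory standard deviations.
    global_mean = np.mean(np.concatenate(train_trajectories))
    avg_std = np.mean([traj.std() for traj in train_trajectories])
    return global_mean, avg_std

def standardise(trajectories, mean, std):
    # The same train-derived statistics are applied to train and test data.
    return [(traj - mean) / std for traj in trajectories]

# Usage: statistics come only from the training set and are reused on test data.
mean, std = fit_global_stats(train_trajectories)
train_scaled = standardise(train_trajectories, mean, std)
test_scaled = standardise(test_trajectories, mean, std)
```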
Hi @bannatyne84, I think we have the same issue here. A quick fix is to manually standardise your datasets before feeding them to the model; doing so reduces the magnitude and squeezes the deviation of the data into a range that lag-llama handles well. Of course, you then need to restore the model output using the mean and standard deviation from the initial standardisation step. The flag
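To make the restore step concrete, a hedged sketch assuming the forecasts come back as NumPy arrays in the standardised space and that `mean`/`std` are the train-set statistics from the standardisation step:

```python
import numpy as np

def destandardise(forecast, mean, std):
    # Invert the initial standardisation so forecasts are back on the
    # original sensor scale.
    return np.asarray(forecast) * std + mean

# Example: a standardised forecast of zeros maps back to the training mean.
print(destandardise(np.zeros(24), mean=641.0, std=57.5))
```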
Hi, thank you for your contribution. I have been trying to fine-tune your model on a univariate time series forecasting task with the C-MAPSS turbine datasets; the goal is to learn the trajectory pattern and thus predict the future trend. I started with 140 trajectories as the training set (each corresponding to sensor_14 from the C-MAPSS dataset, spanning from the beginning of the experiment until failure at the end) and 60 trajectories as the test set (also from sensor_14). Due to poor test results, I am now evaluating on the training data and deliberately trying to overfit the model, to see whether it can memorise any trajectory by heart. However, the model does not learn much from the training data and still has a relatively high training loss (from ~8 down to ~3).
A grid search over lr, batch_size and context_length was done, yet the best training loss the model could reach (after a few thousand epochs) is still around 3, and if we look at the prediction shown below, it is obvious that the model was a bit 'lazy' and only tried to output the average of the trajectories it saw during training, to the point of being almost input-independent. The whole trajectory was visible to the model, so it is not as if the end was hidden and the model could not learn the deterioration pattern. Also, the first prediction at time t+1 tends to deviate strongly from the true value.
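For reference, a generic sketch of such a grid search. Here `fit_and_score` is a hypothetical placeholder for the actual fine-tuning run (building the lag-llama estimator with these hyperparameters, training it, and returning the final training loss), and the value grids are made up for illustration:

```python
from itertools import product

def fit_and_score(lr, batch_size, context_length):
    # Placeholder: in practice this would fine-tune lag-llama with the given
    # hyperparameters and return the final training loss.
    return float("inf")

learning_rates = [1e-3, 1e-4, 5e-5]
batch_sizes = [32, 64]
context_lengths = [32, 64, 128]

best_loss, best_cfg = None, None
for lr, bs, ctx in product(learning_rates, batch_sizes, context_lengths):
    loss = fit_and_score(lr=lr, batch_size=bs, context_length=ctx)
    if best_loss is None or loss < best_loss:
        best_loss = loss
        best_cfg = {"lr": lr, "batch_size": bs, "context_length": ctx}

print("best training loss:", best_loss, "with", best_cfg)
```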
Could this be related to the training process or the model itself? Would really appreciate your opinion on this.