
Issue with lags_seq with Weekly data input #83

Open
HannaHUp opened this issue Jun 18, 2024 · 8 comments
@HannaHUp

Hi,
My data is weekly, as you can see here, so I set freq = "7D".
[screenshot: preview of the weekly data]

I think it makes sense to set lags_seq = ["Q", "M", "W", "D"] in LagLlamaEstimator because I don't have second ("S"), hour ("H"), or minute ("T") data.

Now my module is :
create_lightning_module {'input_size': 1, 'context_length': 32, 'max_context_length': 2048, 'lags_seq': [0, 7, 8, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 34, 35, 36, 50, 51, 52, 55, 83, 102, 103, 104, 154, 155, 156, 362, 363, 364, 726, 727, 728, 1090, 1091, 1092], 'n_layer': 8, 'n_embd_per_head': 16, 'n_head': 9, 'scaling': 'robust', 'distr_output': gluonts.torch.distributions.studentT.StudentTOutput(), 'num_parallel_samples': 100, 'rope_scaling': None, 'time_feat': True, 'dropout': 0.0}

In total, lags_seq has 42 entries.

But I got this error:
RuntimeError: Error(s) in loading state_dict for LagLlamaLightningModule:
size mismatch for model.transformer.wte.weight: copying a param with shape torch.Size([144, 92]) from checkpoint, the shape in current model is torch.Size([144, 50]).
[screenshot: error traceback]

@HannaHUp
Author

Also, I have a question:

When freq_str is "Q", the offset is <QuarterEnd: startingMonth=12> with offset.n = 1, and
lag_indices = [1, 8, 9, 11, 12, 13]
How do you explain these lag_indices?

Does it mean it will use the values 1, 8, 9, 11, 12, and 13 steps before my first target point?
But my data has 7D freq, so lag 1 means last week (7 days ago), and lag 8 means 8 * 7 = 56 days ago.
56 days ago is not necessarily a quarter (three months) ago.
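To make the units concrete, here is a tiny plain-Python sketch (not lag-llama code) of what those lag indices mean once the series' own freq is "7D":

```python
# Under freq="7D", a lag index k points k steps back in the series,
# i.e. k * 7 calendar days ago -- not k quarters or months ago.
freq_days = 7
lag_indices = [1, 8, 9, 11, 12, 13]  # the indices reported for freq_str="Q"

days_ago = [k * freq_days for k in lag_indices]
print(days_ago)  # -> [7, 56, 63, 77, 84, 91]
```

So lag 8 reaches 56 days back, which, as noted above, is close to but not exactly a calendar quarter.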

@AirswitchAsa

AirswitchAsa commented Jun 20, 2024

Using unchecked=True in PandasDataset.from_long_dataframe and leaving lags_seq unchanged solved my issue.
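For anyone hitting the same thing: unchecked=True skips GluonTS's validation of each series' index against freq. A pandas-only sketch (my reading of the mechanism, not a quote of the GluonTS internals) of the kind of index that fails frequency inference:

```python
import pandas as pd

# A clean weekly index: the 7-day spacing is inferrable.
idx = pd.date_range("2024-01-01", periods=6, freq="7D")
print(pd.infer_freq(idx))

# Drop one week: spacing becomes irregular and inference fails -- the sort
# of index that can make PandasDataset.from_long_dataframe reject the data
# unless unchecked=True is passed.
gappy = idx.delete(3)
print(pd.infer_freq(gappy))  # -> None
```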

@HannaHUp
Author

> Using unchecked=True in PandasDataset.from_long_dataframe and leaving lags_seq unchanged solved my issue.

Hi, thank you.
But my data is weekly. Why does it need lags_seq: list = ["Q", "M", "W", "D", "H", "T", "S"]? Why do we provide "D", "H", "T", and "S" when the data is weekly?

@AirswitchAsa

I am guessing that lag-llama will omit the D, H, T, S automatically if your data frequency is weekly.

@HannaHUp
Author

> I am guessing that lag-llama will omit the D, H, T, S automatically if your data frequency is weekly.

I was thinking the same. But I checked the code: when it builds the prediction_splitter, it pulls self.context_length (32) + max(self.lags_seq) (1092) data points.
The max(self.lags_seq) comes from freq "D": lag_indices = [1, 8, 13, 14, 15, 20, 21, 22, 27, 28, 29, 30, 31, 56, 84, 363, 364, 365, 727, 728, 729, 1091, 1092, 1093].
So it uses D, H, T, and S to generate lag_indices as well.
But mine is weekly data, where each data point corresponds to one day per week.

So I'm confused here.
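The arithmetic being discussed can be sketched directly (numbers taken from the module config printed at the top of this issue):

```python
# How far back the prediction splitter reaches, in *steps of the series'
# own frequency* (weeks for 7D data, days for D data).
context_length = 32   # 'context_length' from the module config above
max_lag = 1092        # max of the 'lags_seq' printed in the config

history_window = context_length + max_lag
print(history_window)  # -> 1124
```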

@ashok-arjun
Contributor

Hi!

tl;dr: Irrespective of your frequency, lag-llama uses the lags of all frequencies. So you should never change lags_seq.

So, lag-llama was trained with an initial linear layer mapping all lags (from all frequencies). The lag indices computed from all frequencies are:
[0, 7, 8, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 34, 35, 36, 46, 47, 48, 50, 51, 52, 55, 57, 58, 59, 60, 61, 70, 71, 72, 83, 94, 95, 96, 102, 103, 104, 117, 118, 119, 120, 121, 142, 143, 144, 154, 155, 156, 166, 167, 168, 177, 178, 179, 180, 181, 334, 335, 336, 362, 363, 364, 502, 503, 504, 670, 671, 672, 718, 719, 720, 726, 727, 728, 1090, 1091, 1092]
So by changing lags_seq, you cannot use our released pretrained model, as it would throw the size mismatch error you show in the issue. We trained lag-llama to be used with any frequency (without specifying the frequency). So you should run inference for your data without changing lags_seq.
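For what it's worth, the shapes in the error message line up with a simple rule: the first layer's input width equals the number of lags plus a fixed number of extra per-step features. The value 8 below is inferred from the numbers in this thread, not read out of the lag-llama source, so treat it as an assumption:

```python
pretrained_lags = 84   # length of the full lag list above
custom_lags = 42       # length after restricting lags_seq to ["Q","M","W","D"]
extra_features = 8     # assumed: inferred from 92 - 84 (and 50 - 42)

# Matches the checkpoint shape [144, 92] vs the modified model's [144, 50].
print(pretrained_lags + extra_features, custom_lags + extra_features)  # -> 92 50
```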

The other alternative is to train a model from scratch on your own data, with your specific frequencies. This is only possible if you have a large amount of data. Or, you could re-train on the datasets we trained on, with just the frequencies you care about for your downstream use cases.

@HannaHUp
Author

> Hi!
>
> tl;dr: Irrespective of your frequency, lag-llama uses the lags of all frequencies. So you should never change lags_seq.
>
> So, lag-llama was trained with an initial linear layer mapping all lags (from all frequencies). These lag indices computed from all frequencies finally are these: [0, 7, 8, ..., 1090, 1091, 1092] So by changing lags_seq, you cannot use our released pretrained model, as it would throw the size mismatch error you show in the issue. We trained lag-llama to be used with any frequency (without specifying the frequency). So you should run inference for your data without changing lags_seq.
>
> The other alternative is to train a model from scratch on your own data, with your specific frequencies.

Thank you so much.
So no matter what the frequency of my data is, it will use model.context_length + max(model.lags_seq) = 32 + 1092 = 1124 data points of history.
So if I have daily data, it will use the past 1124 days of data for prediction;
if I have weekly data, it will use the past 1124 weeks of data for prediction;
and inside the model, it will map all lags (from all frequencies).
Am I understanding it correctly?
Thank you!

@ashok-arjun
Contributor

Exactly. And note that you don't need 1124 points in your dataset. It uses them if they're available; otherwise it just uses nothing in their place, and can still forecast.
