[time series] Add PatchTST #25927
Conversation
Co-authored-by: Phanwadee Sinthong <[email protected]>
Co-authored-by: Nam Nguyen <[email protected]>
Co-authored-by: Vijay Ekambaram <[email protected]>
Co-authored-by: Ngoc Diep Do <[email protected]>
Co-authored-by: Wesley Gifford <[email protected]>
…into add-patchtst
@amyeroberts the failing tests are from the …
@amyeroberts all the …
@amyeroberts I have made the doc fixes here #27476
The model was merged before final review and approval. This reverts commit 2ac5b93.
Thanks for iterating on this!
The commit to main was reverted as the PR was merged without a final review. I've left a review here. The PR is close to ready - there are a few outstanding pieces to resolve before approval.
There are quite a few comments regarding the config arguments - these are just for clarification for the user and consistency with other models in the repo.
The model docstrings and modeling page could use clarification on the different use cases for PatchTSTForRegression, PatchTSTForClassification and PatchTSTForPrediction, as it's not immediately clear at the moment. Snippets of usage in the modeling page should be added and would help with this immensely; a sketch of what such a snippet might look like follows below.
There were a few outstanding comments in the PR which weren't resolved. Please make sure to either apply the suggested changes or explain in the conversation why you don't think they should be applied before marking them as resolved.
The main comment is about removing the positional_encoding function and making sure all weight initialization is done in the PreTrainedModel class.
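For illustration, here is a minimal sketch of the kind of usage snippet the modeling page could include for the prediction use case. This is hedged: the class, argument and output names (PatchTSTConfig, PatchTSTForPrediction, past_values, prediction_outputs) are taken from this PR and may still change, and the model is randomly initialized rather than a released checkpoint.

import torch
from transformers import PatchTSTConfig, PatchTSTForPrediction

# Illustrative sketch only: randomly initialized model, no pretrained weights.
# Shapes follow the (batch, sequence_length, num_input_channels) convention
# used elsewhere in this PR.
config = PatchTSTConfig(num_input_channels=7, context_length=512, prediction_length=96)
model = PatchTSTForPrediction(config)

past_values = torch.randn(1, config.context_length, config.num_input_channels)
outputs = model(past_values=past_values)
# output field name assumed from this PR's PatchTSTForPredictionOutput
print(outputs.prediction_outputs.shape)  # expected: torch.Size([1, 96, 7])

Analogous snippets for PatchTSTForClassification (returning logits over classes) and PatchTSTForRegression (returning continuous targets) would make the distinction between the three heads immediately clear.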
# PatchTST
self.patch_length = patch_length
self.stride = stride
self.num_patches = self._num_patches()
This wasn't resolved
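For context, the patch count is ordinarily a pure function of the context length, patch length, and stride, which is one reason it can be computed where the patchifier lives rather than via a config method. A hand sketch of that derivation (not necessarily the PR's exact _num_patches implementation):

def num_patches(context_length: int, patch_length: int, stride: int) -> int:
    # one patch starts every `stride` steps; the last patch must still fit entirely
    return (max(context_length, patch_length) - patch_length) // stride + 1

assert num_patches(512, 16, 8) == 63  # e.g. a 512-step context, length-16 patches, stride 8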
unmasked_channel_indices=self.unmasked_channel_indices,
channel_consistent_masking=self.channel_consistent_masking,
mask_value=self.mask_value,
seed_number=self.seed_number,
This wasn't addressed
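For context, the quoted arguments feed the random-masking path used in pretraining. A minimal sketch of that idea, assuming a (batch, num_channels, num_patches, patch_length) layout - illustrative only, not the PR's exact implementation:

import torch

def random_mask_patches(patches: torch.Tensor, mask_ratio: float = 0.5, mask_value: float = 0.0):
    # patches: (batch, num_channels, num_patches, patch_length)
    # True entries in `mask` mark patches that are replaced by `mask_value`.
    batch, channels, n_patches, _ = patches.shape
    mask = torch.rand(batch, channels, n_patches) < mask_ratio
    masked = patches.masked_fill(mask.unsqueeze(-1), mask_value)
    return masked, mask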
output_hidden_states = (
    output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
This wasn't resolved
    - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros).
output_hidden_states (`bool`, *optional*):
    Whether or not to return the hidden states of all layers
return_dict (`bool`, *optional*):
    Whether or not to return a `ModelOutput` instead of a plain tuple.
Missing output_attentions here
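For reference, the standard entry used by other models in the repo reads roughly:

output_attentions (`bool`, *optional*):
    Whether or not to return the attentions tensors of all attention layers.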
    [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""

PATCHTST_INPUTS_DOCSTRING = r"""
This isn't used
attention_dropout (`float`, *optional*, defaults to 0.0):
    The dropout probability for the attention probabilities.
dropout (`float`, *optional*, defaults to 0.0):
    The dropout probability for all fully connected layers in the encoder, and decoder.
Is there a decoder?
shared_projection (`bool`, *optional*, defaults to `True`):
    Sharing the projection layer across different channels in the forecast head.
Code should use the imperative form
Suggested change:
- shared_projection (`bool`, *optional*, defaults to `True`):
-     Sharing the projection layer across different channels in the forecast head.
+ share_projection (`bool`, *optional*, defaults to `True`):
+     Whether or not to share the projection layer across different channels in the forecast head.
    Masking type. Only `"random"` and `"forecast"` are currently supported.
random_mask_ratio (`float`, *optional*, defaults to 0.5):
    Masking ratio is applied to mask the input data during random pretraining.
forecast_mask_patches (`List`, *optional*, defaults to `[2, 3]`):
Otherwise reads as: should I forecast the mask patches?
Suggested change:
- forecast_mask_patches (`List`, *optional*, defaults to `[2, 3]`):
+ forecast_mask_patch_lengths (`List`, *optional*, defaults to `[2, 3]`):
forecast_mask_ratios (`List`, *optional*, defaults to `[1, 1]`):
    List of weights to use for each patch length. For Ex. if patch_lengths is [5,4] and mix_ratio is [1,1],
    then equal weights to both patch lengths. Defaults to None.
Does it default to None or [1, 1]?
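One hedged way to make the default unambiguous would be to normalize it in the config constructor so the documented default and the effective default agree. Hypothetical sketch only, not something this PR currently does (forecast_mask_patches/forecast_mask_ratios are this PR's argument names):

# hypothetical normalization in PatchTSTConfig.__init__
if forecast_mask_ratios is None:
    forecast_mask_ratios = [1] * len(forecast_mask_patches)
self.forecast_mask_ratios = forecast_mask_ratios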
forecast_mask_ratios (`List`, *optional*, defaults to `[1, 1]`):
    List of weights to use for each patch length. For Ex. if patch_lengths is [5,4] and mix_ratio is [1,1],
    then equal weights to both patch lengths. Defaults to None.
nit: for clarity
Suggested change:
- forecast_mask_ratios (`List`, *optional*, defaults to `[1, 1]`):
-     List of weights to use for each patch length. For Ex. if patch_lengths is [5,4] and mix_ratio is [1,1],
-     then equal weights to both patch lengths. Defaults to None.
+ forecast_mask_weight_ratios (`List`, *optional*, defaults to `[1, 1]`):
+     List of weights to use for each patch length. For Ex. if patch_lengths is [5,4] and mix_ratio is [1,1],
+     then equal weights to both patch lengths. Defaults to None.
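To make the intended semantics concrete: the ratios act as sampling weights over the candidate mask patch lengths. A minimal sketch of that interpretation (an assumption about the behaviour, not code taken from the PR):

import random

patch_lengths = [2, 3]   # forecast_mask_patches
weights = [1, 1]         # forecast_mask_ratios: equal weight to each length
# each masked span's length is drawn in proportion to its weight
span = random.choices(patch_lengths, weights=weights, k=1)[0]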
Hi @amyeroberts, thank you for your detailed comments and requests. We have addressed all your concerns, including adding examples on the use of these models in the modeling page. Can you please kindly review? CC @kashif @vijaye12.
@namctin As this PR was merged, it's closed and can't be reopened. Could you open a new PR for the review?
…ingface#27486)" This reverts commit 78f6ed6.
* Initial commit of PatchTST model classes
  Co-authored-by: Phanwadee Sinthong <[email protected]>
  Co-authored-by: Nam Nguyen <[email protected]>
  Co-authored-by: Vijay Ekambaram <[email protected]>
  Co-authored-by: Ngoc Diep Do <[email protected]>
  Co-authored-by: Wesley Gifford <[email protected]>
* Add PatchTSTForPretraining
* update to include classification
  Co-authored-by: Phanwadee Sinthong <[email protected]>
  Co-authored-by: Nam Nguyen <[email protected]>
  Co-authored-by: Vijay Ekambaram <[email protected]>
  Co-authored-by: Ngoc Diep Do <[email protected]>
  Co-authored-by: Wesley Gifford <[email protected]>
* clean up auto files
* Add PatchTSTForPrediction
* Fix relative import
* Replace original PatchTSTEncoder with ChannelAttentionPatchTSTEncoder
* temporary adding absolute path + add PatchTSTForForecasting class
* Update base PatchTSTModel + Unittest
* Update ForecastHead to use the config class
* edit cv_random_masking, add mask to model output
* Update configuration_patchtst.py
* add masked_loss to the pretraining
* add PatchEmbeddings
* Update configuration_patchtst.py
* edit loss which considers mask in the pretraining
* remove patch_last option
* Add commits from internal repo
* Update ForecastHead
* Add model weight initilization + unittest
* Update PatchTST unittest to use local import
* PatchTST integration tests for pretraining and prediction
* Added PatchTSTForRegression + update unittest to include label generation
* Revert unrelated model test file
* Combine similar output classes
* update PredictionHead
* Update configuration_patchtst.py
* Add Revin
* small edit to PatchTSTModelOutputWithNoAttention
* Update modeling_patchtst.py
* Updating integration test for forecasting
* Fix unittest after class structure changed
* docstring updates
* change input_size to num_input_channels
* more formatting
* Remove some unused params
* Add a comment for pretrained models
* add channel_attention option add channel_attention option and remove unused positional encoders.
* Update PatchTST models to use HF's MultiHeadAttention module
* Update paper + github urls
* Fix hidden_state return value
* Update integration test to use PatchTSTForForecasting
* Adding dataclass decorator for model output classes
* Run fixup script
* Rename model repos for integration test
* edit argument explanation
* change individual option to shared_projection
* style
* Rename integration test + import cleanup
* Fix outpu_hidden_states return value
* removed unused mode
* added std, mean and nops scaler
* add initial distributional loss for predition
* fix typo in docs
* add generate function
* formatting
* add num_parallel_samples
* Fix a typo
* copy weighted_average function, edit PredictionHead
* edit PredictionHead
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.
* add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md Co-authored-by: NielsRogge <[email protected]>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
---------
Co-authored-by: Gift Sinthong <[email protected]>
Co-authored-by: Nam Nguyen <[email protected]>
Co-authored-by: Vijay Ekambaram <[email protected]>
Co-authored-by: Ngoc Diep Do <[email protected]>
Co-authored-by: Wesley Gifford <[email protected]>
Co-authored-by: Wesley M. Gifford <[email protected]>
Co-authored-by: nnguyen <[email protected]>
Co-authored-by: Ngoc Diep Do <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.
* add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md Co-authored-by: NielsRogge <[email protected]>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <[email protected]>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
* doc improvements
* move summary to the start
* typo
* fix docstring
* turn off masking when using prediction, regression, classification
* return scaled output
* adjust output when using distribution head
* remove _num_patches function in the config
* get config.num_patches from patchifier init
* add output_attentions docstring, remove tuple in output_hidden_states
* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput
* remove print("model_class: ", model_class)
* change encoder_attention_heads to num_attention_heads
* change norm to norm_layer
* change encoder_layers to num_hidden_layers
* change shared_embedding to share_embedding, shared_projection to share_projection
* add output_attentions
* more robust check of norm_type
* change dropout_path to path_dropout
* edit docstring
* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding
* edit shape of cls_token and initialize it
* add a check on the num_input_channels.
* edit head_dim in the Prediction class to allow the use of cls_token
* remove some positional_encoding_type options, remove learn_pe arg, initalize pe
* change Exception to ValueError
* format
* norm_type is "batchnorm"
* make style
* change cls_token shape
* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.
* Bring PatchTSTClassificationHead on top of PatchTSTForClassification
* change encoder_ffn_dim to ffn_dim and edit the docstring.
* update variable names to match with the config
* add generation tests
* change num_mask_patches to num_forecast_mask_patches
* Add examples explaining the use of these models
* make style
* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)" This reverts commit 78f6ed6.
* make style
* fix default std scaler's minimum_scale
* fix docstring
* close code blocks
* Update docs/source/en/model_doc/patchtst.md Co-authored-by: amyeroberts <[email protected]>
* Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/configuration_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <[email protected]>
* fix tests
* add add_start_docstrings
* move examples to the forward's docstrings
* update prepare_batch
* update test
* fix test_prediction_head
* fix generation test
* use seed to create generator
* add output_hidden_states and config.num_patches
* add loc and scale args in PatchTSTForPredictionOutput
* edit outputs if if not return_dict
* use self.share_embedding to check instead checking type.
* remove seed
* make style
* seed is an optional int
* fix test
* generator device
* Fix assertTrue test
* swap order of items in outputs when return_dict=False.
* add mask_type and random_mask_ratio to unittest
* Update modeling_patchtst.py
* add add_start_docstrings for regression model
* make style
* update model path
* Edit the ValueError comment in forecast_masking
* update examples
* make style
* fix commented code
* update examples: remove config from from_pretrained call
* Edit example outputs
* Set default target_values to None
* remove config setting in regression example
* Update configuration_patchtst.py
* Update configuration_patchtst.py
* remove config from examples
* change default d_model and ffn_dim
* norm_eps default
* set has_attentions to Trye and define self.seq_length = self.num_patche
* update docstring
* change variable mask_input to do_mask_input
* fix blank space.
* change logger.debug to logger.warning.
* remove unused PATCHTST_INPUTS_DOCSTRING
* remove all_generative_model_classes
* set test_missing_keys=True
* remove undefined params in the docstring.
---------
Co-authored-by: nnguyen <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Nam Nguyen <[email protected]>
Co-authored-by: Wesley Gifford <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
What does this PR do?
This PR adds the PatchTST model (https://arxiv.org/abs/2211.14730).
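For reviewers new to the paper: PatchTST segments each input channel independently into (possibly overlapping) patches and feeds those patches to a vanilla transformer encoder as tokens. A minimal sketch of the patchify step under assumed shapes (illustrative only, not this PR's exact code):

import torch

def patchify(series: torch.Tensor, patch_length: int = 16, stride: int = 8) -> torch.Tensor:
    # series: (batch, sequence_length, num_channels)
    x = series.transpose(1, 2)  # (batch, num_channels, sequence_length)
    # sliding windows along time: (batch, num_channels, num_patches, patch_length)
    return x.unfold(dimension=-1, size=patch_length, step=stride)

print(patchify(torch.randn(2, 512, 7)).shape)  # torch.Size([2, 7, 63, 16])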
@kashif
To-Do's: