Since it is an LMHeadModel, the 1st to n-th tokens are used to predict the (n+1)-th token during training, so why not introduce an attention_mask to mask out the (n+2)-th to (n+m)-th tokens? Without an attention_mask, there may be an inconsistency between the training and testing settings. Is it possible to add an attention_mask during training to improve testing?
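For context, here is a minimal sketch of the mechanism in question, written against the current HuggingFace transformers API rather than this repo's LSP_train.py (the checkpoint name and example text are only illustrative). GPT-2's attention is already causal inside the model, so the logits at positions 1..n do not change when tokens (n+1)..(n+m) are appended; attention_mask only needs to hide padding positions.

```python
# Sketch (hypothetical snippet, not from LSP_train.py): GPT-2 applies its
# causal mask internally, so predictions at early positions are unchanged
# when later tokens are appended to the input.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-small")
model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-small")
model.eval()

ids = tokenizer.encode("Does money buy happiness ?")
prefix = torch.tensor([ids[:3]])   # tokens 1..n
full = torch.tensor([ids])         # tokens 1..n+m

with torch.no_grad():
    logits_prefix = model(prefix).logits        # predictions at positions 1..n
    logits_full = model(full).logits[:, :3, :]  # same positions, longer input

print(torch.allclose(logits_prefix, logits_full, atol=1e-5))  # expected: True
```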
Why is the response concatenated to the input_ids for both the train and validation datasets? Wouldn't this create over-fitted models? Would it be possible to somehow mask the response ids?
DialoGPT/LSP_train.py, line 281 (commit fa0c0c5)
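On the masking question above: with a HuggingFace-style API, one common way to keep the full context visible to the model while restricting the loss to the response is to set the context positions of the labels to -100, which the loss function ignores. A minimal sketch of that idea (illustrative code, not the repo's own training loop; the checkpoint name and example strings are made up):

```python
# Sketch (hypothetical snippet, not from LSP_train.py): feed context + response
# as input_ids, but set the context positions of the labels to -100 so the
# LM loss is computed only on the response tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-small")
model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-small")

context = tokenizer.encode("Does money buy happiness ?" + tokenizer.eos_token)
response = tokenizer.encode("Depends on how much you spend ." + tokenizer.eos_token)

input_ids = torch.tensor([context + response])
labels = torch.tensor([[-100] * len(context) + response])  # ignore context in the loss

loss = model(input_ids, labels=labels).loss
loss.backward()
```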