Since it is an LMHeadModel, the 1st to n-th tokens are used to predict the (n+1)-th token during training, so why not introduce an attention_mask to mask out the (n+2)-th to (n+m)-th tokens? Without an attention_mask, there may be an inconsistency between the training and testing settings. Is it possible to add an attention_mask during training to improve testing?
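For context, here is a minimal sketch of the mechanism in question, written against the current HuggingFace transformers API rather than this repo's LSP_train.py (the checkpoint name and example text are only illustrative). GPT-2's attention is already causal inside the model, so the logits at positions 1..n do not change when tokens (n+1)..(n+m) are appended; attention_mask only needs to hide padding positions.

```python
# Sketch (hypothetical snippet, not from LSP_train.py): GPT-2 applies its
# causal mask internally, so predictions at early positions are unchanged
# when later tokens are appended to the input.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-small")
model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-small")
model.eval()

ids = tokenizer.encode("Does money buy happiness ?")
prefix = torch.tensor([ids[:3]])   # tokens 1..n
full = torch.tensor([ids])         # tokens 1..n+m

with torch.no_grad():
    logits_prefix = model(prefix).logits        # predictions at positions 1..n
    logits_full = model(full).logits[:, :3, :]  # same positions, longer input

print(torch.allclose(logits_prefix, logits_full, atol=1e-5))  # expected: True
```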
Why is the response concatenated to the input_ids for both the train and validation datasets? Wouldn't this create over-fitted models? Would it be possible to somehow mask the response ids?
DialoGPT/LSP_train.py, line 281 (commit fa0c0c5)
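On the masking question above: with a HuggingFace-style API, one common way to keep the full context visible to the model while restricting the loss to the response is to set the context positions of the labels to -100, which the loss function ignores. A minimal sketch of that idea (illustrative code, not the repo's own training loop; the checkpoint name and example strings are made up):

```python
# Sketch (hypothetical snippet, not from LSP_train.py): feed context + response
# as input_ids, but set the context positions of the labels to -100 so the
# LM loss is computed only on the response tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-small")
model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-small")

context = tokenizer.encode("Does money buy happiness ?" + tokenizer.eos_token)
response = tokenizer.encode("Depends on how much you spend ." + tokenizer.eos_token)

input_ids = torch.tensor([context + response])
labels = torch.tensor([[-100] * len(context) + response])  # ignore context in the loss

loss = model(input_ids, labels=labels).loss
loss.backward()
```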