On page 13 of the paper, in the fine-tuning details section, it says:

"we searched for the best number of train epochs out of [10, 3] for each task. For SQuAD, we decreased the number of train epochs to 2 to be consistent with BERT and RoBERTa"

My question is: did ELECTRA use an early-stopping technique similar to RoBERTa's (train once for up to 10 epochs and stop when the validation loss stops decreasing), or did you run the fine-tuning for 3, 4, 5, 6, 7, 8, 9, and 10 epochs as separate runs? A rough sketch of the two protocols I have in mind is below.
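To make the question concrete, here is a minimal sketch of the two strategies, with placeholder `train_one_epoch`, `evaluate`, and `make_model` functions; these are hypothetical stand-ins, not functions from the ELECTRA codebase:

```python
def train_one_epoch(model):
    """Placeholder: fine-tune `model` for one epoch and return it."""
    return model

def evaluate(model):
    """Placeholder: return the validation loss for `model`."""
    return 0.0

# Strategy A (RoBERTa-style early stopping): a single run that stops
# once the validation loss stops improving.
def early_stopping_run(model, max_epochs=10, patience=1):
    best_loss, best_model, epochs_since_improvement = float("inf"), model, 0
    for _ in range(max_epochs):
        model = train_one_epoch(model)
        loss = evaluate(model)
        if loss < best_loss:
            best_loss, best_model, epochs_since_improvement = loss, model, 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break
    return best_model

# Strategy B (separate runs per epoch budget): restart fine-tuning from
# scratch for each candidate epoch count and keep the best checkpoint.
def per_budget_search(make_model, epoch_budgets=(3, 4, 5, 6, 7, 8, 9, 10)):
    best_loss, best_model = float("inf"), None
    for budget in epoch_budgets:
        model = make_model()  # fresh fine-tuning run for this budget
        for _ in range(budget):
            model = train_one_epoch(model)
        loss = evaluate(model)
        if loss < best_loss:
            best_loss, best_model = loss, model
    return best_model
```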
Thanks.