On page 13 of the paper, in the fine-tuning details section, it says:

"we searched for the best number of train epochs out of [10, 3] for each task. For SQuAD, we decreased the number of train epochs to 2 to be consistent with BERT and RoBERTa"

My question is: did ELECTRA use an early-stopping technique similar to RoBERTa's (train once for up to 10 epochs and stop when the validation loss stops decreasing), or did you run the fine-tuning for 3, 4, 5, 6, 7, 8, 9, and 10 epochs as separate runs? A rough sketch of the two protocols I have in mind is below.
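To make the question concrete, here is a minimal sketch of the two strategies, with placeholder `train_one_epoch`, `evaluate`, and `make_model` functions; these are hypothetical stand-ins, not functions from the ELECTRA codebase:

```python
def train_one_epoch(model):
    """Placeholder: fine-tune `model` for one epoch and return it."""
    return model

def evaluate(model):
    """Placeholder: return the validation loss for `model`."""
    return 0.0

# Strategy A (RoBERTa-style early stopping): a single run that stops
# once the validation loss stops improving.
def early_stopping_run(model, max_epochs=10, patience=1):
    best_loss, best_model, epochs_since_improvement = float("inf"), model, 0
    for _ in range(max_epochs):
        model = train_one_epoch(model)
        loss = evaluate(model)
        if loss < best_loss:
            best_loss, best_model, epochs_since_improvement = loss, model, 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break
    return best_model

# Strategy B (separate runs per epoch budget): restart fine-tuning from
# scratch for each candidate epoch count and keep the best checkpoint.
def per_budget_search(make_model, epoch_budgets=(3, 4, 5, 6, 7, 8, 9, 10)):
    best_loss, best_model = float("inf"), None
    for budget in epoch_budgets:
        model = make_model()  # fresh fine-tuning run for this budget
        for _ in range(budget):
            model = train_one_epoch(model)
        loss = evaluate(model)
        if loss < best_loss:
            best_loss, best_model = loss, model
    return best_model
```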
Thanks.