Checklist Item Proposal
A crucial component of deep neural network performance is initialization, which has been widely studied [1][2][3]. These initialization methods, however, fundamentally rely on random sampling from a distribution (e.g., by scaling the variance, as in [2]). In practice, this randomness translates into variance in performance, so reporting a result (e.g., robust accuracy) from a single run is a flaw. Instead, authors should report a median and standard deviation over, say, 10 runs (or however many are feasible given available compute). These summaries can then be compared across methods using a statistical test such as the Scott-Knott test. The existing recommendation to make pre-trained models open source is a step in this direction, since it allows researchers with more compute to run more rigorous evaluations of previously proposed methods.
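
As a rough illustration of the proposed protocol (not part of the original proposal), the sketch below repeats a training/evaluation pipeline under several independent seeds and summarizes the resulting robust accuracies with a median and sample standard deviation. `train_and_evaluate`, `summarize_runs`, `num_runs`, and `base_seed` are hypothetical names introduced here for illustration; the Scott-Knott comparison across methods would still require a separate implementation.

```python
import numpy as np

def train_and_evaluate(seed: int) -> float:
    """Hypothetical placeholder: train the model under test with `seed`
    controlling initialization (and any other stochastic components) and
    return its robust accuracy on the evaluation set."""
    raise NotImplementedError

def summarize_runs(num_runs: int = 10, base_seed: int = 0) -> dict:
    # Repeat the full pipeline with independent seeds so that the randomness
    # of weight initialization shows up in the reported numbers.
    accuracies = np.array(
        [train_and_evaluate(base_seed + i) for i in range(num_runs)]
    )
    return {
        "median": float(np.median(accuracies)),
        "std": float(np.std(accuracies, ddof=1)),  # sample standard deviation
        "runs": accuracies.tolist(),
    }
```

The per-run accuracies are kept alongside the summary so that a statistical test (e.g., Scott-Knott) can later be applied to the raw values rather than only to the aggregates.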
References
[1] Glorot, X., & Bengio, Y. (2010, March). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249-256). JMLR Workshop and Conference Proceedings.
[2] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026-1034).
[3] Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013, May). On the importance of initialization and momentum in deep learning. In International conference on machine learning (pp. 1139-1147). PMLR.