
Do not report evaluation metrics from a single run #30

Open
yrahul3910 opened this issue Apr 16, 2021 · 0 comments
Checklist Item Proposal

A crucial component of deep neural network performance is initialization, which has been widely studied [1][2][3]. These initialization methods, however, fundamentally rely on random sampling from a distribution (e.g., scaling the variance as in [2]). In practice, this randomness translates into variance in performance, so reporting a result (e.g., robust accuracy) from a single run is a flaw. Instead, report the median and standard deviation over, say, 10 runs (or however many are feasible given compute constraints). Results from different methods can then be compared using a statistical test such as the Scott-Knott test. The existing recommendation to open-source pre-trained models is a step in this direction, since it allows researchers with more compute to run more rigorous evaluations of previously proposed methods.
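
As a concrete illustration, here is a minimal sketch of this kind of multi-seed evaluation. Everything here is assumed rather than prescribed: `run_trials` and the two `method_*` placeholders are hypothetical stand-ins for real train-then-evaluate pipelines (a PyTorch-based setup is assumed, hence `torch.manual_seed`), and a Mann-Whitney U test is used as a simpler two-group stand-in for the Scott-Knott test mentioned above.

```python
import random

import numpy as np
import torch
from scipy.stats import mannwhitneyu


def run_trials(train_and_evaluate, n_runs=10):
    """Train and evaluate under n_runs different seeds; return all scores."""
    scores = []
    for seed in range(n_runs):
        # Fix every relevant source of randomness before each run.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        scores.append(train_and_evaluate(seed))
    return np.asarray(scores)


def summarize(name, scores):
    # Report the median and standard deviation rather than a single number.
    print(f"{name}: median={np.median(scores):.3f}, std={scores.std(ddof=1):.3f}")


if __name__ == "__main__":
    # Placeholders standing in for two real train-then-evaluate pipelines;
    # they just return noisy "robust accuracy" values so the sketch runs.
    method_a = lambda seed: 0.52 + 0.02 * np.random.randn()
    method_b = lambda seed: 0.50 + 0.02 * np.random.randn()

    scores_a = run_trials(method_a)
    scores_b = run_trials(method_b)
    summarize("Method A", scores_a)
    summarize("Method B", scores_b)

    # Compare the two score distributions. A Mann-Whitney U test is used here
    # as a simple two-group stand-in for the Scott-Knott test, which extends
    # this idea to ranking an arbitrary number of treatments.
    _, p_value = mannwhitneyu(scores_a, scores_b, alternative="two-sided")
    print(f"Mann-Whitney U p-value: {p_value:.4f}")
```

The median is preferred over the mean here because a single diverged or unlucky run should not drag the headline number, and the per-seed scores (not just the summary) are what a test like Scott-Knott consumes.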

References

[1] Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249-256). JMLR Workshop and Conference Proceedings.
[2] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1026-1034).
[3] Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning (pp. 1139-1147). PMLR.
