Checklist Item Proposal
A crucial component of deep neural network performance is initialization, which has been widely studied [1][2][3]. These initialization methods, however, fundamentally rely on random sampling from a distribution (e.g., by scaling the variance, as in [2]). In practice, this randomness translates into variance in performance, so reporting a result (e.g., robust accuracy) from a single run is a flaw. Instead, authors should report a median and standard deviation over, say, 10 runs (or however many are feasible given available compute). These summaries can then be compared across methods using a statistical test such as the Scott-Knott test. The existing recommendation to make pre-trained models open source is a step in this direction, since it allows researchers with more compute to run more rigorous evaluations of previously proposed methods.
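
As a rough illustration of the proposed protocol (not part of the original proposal), the sketch below repeats a training/evaluation pipeline under several independent seeds and summarizes the resulting robust accuracies with a median and sample standard deviation. `train_and_evaluate`, `summarize_runs`, `num_runs`, and `base_seed` are hypothetical names introduced here for illustration; the Scott-Knott comparison across methods would still require a separate implementation.

```python
import numpy as np

def train_and_evaluate(seed: int) -> float:
    """Hypothetical placeholder: train the model under test with `seed`
    controlling initialization (and any other stochastic components) and
    return its robust accuracy on the evaluation set."""
    raise NotImplementedError

def summarize_runs(num_runs: int = 10, base_seed: int = 0) -> dict:
    # Repeat the full pipeline with independent seeds so that the randomness
    # of weight initialization shows up in the reported numbers.
    accuracies = np.array(
        [train_and_evaluate(base_seed + i) for i in range(num_runs)]
    )
    return {
        "median": float(np.median(accuracies)),
        "std": float(np.std(accuracies, ddof=1)),  # sample standard deviation
        "runs": accuracies.tolist(),
    }
```

The per-run accuracies are kept alongside the summary so that a statistical test (e.g., Scott-Knott) can later be applied to the raw values rather than only to the aggregates.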
References
[1] Glorot, X., & Bengio, Y. (2010, March). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249-256). JMLR Workshop and Conference Proceedings.
[2] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026-1034).
[3] Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013, May). On the importance of initialization and momentum in deep learning. In International conference on machine learning (pp. 1139-1147). PMLR.