Below is a list of related works that were written on, or with the help of, PyHessian.
- ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Summary
- AdaHessian is a new second-order optimizer that uses the Hessian diagonal to adaptively precondition the gradient (a sketch of this diagonal estimate follows this summary)
- The key idea is a novel inexact Newton method with variance reduction (an RMS average in time along with spatial averaging)
- Experiments on CV, NLP, and recommendation systems show better performance compared to other optimizers
- This is one of the first instances in which a second-order method exceeds Adam/SGD performance
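The Hessian diagonal that AdaHessian preconditions with is typically obtained through a Hutchinson-style randomized estimator, the same kind of quantity PyHessian computes. Below is a minimal PyTorch sketch of that estimate, not PyHessian's or AdaHessian's actual API; the function name `hessian_diag_estimate` and its signature are illustrative assumptions.

```python
import torch

def hessian_diag_estimate(loss, params, n_samples=1):
    """Hutchinson estimate of diag(H): E[z * (Hz)] with Rademacher z.

    Illustrative sketch only; PyHessian and AdaHessian expose their own APIs.
    """
    # First backward pass with create_graph=True so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        # Rademacher vectors: entries are +1 or -1 with equal probability.
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product Hz via a second backward pass.
        hzs = torch.autograd.grad(grads, params, grad_outputs=zs,
                                  retain_graph=True)
        # Accumulate the elementwise product z * (Hz), whose expectation
        # over Rademacher z equals the Hessian diagonal.
        for e, z, hz in zip(est, zs, hzs):
            e.add_(z * hz / n_samples)
    return est  # per-parameter tensors approximating diag(H)
```

AdaHessian then smooths this estimate, roughly, an exponential moving average over time (analogous to Adam's second moment) plus spatial averaging, and divides the gradient by its square root to set per-parameter step sizes.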
- Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks