# ML - Optimization

| Paper | Conference | Remarks |
| --- | --- | --- |
| Practical Recommendations for Gradient-Based Training of Deep Architectures | Arxiv 2012 | A practical guide with recommendations for some of the most commonly used hyperparameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. |
| Dropout: A Simple Way to Prevent Neural Networks from Overfitting | JMLR 2014 | The key idea is to randomly drop units during training, which prevents units from co-adapting too much (sketch below). |
| Adam: A Method for Stochastic Optimization | Arxiv 2014 | An algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments (sketch below). |
| Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | Arxiv 2015 | 1. The distribution of each layer's inputs changes during training, which slows training down by requiring lower learning rates and careful parameter initialization. 2. BN normalizes each training mini-batch, which allows much higher learning rates and less careful initialization (sketch below). |
| An Overview of Gradient Descent Optimization Algorithms | Arxiv 2016 | 1. Stochastic vs. batch vs. mini-batch gradient descent. 2. Introduces momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSProp and Adam (momentum sketch below). |
| Population-Based Training for Neural Networks | Arxiv 2017 | 1. Eval: measure the validation accuracy of a model. 2. Exploit: copy parameters from other models in the population. 3. Explore: perturb hyperparameters. 4. Ready: a model becomes ready for exploit/explore a few epochs after its last parameter update (sketch below). |
| On the Convergence of Adam and Beyond | ICLR 2018 | 1. Shows that one cause of Adam's convergence failures is the exponential moving average used in the algorithm. 2. Suggests that the convergence issues can be fixed by endowing such algorithms with "long-term memory" of past gradients, and proposes new variants of Adam that not only fix the convergence issues but often also improve empirical performance (sketch below). |
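
A minimal sketch of the dropout idea as it is usually implemented (inverted dropout in NumPy; the function name and defaults are illustrative, not code from the paper):

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop during training,
    rescaling the survivors so no extra scaling is needed at test time."""
    if not training or p_drop == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= p_drop      # keep each unit with probability 1 - p_drop
    return x * mask / (1.0 - p_drop)          # rescale so the expected activation is unchanged
```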
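
A sketch of one Adam update built from the adaptive moment estimates the paper describes (NumPy; defaults are the paper's suggested hyperparameters, variable names are mine):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t (t starts at 1): update the biased moment
    estimates, correct their bias, then scale the step by the second moment."""
    m = beta1 * m + (1 - beta1) * grad             # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2        # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                   # bias correction for zero-initialized m
    v_hat = v / (1 - beta2 ** t)                   # bias correction for zero-initialized v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```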
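
A sketch of the batch-normalization transform for one mini-batch (NumPy, forward pass only; the running statistics used at inference and the backward pass are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift with the
    learned parameters gamma and beta. x has shape (batch, features)."""
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta              # scale and shift restore representational power
```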
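
Of the optimizers surveyed in the overview paper, classical momentum is the simplest to sketch (NumPy; the learning rate and momentum coefficient are typical illustrative values):

```python
import numpy as np

def momentum_step(theta, grad, velocity, lr=0.01, mu=0.9):
    """Classical momentum: accumulate an exponentially decaying sum of past
    gradients and move along that velocity instead of the raw gradient."""
    velocity = mu * velocity - lr * grad
    return theta + velocity, velocity
```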
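
A rough sketch of one Population-Based Training round, following the eval/exploit/explore/ready steps listed in the table (the member layout, 20% cutoff, ready threshold, and perturbation factors are my assumptions, not values from the paper):

```python
import random

def pbt_round(population, evaluate):
    """One PBT round. Each member is a dict with 'params', 'hyperparams', and
    'steps_since_update'; evaluate() returns validation accuracy (higher is better)."""
    scores = [evaluate(m) for m in population]                   # 1. eval
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    cutoff = max(1, len(population) // 5)                        # bottom / top 20% (assumption)
    for i in ranked[:cutoff]:                                    # weakest members
        if population[i]["steps_since_update"] < 4:              # 4. ready: a few epochs since last update
            continue
        donor = population[random.choice(ranked[-cutoff:])]      # a stronger member
        population[i]["params"] = dict(donor["params"])          # 2. exploit: copy parameters
        population[i]["hyperparams"] = {                         # 3. explore: perturb hyperparameters
            k: v * random.choice([0.8, 1.2]) for k, v in donor["hyperparams"].items()
        }
        population[i]["steps_since_update"] = 0
    return population
```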
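
A sketch of the AMSGrad variant proposed in the paper, which keeps the running maximum of the second-moment estimate as the "long-term memory" of past gradients (NumPy; bias correction is omitted here, as in the paper's presentation):

```python
import numpy as np

def amsgrad_step(theta, grad, m, v, v_max, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Like Adam, but normalize by the elementwise maximum of all past v's so the
    effective step size never grows when a rare, informative gradient arrives."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_max = np.maximum(v_max, v)                       # long-term memory of past gradients
    theta = theta - lr * m / (np.sqrt(v_max) + eps)
    return theta, m, v, v_max
```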

## Resources

Back to index