
Padam vs ATMO

This code is a fork of the official Padam code, modified to allow a direct comparison between the ATMO approach and Padam.
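For context, the ATMO paper cited at the bottom of this README combines the updates of two optimizers (for example SGD with momentum and Adam) into a single weighted step. The snippet below is only a rough sketch of that idea under simplified assumptions (fixed combination weights, no bias correction, no weight decay); the names atmo_like_step, lam_sgd and lam_adam are illustrative and do not appear in this repository.

    import torch

    def atmo_like_step(param, grad, state, lr=0.01, lam_sgd=0.5, lam_adam=0.5,
                       momentum=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
        # SGD-with-momentum proposal.
        state['buf'] = momentum * state.get('buf', torch.zeros_like(param)) + grad
        # Adam proposal: moving averages of the gradient and its square.
        state['m'] = beta1 * state.get('m', torch.zeros_like(param)) + (1 - beta1) * grad
        state['v'] = beta2 * state.get('v', torch.zeros_like(param)) + (1 - beta2) * grad * grad
        adam_dir = state['m'] / (state['v'].sqrt() + eps)
        # Weighted sum of the two proposed directions gives the combined step.
        param -= lr * (lam_sgd * state['buf'] + lam_adam * adam_dir)

    # Toy usage on a single tensor.
    w, s = torch.zeros(3), {}
    atmo_like_step(w, torch.ones(3), s)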

Prerequisites:

pip install -r requirements.txt

Usage:

Run run_cnn_test_cifar10.py for experiments on Cifar10, or run_cnn_test_cifar100.py for experiments on Cifar100.

Command Line Arguments:

  • --lr: initial learning rate
  • --method: optimization method, e.g., "sgdm", "adam", "amsgrad", "padam", "mps", "mas", "map"
  • --net: network architecture, e.g., "vggnet", "resnet", "wideresnet"
  • --partial: partially adaptive parameter for the Padam method (see the sketch after this list)
  • --wd: weight decay
  • --Nepoch: number of training epochs
  • --resume: whether to resume from a previous training run
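The --partial value is the exponent p that Padam applies to the second-moment estimate: p = 0.5 recovers Amsgrad, while p close to 0 behaves like SGD with momentum. Below is a minimal, simplified sketch of one such update (no bias correction or weight decay; padam_like_step and its state dictionary are illustrative names, not code from this repository).

    import torch

    def padam_like_step(param, grad, state, lr=0.1, partial=0.125,
                        beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam-style moving averages of the gradient and its square.
        state['m'] = beta1 * state.get('m', torch.zeros_like(param)) + (1 - beta1) * grad
        state['v'] = beta2 * state.get('v', torch.zeros_like(param)) + (1 - beta2) * grad * grad
        # Amsgrad-style running maximum keeps the denominator non-decreasing.
        state['v_max'] = torch.maximum(state.get('v_max', torch.zeros_like(param)), state['v'])
        # Partially adaptive step: the exponent partial replaces the usual square root
        # (partial = 0.5 is Amsgrad, partial -> 0 approaches SGD with momentum).
        param -= lr * state['m'] / (state['v_max'] ** partial + eps)

The example command below uses --partial 0.125, i.e. p = 1/8.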

Usage Examples:

  • Run experiments on Cifar10:
python run_cnn_test_cifar10.py  --lr 0.01 --method "mps" --net "resnet"  --partial 0.125 --wd 2.5e-2 > logs/resnet/file.log
  • Obtain the max and mean accuracy from the logs:
python folder_mean_accuracy.py
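folder_mean_accuracy.py is the repository's own aggregation script. The sketch below only illustrates that kind of post-processing, assuming each log file contains lines with an "Acc: <number>" figure; the glob pattern and regex are assumptions for illustration, not the script's actual behavior.

    import glob
    import re
    import statistics

    # Hypothetical aggregation: collect the best accuracy reported in each log
    # file under logs/resnet/ and print the overall max and mean across runs.
    best_per_run = []
    for path in glob.glob('logs/resnet/*.log'):
        with open(path) as f:
            accs = [float(x) for x in re.findall(r'Acc[^0-9]*([0-9]+\.[0-9]+)', f.read())]
        if accs:
            best_per_run.append(max(accs))

    if best_per_run:
        print('max: %.2f  mean: %.2f' % (max(best_per_run), statistics.mean(best_per_run)))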

Results

Method         Accuracy (%)
SGD-Momentum   95.00
ADAM           92.89
Amsgrad        93.53
AdamW          94.56
Yogi           93.92
AdaBound       94.16
Padam          94.94
Dynamic ATMO   95.27

Citation

Please check our paper for technical details and full results.

@article{landro2021combining,
  title={Combining Optimization Methods Using an Adaptive Meta Optimizer},
  author={Nicola Landro and Ignazio Gallo and Riccardo La Grassa},
  journal={Algorithms},
  publisher={MDPI},
  year={2021},
}
