This code is a fork of the official Padam code, so that the ATMO methods and Padam can be compared under identical training conditions.
pip install -r requirements.txt
Use Python to run run_cnn_test_cifar10.py for experiments on CIFAR-10 and run_cnn_test_cifar100.py for experiments on CIFAR-100. The main command-line flags are listed below; a sketch of how they might be parsed follows the list.
- --lr: initial learning rate
- --method: optimization method, e.g., "sgdm", "adam", "amsgrad", "padam", "mps", "mas", "map"
- --net: network architecture, e.g., "vggnet", "resnet", "wideresnet"
- --partial: partially adaptive parameter for the Padam method
- --wd: weight decay
- --Nepoch: number of training epochs
- --resume: whether to resume from a previous training run
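As a rough illustration only, the flags above could be declared with argparse along the following lines; the default values, choices, and parser description are assumptions, not the repository's actual settings.

```python
# Hypothetical sketch of the flag parsing in run_cnn_test_cifar10.py.
# Defaults and choices below are placeholders, not the repo's real values.
import argparse

parser = argparse.ArgumentParser(description="CIFAR-10 training (Padam / ATMO comparison)")
parser.add_argument("--lr", type=float, default=0.1, help="initial learning rate")
parser.add_argument("--method", type=str, default="sgdm",
                    choices=["sgdm", "adam", "amsgrad", "padam", "mps", "mas", "map"],
                    help="optimization method")
parser.add_argument("--net", type=str, default="resnet",
                    choices=["vggnet", "resnet", "wideresnet"],
                    help="network architecture")
parser.add_argument("--partial", type=float, default=0.25,
                    help="partially adaptive parameter for Padam")
parser.add_argument("--wd", type=float, default=5e-4, help="weight decay")
parser.add_argument("--Nepoch", type=int, default=200, help="number of training epochs")
parser.add_argument("--resume", action="store_true",
                    help="resume from a previous training run")
args = parser.parse_args()
```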
- Run experiments on CIFAR-10:
python run_cnn_test_cifar10.py --lr 0.01 --method "mps" --net "resnet" --partial 0.125 --wd 2.5e-2 > logs/resnet/file.log
- Obtain the max and mean accuracy from the logs (see the sketch below the command):
python folder_mean_accuracy.py
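As a rough, hedged sketch of what a script like folder_mean_accuracy.py might compute: take the best accuracy in each log file and report the max and mean across runs. The log format (accuracy as the last number on a line) and the logs/resnet path are assumptions, not the repository's actual conventions.

```python
# Hypothetical sketch: scan a folder of training logs, take the best accuracy
# per run, then report the max and mean across runs.
# The log format and folder path are assumptions.
import glob
import re
import statistics

def best_accuracy(path):
    """Return the highest accuracy-like value found in one log file."""
    accs = []
    with open(path) as f:
        for line in f:
            numbers = re.findall(r"\d+\.\d+", line)
            if numbers:
                accs.append(float(numbers[-1]))  # assume the last number on the line is the accuracy
    return max(accs) if accs else None

best = [a for a in (best_accuracy(p) for p in glob.glob("logs/resnet/*.log")) if a is not None]
if best:
    print(f"max:  {max(best):.2f}")
    print(f"mean: {statistics.mean(best):.2f}")
```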
Test accuracy (%) on CIFAR-10:

| SGD-Momentum | ADAM | Amsgrad | AdamW | Yogi | AdaBound | Padam | Dynamic ATMO |
|---|---|---|---|---|---|---|---|
| 95.00 | 92.89 | 93.53 | 94.56 | 93.92 | 94.16 | 94.94 | 95.27 |
Please check our paper for technical details and full results.
@article{landro2021combining,
  title={Combining Optimization Methods Using an Adaptive Meta Optimizer},
  author={Landro, Nicola and Gallo, Ignazio and La Grassa, Riccardo},
  journal={Algorithms},
  publisher={MDPI},
  year={2021},
}