Code for the ICML 2021 paper *Bilevel Optimization: Nonasymptotic Analysis and Faster Algorithms* by Kaiyi Ji, Junjie Yang, and Yingbin Liang from The Ohio State University.
Our hyperparameter optimization implementation is built on HyperTorch, where we propose the stoc-BiO algorithm, which achieves better performance than other bilevel algorithms. Our code is tested with Python 3 and PyTorch 1.8.
Note that the hypergrad package is built on HyperTorch.
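For intuition, the following is a minimal sketch of a stocBiO-style hypergradient estimate, assuming single-tensor inner and outer variables; the function and argument names here are illustrative, not the repo's actual API (see the experiment files for the real implementation):

```python
import torch

def stocbio_hypergrad(f, g, lam, w, eta=0.05, hessian_q=3):
    """Sketch: estimate d f(lam, w*(lam)) / d lam for outer loss f and
    inner loss g, both callables (lam, w) -> scalar; lam and w are single
    tensors with requires_grad=True."""
    # Gradients of the outer loss w.r.t. both variables.
    f_val = f(lam, w)
    df_dlam, df_dw = torch.autograd.grad(f_val, [lam, w], allow_unused=True)
    v = df_dw.detach()

    # Gradient of the inner loss, kept in the graph for second derivatives.
    dg_dw = torch.autograd.grad(g(lam, w), w, create_graph=True)[0]

    # Neumann-series approximation of [d^2 g / dw^2]^{-1} (df/dw):
    # p = eta * sum_{q=0}^{Q} (I - eta * H)^q v, via Hessian-vector products.
    p = v.clone()
    for _ in range(hessian_q):
        hv = torch.autograd.grad(dg_dw, w, grad_outputs=v, retain_graph=True)[0]
        v = v - eta * hv
        p = p + v
    p = eta * p

    # Cross term: d/dlam [ (dg/dw)^T p ] = (d^2 g / dlam dw) p.
    cross = torch.autograd.grad(dg_dw, lam, grad_outputs=p)[0]

    direct = torch.zeros_like(lam) if df_dlam is None else df_dlam
    return direct - cross
```

The number of Hessian-vector products and the scalar `eta` in this sketch play the roles of the `--hessian_q` and `--eta` arguments described below.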
The experiments on the 20 Newsgroup and MNIST datasets are in `l2reg_on_twentynews.py` and `mnist_exp.py`, respectively.
We describe the basic arguments as follows (a minimal parser sketch follows the list):

- `--alg`: the bilevel algorithm to run.
- `--hessian_q`: the number of Hessian-vector products used in the hypergradient estimate.
- `--training_size`: the number of samples used for training.
- `--validation_size`: the number of samples used for validation.
- `--batch_size`: the batch size for the training data.
- `--epochs`: the number of outer epochs for training.
- `--iterations` or `--T`: the number of inner iterations for training.
- `--eta`: the hyperparameter $\eta$ for the Hessian inverse approximation.
- `--noise_rate`: the label corruption rate for the MNIST data.
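For concreteness, these flags might map to an `argparse` setup like the sketch below; the defaults shown are illustrative assumptions, not the repo's actual values:

```python
import argparse

parser = argparse.ArgumentParser(description='Bilevel optimization experiments')
parser.add_argument('--alg', type=str, default='stocBiO',
                    help='bilevel algorithm to run, e.g. stocBiO or AID-FP')
parser.add_argument('--hessian_q', type=int, default=3,
                    help='number of Hessian-vector products in the hypergradient estimate')
parser.add_argument('--training_size', type=int, default=20000)
parser.add_argument('--validation_size', type=int, default=5000)
parser.add_argument('--batch_size', type=int, default=50)
parser.add_argument('--epochs', type=int, default=50,
                    help='number of outer epochs')
parser.add_argument('--iterations', '--T', type=int, default=10,
                    help='number of inner iterations')
parser.add_argument('--eta', type=float, default=0.05,
                    help='eta for the Hessian inverse approximation')
parser.add_argument('--noise_rate', type=float, default=0.1,
                    help='label corruption rate for MNIST')
args = parser.parse_args()
```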
To replicate the empirical results on the different datasets in our paper, run the following commands:
```
python3 mnist_exp.py --alg stocBiO --batch_size 50 --noise_rate 0.1
python3 mnist_exp.py --alg stocBiO --batch_size 50 --noise_rate 0.4
python3 l2reg_on_twentynews.py --alg stocBiO
python3 mnist_exp.py --alg AID-FP --batch_size 50 --noise_rate 0.4
python3 l2reg_on_twentynews.py --alg AID-FP
```
Our meta-learning code is built on learn2learn, where we implement the bilevel optimizer ITD-BiO and show that it converges faster than MAML and ANIL. We also implement first-order ITD-BiO (FO-ITD-BiO), which does not compute the derivative of the inner-loop output with respect to the feature parameters, i.e., it removes all Jacobian- and Hessian-vector computations. FO-ITD-BiO turns out to be even faster without sacrificing overall prediction accuracy.
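The distinction between the two variants can be illustrated with a minimal inner-loop sketch; the names below are illustrative assumptions and not the repo's actual API:

```python
import torch

def itd_bio_inner_loop(w_init, inner_loss, steps, lr, first_order=False):
    """Sketch of the ITD-BiO inner loop on a single-tensor inner variable.

    w_init: task-specific (inner) weights, typically derived from the shared
    feature parameters; inner_loss(w) is the task training loss.
    ITD-BiO (first_order=False) keeps the inner trajectory in the autograd
    graph via create_graph=True, so backpropagating the validation loss
    through the returned weights yields the Jacobian/Hessian-vector terms.
    FO-ITD-BiO (first_order=True) detaches every step, removing those
    computations entirely.
    """
    w = w_init
    for _ in range(steps):
        grad = torch.autograd.grad(inner_loss(w), w,
                                   create_graph=not first_order)[0]
        w = w - lr * grad
        if first_order:
            # Cut the graph: the adapted weights no longer depend on
            # the feature parameters through the inner trajectory.
            w = w.detach().requires_grad_(True)
    return w
```

The outer step then evaluates the task validation loss at the adapted weights; in the first-order variant, the gradient reaches the feature parameters only through their direct appearance in the validation loss.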
For Windows OS,
- PyTorch=1.7.1
- l2l=0.1.5
- python=3.8
- cuda=11.3
For Linux OS,
- PyTorch=1.7.0
- l2l=0.1.5
- python=3.6.9
- cuda=10.2
For both operating systems, we strongly recommend using the older l2l version listed above; the latest l2l releases require some code adaptations.
In the following, we present experiments demonstrating the improved performance of the proposed stoc-BiO algorithm.
We compare our algorithm against various hyperparameter optimization baselines on the 20 Newsgroup dataset:
We evaluate the performance of our algorithm with respect to different batch sizes:
The comparison results on the MNIST dataset:
This repo is still under construction, and any comments are welcome!
If this repo is useful for your research, please cite our paper:
```
@inproceedings{ji2021bilevel,
  author    = {Ji, Kaiyi and Yang, Junjie and Liang, Yingbin},
  title     = {Bilevel Optimization: Nonasymptotic Analysis and Faster Algorithms},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2021}
}
```