Implementation with PyTorch 1.3.0 for multi-gpu DARTS https://arxiv.org/abs/1806.09055
🚩 Note that this is not the official code, please refer https://github.com/quark0/darts for more details.
This code is also based on https://github.com/alphadl/darts.pytorch1.1, yet with some bug fixed, so that it can perform better on multi-gpu.
Code accompanying the paper
The algorithm is based on continuous relaxation and gradient descent in the architecture space. It is able to efficiently design high-performance convolutional architectures for image classification (on CIFAR-10 and ImageNet) and recurrent architectures for language modeling (on Penn Treebank and WikiText-2). Only a single GPU is required.DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, Yiming Yang.
arXiv:1806.09055.
Python >= 3.5.5, PyTorch == 1.3.0
NOTE: The latest PyTorch 1.3.0 is supported at this version.
Instructions for acquiring PTB and WT2 can be found here. While CIFAR-10 can be automatically downloaded by torchvision.
The easist way to get started is to evaluate our pretrained DARTS models.
CIFAR-10 (cifar10_model.pt)
cd cnn && python test.py --auxiliary --model_path cifar10_model.pt
- Expected result: 2.63% test error rate with 3.3M model params.
PTB (ptb_model.pt)
cd rnn && python test.py --model_path ptb_model.pt
- Expected result: 55.68 test perplexity with 23M model params.
To carry out architecture search using 2nd-order approximation, run
cd cnn && python train_search.py --gpu 0,1 # for conv cells on CIFAR-10
cd rnn && python train_search.py --gpu 0,1 # for recurrent cells on PTB
Note the validation performance in this step does not indicate the final performance of the architecture. One must train the obtained genotype/architecture from scratch using full-sized models, as described in the next section.
Also be aware that different runs would end up with different local minimum. To get the best result, it is crucial to repeat the search process with different seeds and select the best cell(s) based on validation performance (obtained by training the derived cell from scratch for a small number of epochs). Please refer to fig. 3 and sect. 3.2 in our arXiv paper.
Figure: Snapshots of the most likely normal conv, reduction conv, and recurrent cells over time.
To evaluate our best cells by training from scratch, run
cd cnn && python train.py --auxiliary --cutout # CIFAR-10
cd rnn && python train.py # PTB
Customized architectures are supported through the --arch
flag once specified in genotypes.py
.
The CIFAR-10 result at the end of training is subject to variance due to the non-determinism of cuDNN back-prop kernels. It would be misleading to report the result of only a single run. By training our best cell from scratch, one should expect the average test error of 10 independent runs to fall in the range of 2.76 +/- 0.09% with high probability.
Figure: Expected learning curves on CIFAR-10 (4 runs), ImageNet and PTB.
Package graphviz is required to visualize the learned cells
python visualize.py DARTS
where DARTS
can be replaced by any customized architectures in genotypes.py
.
If you use any part of this code in your research, please cite our paper:
@article{liu2018darts,
title={DARTS: Differentiable Architecture Search},
author={Liu, Hanxiao and Simonyan, Karen and Yang, Yiming},
journal={arXiv preprint arXiv:1806.09055},
year={2018}
}