Sweep code for studying model population stats (2 of 2) (#144)

Summary:
This is a *major update* and introduces powerful new functionality to pycls.

The pycls codebase now provides powerful support for studying *design spaces* and more generally *population statistics* of models as introduced in [On Network Design Spaces for Visual Recognition](https://arxiv.org/abs/1905.13214) and [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678). The idea is that instead of planning a single pycls job (e.g., testing a specific model configuration), one can study the behavior of an entire population of models. This allows for quite powerful and succinct experimental design, and elevates the study of individual model behavior to the study of the behavior of model populations. Please see [`SWEEP_INFO`](docs/SWEEP_INFO.md) for details.

This is commit 2 of 2 for the sweep code. It is focused on sweep analysis, sweep examples, and documentation.

Pull Request resolved: #144
Reviewed By: rajprateek
Differential Revision: D28586390
Pulled By: pdollar
fbshipit-source-id: 55856f9aaf7ae49243f4870c787a144b03e5d2a9
Co-authored-by: Raj Prateek Kosaraju <[email protected]>
Co-authored-by: Piotr Dollar <[email protected]>
facebookresearch · May 20, 2021 · commit 2d71381
1 parent bd65938

Showing 12 changed files with 1,046 additions and 11 deletions.
diff --git a/README.md b/README.md
@@ -9,9 +9,7 @@
 
 ## Introduction
 
-The goal of **pycls** is to provide a simple and flexible codebase for image classification. It is designed to support rapid implementation and evaluation of research ideas. **pycls** also provides a large collection of baseline results ([Model Zoo](MODEL_ZOO.md)).
-
-The codebase supports efficient single-machine multi-gpu training, powered by the PyTorch distributed package, and provides implementations of standard models including [ResNet](https://arxiv.org/abs/1512.03385), [ResNeXt](https://arxiv.org/abs/1611.05431), [EfficientNet](https://arxiv.org/abs/1905.11946), and [RegNet](https://arxiv.org/abs/2003.13678).
+The goal of **pycls** is to provide a simple and flexible codebase for image classification. It is designed to support rapid implementation and evaluation of research ideas. **pycls** also provides a large collection of baseline results ([Model Zoo](MODEL_ZOO.md)).  The codebase supports efficient single-machine multi-gpu training, powered by the PyTorch distributed package, and provides implementations of standard models including [ResNet](https://arxiv.org/abs/1512.03385), [ResNeXt](https://arxiv.org/abs/1611.05431), [EfficientNet](https://arxiv.org/abs/1905.11946), and [RegNet](https://arxiv.org/abs/2003.13678).
 
 ## Using pycls
 
@@ -21,13 +19,18 @@ Please see [`GETTING_STARTED`](docs/GETTING_STARTED.md) for brief installation i
 
 We provide a large set of baseline results and pretrained models available for download in the **pycls** [Model Zoo](MODEL_ZOO.md); including the simple, fast, and effective [RegNet](https://arxiv.org/abs/2003.13678) models that we hope can serve as solid baselines across a wide range of flop regimes.
 
+## Sweep Code
+
+The pycls codebase now provides powerful support for studying *design spaces* and more generally *population statistics* of models as introduced in [On Network Design Spaces for Visual Recognition](https://arxiv.org/abs/1905.13214) and [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678). The idea is that instead of planning a single pycls job (e.g., testing a specific model configuration), one can study the behavior of an entire population of models. This allows for quite powerful and succinct experimental design, and elevates the study of individual model behavior to the study of the behavior of model populations. Please see [`SWEEP_INFO`](docs/SWEEP_INFO.md) for details.
+
 ## Projects
 
 A number of projects at FAIR have been built on top of **pycls**:
 
 - [On Network Design Spaces for Visual Recognition](https://arxiv.org/abs/1905.13214)
 - [Exploring Randomly Wired Neural Networks for Image Recognition](https://arxiv.org/abs/1904.01569)
 - [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678)
+- [Fast and Accurate Model Scaling](https://arxiv.org/abs/2103.06877)
 - [Are Labels Necessary for Neural Architecture Search?](https://arxiv.org/abs/2003.12056)
 - [PySlowFast Video Understanding Codebase](https://github.com/facebookresearch/SlowFast)
 
@@ -40,22 +43,29 @@ If you find **pycls** helpful in your research or refer to the baseline results
 ```
 @InProceedings{Radosavovic2019,
   title = {On Network Design Spaces for Visual Recognition},
-  author = {Radosavovic, Ilija and Johnson, Justin and Xie, Saining and Lo, Wan-Yen and Doll{\'a}r, Piotr},
+  author = {Ilija Radosavovic and Justin Johnson and Saining Xie and Wan-Yen Lo and Piotr Doll{\'a}r},
   booktitle = {ICCV},
   year = {2019}
 }
 
 @InProceedings{Radosavovic2020,
   title = {Designing Network Design Spaces},
-  author = {Radosavovic, Ilija and Kosaraju, Raj Prateek and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
+  author = {Ilija Radosavovic and Raj Prateek Kosaraju and Ross Girshick and Kaiming He and Piotr Doll{\'a}r},
   booktitle = {CVPR},
   year = {2020}
 }
+
+@InProceedings{Dollar2021,
+  title = {Fast and Accurate Model Scaling},
+  author = {Piotr Doll{\'a}r and Mannat Singh and Ross Girshick},
+  booktitle = {CVPR},
+  year = {2021}
+}
 ```
 
 ## License
 
-**pycls** is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.
+**pycls** is released under the MIT license. Please see the [`LICENSE`](LICENSE) file for more information.
 
 ## Contributing
 

diff --git a/configs/sweeps/cifar/cifar_best.yaml b/configs/sweeps/cifar/cifar_best.yaml
@@ -0,0 +1,87 @@
+DESC:
+  Example CIFAR sweep 3 of 3 (trains the best model from cifar_regnet sweep).
+  Trains the best RegNet-125MF from the cifar_regnet sweep for variable epoch lengths.
+  Trains 3 copies of every model (to obtain mean and std of the error).
+  The purpose of this sweep is to show how to train FINAL version of a model.
+NAME: cifar/cifar_best
+SETUP:
+  # Number of configs to sample
+  NUM_CONFIGS: 12
+  # SAMPLERS for optimization parameters
+  SAMPLERS:
+    OPTIM.MAX_EPOCH:
+      TYPE: value_sampler
+      VALUES: [50, 100, 200, 400]
+    RNG_SEED:
+      TYPE: int_sampler
+      RAND_TYPE: uniform
+      RANGE: [1, 3]
+      QUANTIZE: 1
+  CONSTRAINTS:
+    REGNET:
+      NUM_STAGES: [2, 2]
+  # BASE_CFG is RegNet-125MF (best model from cifar_regnet sweep)
+  BASE_CFG:
+    MODEL:
+      TYPE: regnet
+      NUM_CLASSES: 10
+    REGNET:
+      STEM_TYPE: res_stem_cifar
+      SE_ON: True
+      STEM_W: 16
+      DEPTH: 12
+      W0: 96
+      WA: 19.5
+      WM: 2.942
+      GROUP_W: 8
+    OPTIM:
+      BASE_LR: 1.0
+      LR_POLICY: cos
+      MAX_EPOCH: 50
+      MOMENTUM: 0.9
+      NESTEROV: True
+      WARMUP_EPOCHS: 5
+      WEIGHT_DECAY: 0.0005
+      EMA_ALPHA: 0.00025
+      EMA_UPDATE_PERIOD: 32
+    BN:
+      USE_CUSTOM_WEIGHT_DECAY: True
+    TRAIN:
+      DATASET: cifar10
+      SPLIT: train
+      BATCH_SIZE: 1024
+      IM_SIZE: 32
+      MIXED_PRECISION: True
+      LABEL_SMOOTHING: 0.1
+      MIXUP_ALPHA: 0.5
+    TEST:
+      DATASET: cifar10
+      SPLIT: test
+      BATCH_SIZE: 1000
+      IM_SIZE: 32
+    NUM_GPUS: 1
+    DATA_LOADER:
+      NUM_WORKERS: 4
+    LOG_PERIOD: 25
+    VERBOSE: False
+# Launch config options
+LAUNCH:
+  PARTITION: devlab
+  NUM_GPUS: 1
+  PARALLEL_JOBS: 12
+  TIME_LIMIT: 180
+# Analyze config options
+ANALYZE:
+  PLOT_METRIC_VALUES: False
+  PLOT_COMPLEXITY_VALUES: False
+  PLOT_CURVES_BEST: 3
+  PLOT_CURVES_WORST: 0
+  PLOT_MODELS_BEST: 1
+  METRICS: []
+  COMPLEXITY: [flops, params, acts, memory, epoch_fw_bw, epoch_time]
+  PRE_FILTERS: {done: [0, 1, 1]}
+  SPLIT_FILTERS:
+    epochs=050: {cfg.OPTIM.MAX_EPOCH: [ 50,  50,  50]}
+    epochs=100: {cfg.OPTIM.MAX_EPOCH: [100, 100, 100]}
+    epochs=200: {cfg.OPTIM.MAX_EPOCH: [200, 200, 200]}
+    epochs=400: {cfg.OPTIM.MAX_EPOCH: [400, 400, 400]}
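
As a rough check on the sampler settings in `cifar_best.yaml`: four `OPTIM.MAX_EPOCH` values times three `RNG_SEED` values give 12 distinct combinations, which matches `NUM_CONFIGS: 12`. The sketch below enumerates that grid in plain Python. It assumes the `RNG_SEED` range is inclusive and that the sweep ends up with each unique combination once; the variable names are illustrative and are not pycls API.

```python
from itertools import product

# Values copied from the SAMPLERS section of cifar_best.yaml.
max_epochs = [50, 100, 200, 400]   # OPTIM.MAX_EPOCH value_sampler VALUES
rng_seeds = list(range(1, 3 + 1))  # RNG_SEED RANGE [1, 3], assumed inclusive

# Every (epochs, seed) pair; assumption: each unique pair is kept once.
grid = [
    {"OPTIM.MAX_EPOCH": epochs, "RNG_SEED": seed}
    for epochs, seed in product(max_epochs, rng_seeds)
]

print(len(grid))  # number of distinct configs in the grid
```

Under these assumptions each `SPLIT_FILTERS` group (one per epoch setting) would contain three runs that differ only in seed, which is what the mean and std in `DESC` are computed over.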
diff --git a/configs/sweeps/cifar/cifar_optim.yaml b/configs/sweeps/cifar/cifar_optim.yaml
@@ -0,0 +1,76 @@
+DESC:
+  Example CIFAR sweep 1 of 3 (find lr and wd for cifar_regnet and cifar_best sweeps).
+  Tunes the learning rate (lr) and weight decay (wd) for ResNet-56 at 50 epochs.
+  The purpose of this sweep is to show how to optimize OPTIM parameters.
+NAME: cifar/cifar_optim
+SETUP:
+  # Number of configs to sample
+  NUM_CONFIGS: 64
+  # SAMPLERS for optimization parameters
+  SAMPLERS:
+    OPTIM.BASE_LR:
+      TYPE: float_sampler
+      RAND_TYPE: log_uniform
+      RANGE: [0.25, 5.0]
+      QUANTIZE: 1.0e-10
+    OPTIM.WEIGHT_DECAY:
+      TYPE: float_sampler
+      RAND_TYPE: log_uniform
+      RANGE: [5.0e-5, 1.0e-3]
+      QUANTIZE: 1.0e-10
+  # BASE_CFG is R-56 with large batch size and stronger augmentation
+  BASE_CFG:
+    MODEL:
+      TYPE: anynet
+      NUM_CLASSES: 10
+    ANYNET:
+      STEM_TYPE: res_stem_cifar
+      STEM_W: 16
+      BLOCK_TYPE: res_basic_block
+      DEPTHS: [9, 9, 9]
+      WIDTHS: [16, 32, 64]
+      STRIDES: [1, 2, 2]
+    OPTIM:
+      BASE_LR: 1.0
+      LR_POLICY: cos
+      MAX_EPOCH: 50
+      MOMENTUM: 0.9
+      NESTEROV: True
+      WARMUP_EPOCHS: 5
+      WEIGHT_DECAY: 0.0005
+      EMA_ALPHA: 0.00025
+      EMA_UPDATE_PERIOD: 32
+    BN:
+      USE_CUSTOM_WEIGHT_DECAY: True
+    TRAIN:
+      DATASET: cifar10
+      SPLIT: train
+      BATCH_SIZE: 1024
+      IM_SIZE: 32
+      MIXED_PRECISION: True
+      LABEL_SMOOTHING: 0.1
+      MIXUP_ALPHA: 0.5
+    TEST:
+      DATASET: cifar10
+      SPLIT: test
+      BATCH_SIZE: 1000
+      IM_SIZE: 32
+    NUM_GPUS: 1
+    DATA_LOADER:
+      NUM_WORKERS: 4
+    LOG_PERIOD: 25
+    VERBOSE: False
+# Launch config options
+LAUNCH:
+  PARTITION: devlab
+  NUM_GPUS: 1
+  PARALLEL_JOBS: 32
+  TIME_LIMIT: 60
+# Analyze config options
+ANALYZE:
+  PLOT_CURVES_BEST: 3
+  PLOT_METRIC_VALUES: True
+  PLOT_COMPLEXITY_VALUES: True
+  METRICS: [lr, wd, lr_wd]
+  COMPLEXITY: [flops, params, acts, memory, epoch_fw_bw, epoch_time]
+  PRE_FILTERS: {done: [1, 1, 1]}
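
To illustrate what the two `float_sampler` entries in `cifar_optim.yaml` describe, here is a minimal sketch of log-uniform sampling using only the standard library. It assumes `log_uniform` means "uniform in log space" over `RANGE`, and it omits the `QUANTIZE` step; the helper `log_uniform` below is written for this example and is not the pycls implementation.

```python
import math
import random


def log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One (lr, wd) pair per config; 64 matches NUM_CONFIGS in cifar_optim.yaml.
samples = [
    {
        "OPTIM.BASE_LR": log_uniform(0.25, 5.0, rng),
        "OPTIM.WEIGHT_DECAY": log_uniform(5.0e-5, 1.0e-3, rng),
    }
    for _ in range(64)
]

print(samples[0])
```

Sampling in log space spreads the draws evenly across orders of magnitude, which is the usual reason to prefer it for learning rate and weight decay.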
diff --git a/configs/sweeps/cifar/cifar_regnet.yaml b/configs/sweeps/cifar/cifar_regnet.yaml
@@ -0,0 +1,78 @@
+DESC:
+  Example CIFAR sweep 2 of 3 (uses lr and wd found by cifar_optim sweep).
+  This sweep searches for a good RegNet-125MF model on cifar (same flops as R56).
+  The purpose of this sweep is to show how to optimize REGNET parameters.
+NAME: cifar/cifar_regnet
+SETUP:
+  # Number of configs to sample
+  NUM_CONFIGS: 32
+  # SAMPLER for RegNet
+  SAMPLERS:
+    REGNET:
+      TYPE: regnet_sampler
+      DEPTH: [6, 16]
+      GROUP_W: [1, 32]
+  # CONSTRAINTS for complexity (roughly based on R-56)
+  CONSTRAINTS:
+    CX:
+      FLOPS: [0.12e+9, 0.13e+9]
+      PARAMS: [0, 2.0e+6]
+      ACTS: [0, 1.0e+6]
+    REGNET:
+      NUM_STAGES: [2, 2]
+  # BASE_CFG is a RegNet (parameters sampled above) with the R-56 large batch size and stronger augmentation settings
+  BASE_CFG:
+    MODEL:
+      TYPE: regnet
+      NUM_CLASSES: 10
+    REGNET:
+      STEM_TYPE: res_stem_cifar
+      SE_ON: True
+      STEM_W: 16
+    OPTIM:
+      BASE_LR: 1.0
+      LR_POLICY: cos
+      MAX_EPOCH: 50
+      MOMENTUM: 0.9
+      NESTEROV: True
+      WARMUP_EPOCHS: 5
+      WEIGHT_DECAY: 0.0005
+      EMA_ALPHA: 0.00025
+      EMA_UPDATE_PERIOD: 32
+    BN:
+      USE_CUSTOM_WEIGHT_DECAY: True
+    TRAIN:
+      DATASET: cifar10
+      SPLIT: train
+      BATCH_SIZE: 1024
+      IM_SIZE: 32
+      MIXED_PRECISION: True
+      LABEL_SMOOTHING: 0.1
+      MIXUP_ALPHA: 0.5
+    TEST:
+      DATASET: cifar10
+      SPLIT: test
+      BATCH_SIZE: 1000
+      IM_SIZE: 32
+    NUM_GPUS: 1
+    DATA_LOADER:
+      NUM_WORKERS: 4
+    LOG_PERIOD: 25
+    VERBOSE: False
+# Launch config options
+LAUNCH:
+  PARTITION: devlab
+  NUM_GPUS: 1
+  PARALLEL_JOBS: 32
+  TIME_LIMIT: 60
+# Analyze config options
+ANALYZE:
+  PLOT_METRIC_VALUES: True
+  PLOT_COMPLEXITY_VALUES: True
+  PLOT_CURVES_BEST: 3
+  PLOT_CURVES_WORST: 0
+  PLOT_MODELS_BEST: 8
+  PLOT_MODELS_WORST: 0
+  METRICS: [regnet_depth, regnet_w0, regnet_wa, regnet_wm, regnet_gw]
+  COMPLEXITY: [flops, params, acts, memory, epoch_fw_bw, epoch_time]
+  PRE_FILTERS: {done: [0, 1, 1]}
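
The `REGNET` parameters tracked by `METRICS` (depth, `w0`, `wa`, `wm`) determine per-stage widths through the quantized linear parameterization from [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678). The sketch below applies that rule to the values in the `BASE_CFG` of `cifar_best.yaml` (`DEPTH: 12`, `W0: 96`, `WA: 19.5`, `WM: 2.942`). It is a simplified reading of the paper, not the pycls code: the function name `regnet_widths` and the rounding to multiples of 8 are assumptions, and any further adjustment for group-width compatibility is left out.

```python
import math


def regnet_widths(w0, wa, wm, depth, q=8):
    """Per-stage widths and depths from the quantized linear parameterization."""
    # Continuous per-block widths: u_j = w0 + wa * j.
    u = [w0 + wa * j for j in range(depth)]
    # Snap each block to the nearest integer power of wm (relative to w0).
    s = [round(math.log(uj / w0) / math.log(wm)) for uj in u]
    # Quantized per-block widths, rounded to a multiple of q (assumed 8).
    w = [int(round(w0 * wm ** sj / q) * q) for sj in s]
    # Blocks that share a width form one stage (widths are non-decreasing).
    widths = sorted(set(w))
    depths = [w.count(x) for x in widths]
    return widths, depths


widths, depths = regnet_widths(w0=96, wa=19.5, wm=2.942, depth=12)
print(widths, depths)
```

For these parameters the sketch yields two stages, which is consistent with the `NUM_STAGES: [2, 2]` constraint in the config.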
diff --git a/docs/DATA.md b/docs/DATA.md
@@ -36,14 +36,14 @@ Create a directory containing symlinks:
 mkdir -p /path/pycls/pycls/datasets/data
 ```
 
-Symlink ImageNet:
+Symlink ImageNet (`/datasets01/imagenet_full_size/061417/` on FAIR cluster):
 
 ```
-ln -s /path/imagenet /path/pycls/pycls/datasets/data/imagenet
+ln -sv /path/imagenet /path/pycls/pycls/datasets/data/imagenet
 ```
 
-Symlink CIFAR-10:
+Symlink CIFAR-10 (`/datasets01/cifar-10-batches-py/060817/` on FAIR cluster):
 
 ```
-ln -s /path/cifar10 /path/pycls/pycls/datasets/data/cifar10
+ln -sv /path/cifar10 /path/pycls/pycls/datasets/data/cifar10
 ```
diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md
@@ -97,7 +97,7 @@ python tools/time_net.py
     PREC_TIME.NUM_ITER 50
 ```
 
-### MODEL SCALING
+### Model Scaling
 
 Scale a RegNetY-4GF by 4x using fast compound scaling (see https://arxiv.org/abs/2103.06877):