Here we document the reference commands for pre-training and evaluating various MoCo v3 models.
With a batch size of 4096, all ResNet-50 models can be trained on 2 nodes with a total of 16 Volta 32G GPUs.
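
For a quick sanity check on memory budgeting, the per-GPU batch in this setup is simply the total batch divided across all GPU processes:

```python
# Per-GPU batch under the 2-node, 16-GPU ResNet-50 setup described above.
total_batch = 4096
nodes, gpus_per_node = 2, 8
per_gpu_batch = total_batch // (nodes * gpus_per_node)
print(per_gpu_batch)  # 256 images per Volta 32G GPU
```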
### ResNet-50, 100-epoch pre-training
On the first node, run:
```
python main_moco.py \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
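
For reference, with `--multiprocessing-distributed` the launcher spawns one worker per GPU, so `--world-size` on the command line counts nodes and each worker's global rank is derived from the node rank. The sketch below shows that bookkeeping under the assumption that `main_moco.py` follows the standard PyTorch ImageNet-example launch pattern; the function names are illustrative.

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def launch(args):
    # One worker process is spawned per GPU on this node.
    ngpus_per_node = torch.cuda.device_count()  # 8 on a Volta 32G node

    # --world-size on the command line counts nodes; the effective world size
    # used by torch.distributed is the total number of GPU processes
    # (2 nodes x 8 GPUs = 16 for the ResNet-50 recipes above).
    args.world_size = ngpus_per_node * args.world_size
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))

def main_worker(gpu, ngpus_per_node, args):
    # --rank on the command line is the node index (0 or 1 here); each worker
    # derives its global rank from its local GPU index.
    global_rank = args.rank * ngpus_per_node + gpu
    dist.init_process_group(
        backend="nccl", init_method=args.dist_url,
        world_size=args.world_size, rank=global_rank)
```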
### ResNet-50, 300-epoch pre-training
On the first node, run:
```
python main_moco.py \
  --lr=.3 --epochs=300 \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
### ResNet-50, 1000-epoch pre-training
On the first node, run:
```
python main_moco.py \
  --lr=.3 --wd=1.5e-6 --epochs=1000 \
  --moco-m=0.996 --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
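
The `--moco-m-cos` flag ramps the momentum-encoder coefficient from its base value (`--moco-m`, 0.996 here) toward 1 over training with a half-cycle cosine schedule. Below is a minimal sketch of that kind of schedule; the function name is illustrative and the exact form is an assumption modeled on the half-cosine ramp used in MoCo v3.

```python
import math

def moco_momentum(epoch, total_epochs, base_m=0.996):
    """Half-cycle cosine ramp of the momentum coefficient from base_m to 1.0."""
    return 1.0 - 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs)) * (1.0 - base_m)

# e.g. for the 1000-epoch recipe above:
print(moco_momentum(0, 1000))     # 0.996 at the start
print(moco_momentum(500, 1000))   # 0.998 at mid-training
print(moco_momentum(1000, 1000))  # 1.0 at the end
```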
### ResNet-50, linear classification
Run on a single node:
```
python main_lincls.py \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
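
For reference, the linear-classification step only needs the base-encoder weights from the pre-training checkpoint. The sketch below shows one way to load them into a torchvision ResNet-50 and freeze everything but the classifier; it assumes the checkpoint stores the pre-training weights under `state_dict` with a `module.base_encoder.` prefix, as stripped by `main_lincls.py`. Treat the key handling as illustrative rather than a drop-in replacement for the script.

```python
import torch
import torchvision.models as models

# Hypothetical file name; substitute your own pre-training checkpoint.
checkpoint = torch.load("checkpoint_final.pth.tar", map_location="cpu")
state_dict = checkpoint["state_dict"]

# Keep only base-encoder weights, dropping the projection/prediction heads and
# the DistributedDataParallel "module." prefix (assumed checkpoint layout).
prefix = "module.base_encoder."
encoder_sd = {
    k[len(prefix):]: v
    for k, v in state_dict.items()
    if k.startswith(prefix) and not k.startswith(prefix + "fc")
}

model = models.resnet50()
msg = model.load_state_dict(encoder_sd, strict=False)
print(msg.missing_keys)  # expect only the new classifier: ['fc.weight', 'fc.bias']

# Linear probing: freeze everything except the final fc layer.
for name, p in model.named_parameters():
    p.requires_grad = name in ("fc.weight", "fc.bias")
```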
Below are our pre-trained ResNet-50 models and logs.
| pretrain epochs | linear acc (%) | pretrain files | linear files |
|---|---|---|---|
| 100 | 68.9 | chpt | chpt / log |
| 300 | 72.8 | chpt | chpt / log |
| 1000 | 74.6 | chpt | chpt / log |
All ViT models are pre-trained for 300 epochs with AdamW.
### ViT-Small, 1-node (8-GPU), 1024-batch pre-training
This setup fits into a single node of 8 Volta 32G GPUs, for ease of debugging.
```
python main_moco.py \
  -a vit_small -b 1024 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]
```
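
Note that `--lr` here is a base value; assuming the script applies the common linear scaling rule (base lr × total batch / 256), the effective learning rate differs between the 1024-batch run above and the 4096-batch multi-node runs below. A quick back-of-the-envelope check, under that assumption:

```python
def effective_lr(base_lr, batch_size, reference_batch=256):
    # Linear scaling rule (assumed): scale the base lr by batch / 256.
    return base_lr * batch_size / reference_batch

base_lr = 1.5e-4  # --lr from the ViT commands
print(effective_lr(base_lr, 1024))  # 6.0e-4 for the 1-node, 1024-batch run
print(effective_lr(base_lr, 4096))  # 2.4e-3 for the 4096-batch multi-node runs
```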
### ViT-Small, 4-node (32-GPU) pre-training
On the first node, run:
```
python main_moco.py \
  -a vit_small \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 4 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On other nodes, run the same command with `--rank 1`, ..., `--rank 3`, respectively.
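
The `--warmup-epochs=40` flag pairs with a cosine decay over the remaining epochs. Below is a minimal sketch of such a schedule; the exact functional form is an assumption modeled on the usual linear-warmup / half-cosine-decay recipe, not a verbatim copy of the training script.

```python
import math

def schedule_lr(base_lr, epoch, total_epochs=300, warmup_epochs=40):
    """Linear warmup for warmup_epochs, then half-cycle cosine decay to 0."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# e.g. with the ViT base lr of 1.5e-4:
for e in (0, 20, 40, 170, 300):
    print(e, schedule_lr(1.5e-4, e))
```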
### ViT-Small, linear classification
Run on a single node:
```
python main_lincls.py \
  -a vit_small --lr=3 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
### ViT-Base, 8-node (64-GPU) pre-training

On the first node, run:
```
python main_moco.py \
  -a vit_base \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 8 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On other nodes, run the same command with `--rank 1`, ..., `--rank 7`, respectively.
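
The `--moco-t=.2` flag is the softmax temperature of the contrastive (InfoNCE) loss between query features and momentum-encoder keys. Below is an illustrative single-GPU sketch of such a loss; the multi-node training above additionally gathers keys across GPUs, which is omitted here, and the `2 * T` scaling follows the MoCo v3 convention but should be treated as an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, k, T=0.2):
    """InfoNCE between queries q and keys k (both [N, C]); positives on the diagonal."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / T                                # [N, N] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)     # i-th query matches i-th key
    return F.cross_entropy(logits, labels) * (2 * T)

# Toy usage with random features standing in for encoder outputs:
q = torch.randn(8, 256)
k = torch.randn(8, 256)
print(contrastive_loss(q, k))
```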
### ViT-Base, linear classification
Run on a single node:
```
python main_lincls.py \
  -a vit_base --lr=3 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
Below are our pre-trained ViT models and logs (batch 4096).
| model | pretrain epochs | linear acc (%) | pretrain files | linear files |
|---|---|---|---|---|
| ViT-Small | 300 | 73.2 | chpt | chpt / log |
| ViT-Base | 300 | 76.7 | chpt | chpt / log |