Here we document the reference commands for pre-training and evaluating various MoCo v3 models.
With a batch size of 4096, all ResNet-50 models can be trained on 2 nodes with a total of 16 Volta 32G GPUs.
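
For a quick sanity check on memory budgeting, the per-GPU batch in this setup is simply the total batch divided across all GPU processes:

```python
# Per-GPU batch under the 2-node, 16-GPU ResNet-50 setup described above.
total_batch = 4096
nodes, gpus_per_node = 2, 8
per_gpu_batch = total_batch // (nodes * gpus_per_node)
print(per_gpu_batch)  # 256 images per Volta 32G GPU
```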
### ResNet-50, 100-epoch pre-training
On the first node, run:
```
python main_moco.py \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
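
For reference, with `--multiprocessing-distributed` the launcher spawns one worker per GPU, so `--world-size` on the command line counts nodes and each worker's global rank is derived from the node rank. The sketch below shows that bookkeeping under the assumption that `main_moco.py` follows the standard PyTorch ImageNet-example launch pattern; the function names are illustrative.

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def launch(args):
    # One worker process is spawned per GPU on this node.
    ngpus_per_node = torch.cuda.device_count()  # 8 on a Volta 32G node

    # --world-size on the command line counts nodes; the effective world size
    # used by torch.distributed is the total number of GPU processes
    # (2 nodes x 8 GPUs = 16 for the ResNet-50 recipes above).
    args.world_size = ngpus_per_node * args.world_size
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))

def main_worker(gpu, ngpus_per_node, args):
    # --rank on the command line is the node index (0 or 1 here); each worker
    # derives its global rank from its local GPU index.
    global_rank = args.rank * ngpus_per_node + gpu
    dist.init_process_group(
        backend="nccl", init_method=args.dist_url,
        world_size=args.world_size, rank=global_rank)
```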
### ResNet-50, 300-epoch pre-training
On the first node, run:
```
python main_moco.py \
  --lr=.3 --epochs=300 \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
### ResNet-50, 1000-epoch pre-training
On the first node, run:
```
python main_moco.py \
  --lr=.3 --wd=1.5e-6 --epochs=1000 \
  --moco-m=0.996 --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
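
The `--moco-m-cos` flag ramps the momentum-encoder coefficient from its base value (`--moco-m`, 0.996 here) toward 1 over training with a half-cycle cosine schedule. Below is a minimal sketch of that kind of schedule; the function name is illustrative and the exact form is an assumption modeled on the half-cosine ramp used in MoCo v3.

```python
import math

def moco_momentum(epoch, total_epochs, base_m=0.996):
    """Half-cycle cosine ramp of the momentum coefficient from base_m to 1.0."""
    return 1.0 - 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs)) * (1.0 - base_m)

# e.g. for the 1000-epoch recipe above:
print(moco_momentum(0, 1000))     # 0.996 at the start
print(moco_momentum(500, 1000))   # 0.998 at mid-training
print(moco_momentum(1000, 1000))  # 1.0 at the end
```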
### ResNet-50, linear classification
Run on a single node:
```
python main_lincls.py \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
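
For reference, the linear-classification step only needs the base-encoder weights from the pre-training checkpoint. The sketch below shows one way to load them into a torchvision ResNet-50 and freeze everything but the classifier; it assumes the checkpoint stores the pre-training weights under `state_dict` with a `module.base_encoder.` prefix, as stripped by `main_lincls.py`. Treat the key handling as illustrative rather than a drop-in replacement for the script.

```python
import torch
import torchvision.models as models

# Hypothetical file name; substitute your own pre-training checkpoint.
checkpoint = torch.load("checkpoint_final.pth.tar", map_location="cpu")
state_dict = checkpoint["state_dict"]

# Keep only base-encoder weights, dropping the projection/prediction heads and
# the DistributedDataParallel "module." prefix (assumed checkpoint layout).
prefix = "module.base_encoder."
encoder_sd = {
    k[len(prefix):]: v
    for k, v in state_dict.items()
    if k.startswith(prefix) and not k.startswith(prefix + "fc")
}

model = models.resnet50()
msg = model.load_state_dict(encoder_sd, strict=False)
print(msg.missing_keys)  # expect only the new classifier: ['fc.weight', 'fc.bias']

# Linear probing: freeze everything except the final fc layer.
for name, p in model.named_parameters():
    p.requires_grad = name in ("fc.weight", "fc.bias")
```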
Below are our pre-trained ResNet-50 models and logs.
| pretrain epochs | linear acc (%) | pretrain files | linear files |
|---|---|---|---|
| 100 | 68.9 | chpt | chpt / log |
| 300 | 72.8 | chpt | chpt / log |
| 1000 | 74.6 | chpt | chpt / log |
All ViT models are pre-trained for 300 epochs with AdamW.
### ViT-Small, 1-node (8-GPU), 1024-batch pre-training
This setup fits into a single node of 8 Volta 32G GPUs, for ease of debugging.
```
python main_moco.py \
  -a vit_small -b 1024 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]
```
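
Note that `--lr` here is a base value; assuming the script applies the common linear scaling rule (base lr × total batch / 256), the effective learning rate differs between the 1024-batch run above and the 4096-batch multi-node runs below. A quick back-of-the-envelope check, under that assumption:

```python
def effective_lr(base_lr, batch_size, reference_batch=256):
    # Linear scaling rule (assumed): scale the base lr by batch / 256.
    return base_lr * batch_size / reference_batch

base_lr = 1.5e-4  # --lr from the ViT commands
print(effective_lr(base_lr, 1024))  # 6.0e-4 for the 1-node, 1024-batch run
print(effective_lr(base_lr, 4096))  # 2.4e-3 for the 4096-batch multi-node runs
```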
### ViT-Small, 4-node (32-GPU) pre-training
On the first node, run:
```
python main_moco.py \
  -a vit_small \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 4 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On other nodes, run the same command with `--rank 1`, ..., `--rank 3`, respectively.
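
The `--warmup-epochs=40` flag pairs with a cosine decay over the remaining epochs. Below is a minimal sketch of such a schedule; the exact functional form is an assumption modeled on the usual linear-warmup / half-cosine-decay recipe, not a verbatim copy of the training script.

```python
import math

def schedule_lr(base_lr, epoch, total_epochs=300, warmup_epochs=40):
    """Linear warmup for warmup_epochs, then half-cycle cosine decay to 0."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# e.g. with the ViT base lr of 1.5e-4:
for e in (0, 20, 40, 170, 300):
    print(e, schedule_lr(1.5e-4, e))
```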
### ViT-Small, linear classification
Run on a single node:
```
python main_lincls.py \
  -a vit_small --lr=3 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
### ViT-Base, 8-node (64-GPU) pre-training

On the first node, run:
```
python main_moco.py \
  -a vit_base \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 8 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On other nodes, run the same command with `--rank 1`, ..., `--rank 7`, respectively.
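
The `--moco-t=.2` flag is the softmax temperature of the contrastive (InfoNCE) loss between query features and momentum-encoder keys. Below is an illustrative single-GPU sketch of such a loss; the multi-node training above additionally gathers keys across GPUs, which is omitted here, and the `2 * T` scaling follows the MoCo v3 convention but should be treated as an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, k, T=0.2):
    """InfoNCE between queries q and keys k (both [N, C]); positives on the diagonal."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / T                                # [N, N] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)     # i-th query matches i-th key
    return F.cross_entropy(logits, labels) * (2 * T)

# Toy usage with random features standing in for encoder outputs:
q = torch.randn(8, 256)
k = torch.randn(8, 256)
print(contrastive_loss(q, k))
```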
### ViT-Base, linear classification
Run on a single node:
```
python main_lincls.py \
  -a vit_base --lr=3 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
Below are our pre-trained ViT models and logs (batch 4096).
| model | pretrain epochs | linear acc (%) | pretrain files | linear files |
|---|---|---|---|---|
| ViT-Small | 300 | 73.2 | chpt | chpt / log |
| ViT-Base | 300 | 76.7 | chpt | chpt / log |