Confusion regarding the number of epochs the HIPT model was pre-trained for #8
Comments
Hi @ramprs21 - You are right that the model is essentially trained for "1 epoch". We reported training in terms of iterations to avoid confusion, but it seems that reporting iterations can also be confusing! An erratum we may also make to the arXiv is that in the 1st paragraph on page 6, the warm-up is still reported in terms of epochs (in reality, the first 40,000 iterations were used). The commands I provided in the README do not reflect the hyper-parameters actually used, and I will make an update soon!
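For concreteness, a minimal sketch of the epochs-vs-iterations bookkeeping in the erratum above, assuming the batch size of 256 and the ~102.4M-patch pretraining set cited elsewhere in this thread (both treated here as approximate):

```python
# Convert the "first 40,000 iterations" warm-up into epoch-equivalents.
# Batch size and dataset size are approximate figures quoted in this thread.
DATASET_SIZE = 102_400_000   # ~number of 256x256 patches in the pretraining set
BATCH_SIZE = 256             # effective (global) batch size
WARMUP_ITERS = 40_000        # warm-up, reported in iterations

warmup_epochs = WARMUP_ITERS * BATCH_SIZE / DATASET_SIZE
print(f"warm-up ~= {warmup_epochs:.2f} epoch-equivalents")  # ~0.10
```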
Thanks for the clarification @Richarizardd :)
Hi Richard, Could you please let us know a little bit more about the training setup (# of GPUs and types) and how long it took to pre-train? Thanks
Hi @ramprs21 - Thank you for the note. I will reflect it soon in the README. Pretraining required 2-4x A100s (for a batch size of 256) and took ~two weeks. To comment on DINO, a great thing I have found about DINO is how data-efficient and forgiving it is w.r.t. low batch sizes (see the ablation experiments on the last page), in contrast with other pretraining methods (SimCLR, MAE) that report results with batch sizes of 1024-2048. I imagine that even with low batch sizes (and since CPATH images have less variation than natural images), DINO would perform well.
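A rough back-of-the-envelope reading of these numbers; the even split of the global batch across GPUs and the literal two-week walltime are assumptions here, not reported values:

```python
# Per-GPU batch size needed to reach the reported global batch of 256,
# and the average time per iteration if 400K iterations took ~two weeks.
GLOBAL_BATCH = 256
TOTAL_ITERS = 400_000
WALLTIME_S = 14 * 24 * 3600  # "~two weeks", taken literally (assumption)

for n_gpus in (2, 4):
    print(f"{n_gpus} GPUs -> {GLOBAL_BATCH // n_gpus} samples per GPU per step")
print(f"~{WALLTIME_S / TOTAL_ITERS:.1f} s per iteration on average")  # ~3.0 s
```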
Thanks @Richarizardd. That makes sense. Just to clarify your previous comment where you mentioned that the model was trained for just 1 epoch -- the default value for
Dear @Richarizardd and @faisalml, first of all, congratulations on this huge and very promising work. Making the code public is much appreciated.
Also, could you give us an update regarding @ramprs21's comment? I guess you are not performing warm-up epochs either, since you are training for 1 epoch? Thank you very much! Have a great day
Thank you for the great work and for sharing the code.
The paper mentions that the model was trained for 400K iterations with a batch size of 256, which amounts to 102,400,000 patches, roughly the same as the size of the dataset used for pretraining. So it seems like the model was trained for just 1 epoch, but the training script in the README
python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /path/to/TCGA_PRETRAINING_DIR/patch_256_pretraining/ --output_dir /path/to/TCGA_PRETRAINING_DIR/ckpts/pretrain/ --epochs 100
seems to suggest that it was pretrained for 100 epochs. Could you please clarify this detail? Thanks in advance.
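A quick sanity check of the arithmetic in the question above, using only the numbers it quotes; the ~102.4M-patch dataset size is treated as exact for simplicity:

```python
# 400K iterations at batch size 256 cover the pretraining set roughly once;
# "--epochs 100" would instead correspond to ~100 passes (~40M iterations).
ITERATIONS = 400_000
BATCH_SIZE = 256
DATASET_SIZE = 102_400_000

patches_seen = ITERATIONS * BATCH_SIZE
print(f"patches seen:              {patches_seen:,}")                      # 102,400,000
print(f"epochs implied by paper:   {patches_seen / DATASET_SIZE:.1f}")     # 1.0
print(f"iterations for 100 epochs: {100 * DATASET_SIZE // BATCH_SIZE:,}")  # 40,000,000
```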