
Confusion regarding the number of epochs the HIPT model was pre-trained for #8

ramprs21 opened this issue Jul 15, 2022 · 6 comments


@ramprs21

Thank you for the great work and for sharing the code.

The paper mentions that the model was trained for 400K iterations with a batch size of 256, which amounts to 102,400,000 patches, roughly the size of the dataset used for pretraining. So it seems the model was trained for just 1 epoch, but the training script in the README

python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /path/to/TCGA_PRETRAINING_DIR/patch_256_pretraining/ --output_dir /path/to/TCGA_PRETRAINING_DIR/ckpts/pretrain/ --epochs 100

seems to suggest that it was pretrained for 100 epochs. Could you please clarify this detail? Thanks in advance.
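
For reference, here is the back-of-the-envelope arithmetic behind that estimate (the dataset size below is an approximation on my part, not a number taken from the repo):

```python
# Rough iterations-to-epochs conversion.
# dataset_size is an approximate patch count, assumed for illustration.
iterations = 400_000
batch_size = 256
dataset_size = 104_000_000  # approx. number of 256x256 pretraining patches (assumed)

patches_seen = iterations * batch_size   # 102,400,000
epochs = patches_seen / dataset_size     # ~0.98, i.e. roughly one full pass
print(f"patches seen: {patches_seen:,} -> ~{epochs:.2f} epochs")
```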

@Richarizardd
Collaborator

Hi @ramprs21

You are right that the model is essentially trained for "1 epoch". We reported training in terms of iterations to avoid confusion, but it seems that reporting iterations can also be confusing!

An erratum we may also make to the arXiv version: in the first paragraph on page 6, the warm-up is still reported in terms of epochs (in reality, the first 40,000 iterations were used for warm-up).

The commands provided in the README do not reflect the hyper-parameters actually used; I will make an update soon!
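
For anyone mapping the iteration-based schedule onto main_dino.py's epoch-based arguments, a rough conversion sketch (treating one epoch as ~400K optimizer steps, per the numbers above, is an assumption on my part):

```python
# Rough conversion of an iteration-based warm-up into epoch units,
# assuming ~400K steps correspond to one pass over the data.
warmup_iterations = 40_000
steps_per_epoch = 400_000  # assumed: one epoch ~= the full 400K-step run

warmup_fraction = warmup_iterations / steps_per_epoch  # 0.1
print(f"warm-up spans ~{warmup_fraction:.0%} of training")
```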

@ramprs21
Author

Thanks for the clarification @Richarizardd :)

@ramprs21
Author

Hi Richard, could you please tell us a little more about the training setup (number and type of GPUs) and how long pretraining took? Thanks

ramprs21 reopened this Jul 18, 2022
@Richarizardd
Collaborator

Hi @ramprs21 - Thank you for the note. I will reflect this in the README soon. Pretraining required 2-4x A100s (for a batch size of 256) and took ~2 weeks.

To comment on DINO: a great thing I have found about DINO is how data-efficient and forgiving it is w.r.t. low batch sizes (see the ablation experiments on the last page), in contrast with other pretraining methods (SimCLR, MAE) that report results with batch sizes of 1024-2048. I imagine that even with low batch sizes (and given that CPATH images have less variation than natural images), DINO would perform well.

@ramprs21
Author

Thanks @Richarizardd. That makes sense.

Just to clarify on your previous comment where you mentioned that the model was trained for just 1 epoch --

The default value for freeze_last_layer is 1 here, which means the last layer is frozen during the first epoch. I'm wondering if this should have been set to 0 instead?
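
For context, here is a minimal sketch of what freeze_last_layer controls, paraphrased from the DINO-style training loop (this is not the exact code in this repo, so treat it as an approximation):

```python
# Sketch of the freeze_last_layer behaviour in DINO-style training:
# for the first `freeze_last_layer` epochs, gradients on the projection
# head's last layer are dropped, so that layer receives no updates.
def cancel_gradients_last_layer(epoch, model, freeze_last_layer):
    if epoch >= freeze_last_layer:
        return
    for name, param in model.named_parameters():
        if "last_layer" in name:
            param.grad = None  # drop the gradient for this parameter

# With freeze_last_layer=1 and a single epoch of training (epoch 0),
# `epoch >= 1` is never true, so the last layer stays frozen for the
# whole run -- which is exactly the concern raised above.
```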

@afilt

afilt commented Oct 11, 2022

> Pretraining required 2-4x A100s (for a batch size of 256) and took ~2 weeks.


Dear @Richarizardd and @faisalml, first, congratulations on this huge and very promising work. Making the code public is much appreciated.
To clarify your previous comment @Richarizardd: was your effective batch size equal to 2x4x256 = 2048 (2 nodes x 4 GPUs x 256 images per GPU), or instead 2x4x32 = 256 (2 nodes x 4 GPUs x 32 images per GPU)? In the latter case, should the default value in the parser be 32 rather than 64?
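
For reference, the arithmetic behind that question (the node/GPU counts below just mirror my reading of "2-4x A100s" and are assumptions, not a confirmed setup):

```python
# Effective batch size under torch.distributed.launch:
# batch_size_per_gpu multiplied by the total number of processes (one per GPU).
nodes, gpus_per_node = 2, 4          # assumed reading of "2-4x A100s"
world_size = nodes * gpus_per_node   # 8 processes

for batch_size_per_gpu in (32, 64, 256):
    effective = batch_size_per_gpu * world_size
    print(f"{batch_size_per_gpu} per GPU x {world_size} GPUs = {effective} total")
# 32 per GPU  -> 256 total  (matches the reported batch size of 256)
# 64 per GPU  -> 512 total  (the current parser default)
# 256 per GPU -> 2048 total
```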

> The default value for freeze_last_layer is 1 here, which means the last layer is frozen during the first epoch. I'm wondering if this should have been set to 0 instead?


Also, could you give us an update regarding @ramprs21's comment above? I guess you are not performing warm-up epochs either, since you are training for only 1 epoch? Thank you very much! Have a great day ☺️
