
Confusion regarding the number of epochs the HIPT model was pre-trained for #8

ramprs21 opened this issue Jul 15, 2022 · 6 comments


@ramprs21

Thank you for the great work and for sharing the code.

The paper mentions that the model was trained for 400K iterations with a batch size of 256, which amounts to 102,400,000 patches, roughly the size of the dataset used for pretraining. So it seems the model was trained for just 1 epoch, but the training script in the README

python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /path/to/TCGA_PRETRAINING_DIR/patch_256_pretraining/ --output_dir /path/to/TCGA_PRETRAINING_DIR/ckpts/pretrain/ --epochs 100

seems to suggest that it was pretrained for 100 epochs. Could you please clarify this detail? Thanks in advance.
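
For reference, here is the back-of-the-envelope arithmetic behind that estimate (the dataset size below is an approximation on my part, not a number taken from the repo):

```python
# Rough iterations-to-epochs conversion.
# dataset_size is an approximate patch count, assumed for illustration.
iterations = 400_000
batch_size = 256
dataset_size = 104_000_000  # approx. number of 256x256 pretraining patches (assumed)

patches_seen = iterations * batch_size   # 102,400,000
epochs = patches_seen / dataset_size     # ~0.98, i.e. roughly one full pass
print(f"patches seen: {patches_seen:,} -> ~{epochs:.2f} epochs")
```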

@Richarizardd
Collaborator

Hi @ramprs21

You are right that the model is essentially trained for "1 epoch". We reported training in terms of iterations to avoid confusion, but it seems that reporting iterations can also be confusing!

An erratum we may also make to the arXiv version: in the first paragraph on page 6, the warm-up is still reported in terms of epochs (in reality, the first 40,000 iterations were used for warm-up).

The commands provided in the README do not reflect the hyper-parameters actually used; I will make an update soon!
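
For anyone mapping the iteration-based schedule onto main_dino.py's epoch-based arguments, a rough conversion sketch (treating one epoch as ~400K optimizer steps, per the numbers above, is an assumption on my part):

```python
# Rough conversion of an iteration-based warm-up into epoch units,
# assuming ~400K steps correspond to one pass over the data.
warmup_iterations = 40_000
steps_per_epoch = 400_000  # assumed: one epoch ~= the full 400K-step run

warmup_fraction = warmup_iterations / steps_per_epoch  # 0.1
print(f"warm-up spans ~{warmup_fraction:.0%} of training")
```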

@ramprs21
Author

Thanks for the clarification @Richarizardd :)

@ramprs21
Author

Hi Richard, could you please tell us a little more about the training setup (number and type of GPUs) and how long pretraining took? Thanks

ramprs21 reopened this Jul 18, 2022
@Richarizardd
Collaborator

Hi @ramprs21 - Thank you for the note. I will reflect this in the README soon. Pretraining required 2-4x A100s (for a batch size of 256) and took ~2 weeks.

To comment on DINO: a great thing I have found about DINO is how data-efficient and forgiving it is w.r.t. low batch sizes (see the ablation experiments on the last page), in contrast with other pretraining methods (SimCLR, MAE) that report results with batch sizes of 1024-2048. I imagine that even with low batch sizes (and given that CPATH images have less variation than natural images), DINO would perform well.

@ramprs21
Author

Thanks @Richarizardd. That makes sense.

Just to clarify on your previous comment where you mentioned that the model was trained for just 1 epoch --

The default value for freeze_last_layer is 1 here, which means the last layer is frozen during the first epoch. I'm wondering if this should have been set to 0 instead?
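
For context, here is a minimal sketch of what freeze_last_layer controls, paraphrased from the DINO-style training loop (this is not the exact code in this repo, so treat it as an approximation):

```python
# Sketch of the freeze_last_layer behaviour in DINO-style training:
# for the first `freeze_last_layer` epochs, gradients on the projection
# head's last layer are dropped, so that layer receives no updates.
def cancel_gradients_last_layer(epoch, model, freeze_last_layer):
    if epoch >= freeze_last_layer:
        return
    for name, param in model.named_parameters():
        if "last_layer" in name:
            param.grad = None  # drop the gradient for this parameter

# With freeze_last_layer=1 and a single epoch of training (epoch 0),
# `epoch >= 1` is never true, so the last layer stays frozen for the
# whole run -- which is exactly the concern raised above.
```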

@afilt

afilt commented Oct 11, 2022

> Pretraining required 2-4x A100s (for a batch size of 256) and took ~2 weeks.


Dear @Richarizardd and @faisalml, first, congratulations on this huge and very promising work. Making the code public is much appreciated.
To clarify your previous comment @Richarizardd: was your effective batch size equal to 2x4x256 = 2048 (2 nodes x 4 GPUs x 256 images per GPU), or instead 2x4x32 = 256 (2 nodes x 4 GPUs x 32 images per GPU)? In the latter case, should the default value in the parser be 32 rather than 64?
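
For reference, the arithmetic behind that question (the node/GPU counts below just mirror my reading of "2-4x A100s" and are assumptions, not a confirmed setup):

```python
# Effective batch size under torch.distributed.launch:
# batch_size_per_gpu multiplied by the total number of processes (one per GPU).
nodes, gpus_per_node = 2, 4          # assumed reading of "2-4x A100s"
world_size = nodes * gpus_per_node   # 8 processes

for batch_size_per_gpu in (32, 64, 256):
    effective = batch_size_per_gpu * world_size
    print(f"{batch_size_per_gpu} per GPU x {world_size} GPUs = {effective} total")
# 32 per GPU  -> 256 total  (matches the reported batch size of 256)
# 64 per GPU  -> 512 total  (the current parser default)
# 256 per GPU -> 2048 total
```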

> The default value for freeze_last_layer is 1 here, which means the last layer is frozen during the first epoch. I'm wondering if this should have been set to 0 instead?


Also, could you give us an update regarding @ramprs21's comment above? I guess you are not performing warm-up epochs either, since you are training for only 1 epoch? Thank you very much! Have a great day ☺️
