Ersi lig 3910 update mae benchmark code #1468
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@                  Coverage Diff                   @@
##   ersi-lig-3912-refactor-mae-to-use-timm-vit   #1468   +/-   ##
==================================================================
- Coverage   84.61%   84.37%   -0.25%
==================================================================
  Files         136      134       -2
  Lines        5799     5690     -109
==================================================================
- Hits         4907     4801     -106
+ Misses        892      889       -3

☔ View full report in Codecov by Sentry.
benchmarks/imagenet/vitb16/mae.py (Outdated)
@@ -21,24 +22,25 @@ def __init__(self, batch_size_per_device: int, num_classes: int) -> None:
         self.batch_size_per_device = batch_size_per_device

         decoder_dim = 512
-        vit = vit_b_16()
+        vit = vit_base_patch16_224(dynamic_img_size=True)
Do we have to set `dynamic_img_size=True`? I think MAE uses 16x16 patches with 224px images everywhere.
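As a quick aside on the question above: a minimal, hypothetical sketch of the two options, assuming a recent timm release (0.9+) where `vit_base_patch16_224` accepts `dynamic_img_size`. It is not part of the PR.

```python
from timm.models.vision_transformer import vit_base_patch16_224

# Fixed 224px inputs with 16x16 patches, as in the original MAE setup.
vit = vit_base_patch16_224()
assert vit.patch_embed.patch_size[0] == 16
assert vit.patch_embed.num_patches == (224 // 16) ** 2  # 196 patches

# dynamic_img_size=True only matters if the benchmark ever feeds images whose
# resolution differs from 224px; timm then resamples the positional
# embeddings on the fly, at a small runtime cost.
vit_dynamic = vit_base_patch16_224(dynamic_img_size=True)
```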
benchmarks/imagenet/vitb16/mae.py (Outdated)
-        self.patch_size = vit.patch_size
-        self.sequence_length = vit.seq_length
+        self.patch_size = vit.patch_embed.patch_size[0]
+        self.sequence_length = vit.patch_embed.num_patches + 1
Suggested change:
-        self.sequence_length = vit.patch_embed.num_patches + 1
+        self.sequence_length = vit.patch_embed.num_patches + vit.num_prefix_tokens
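For context, a small sketch of why `num_prefix_tokens` is the more general choice, assuming a timm ViT backbone (0.9+): it counts the class token plus any additional prefix tokens, so it reduces to the hard-coded `+ 1` for a plain ViT-B/16 but stays correct for variants with extra prefix tokens.

```python
from timm.models.vision_transformer import vit_base_patch16_224

vit = vit_base_patch16_224()
num_patches = vit.patch_embed.num_patches  # 196 for 224px images and 16px patches
prefix_tokens = vit.num_prefix_tokens      # 1 here (class token only)

# 197 for this model, identical to num_patches + 1, but robust to other variants.
sequence_length = num_patches + prefix_tokens
```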
benchmarks/imagenet/vitb16/mae.py (Outdated)
        self.decoder = masked_autoencoder_timm.MAEDecoder(
            num_patches=vit.patch_embed.num_patches,
            patch_size=self.patch_size,
            in_chans=3,
Let's not set `in_chans` here; it should always be 3 for RGB images.
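A short illustration of why `in_chans` can stay at its default for this benchmark: the decoder reconstructs raw pixels per patch, so its prediction head outputs `patch_size ** 2 * in_chans` values, and for RGB ImageNet images `in_chans` is always 3. The numbers below mirror the ViT-B/16 settings; this is only an explanatory sketch.

```python
patch_size = 16  # ViT-B/16 patches
in_chans = 3     # RGB images; fixed for this benchmark

# Values the decoder predicts per masked patch: 16 * 16 pixels x 3 channels.
values_per_patch = patch_size ** 2 * in_chans
print(values_per_patch)  # 768
```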
* added hackathon
* changed comments
* formatted
* addressed comments
* fixed typing
* addressed comments
* added pre-norm and fixed arguments
* added masked vision transformer with Torchvision
* weight initialization
* cleanup
* modifies imagenette benchmark
* made mask token optional and adapted benchmarks
* removed unused import
* adapted to dynamic image size
* moved positional embed init to utils
* updated benchmark
* adapted benchmark
* moved mask token to decoder
* revert example
* removed example
* removed file
* inheriting from Module
* reverted dataset paths
* use timm's drop_path_rate
* removed unused import
* removed private method
* changed slicing
* formatted
* path dropout only for fine tune
* formatted
* account for mask token in backbone
* mask token of decoder
* removed appending of mask token in params

Merged commit cc263fe into ersi-lig-3912-refactor-mae-to-use-timm-vit.
* Add MAE evaluation
* Add stochastic depth dropout
* Add MAE
* Drop assertion
* Fix smooth cross entropy loss and mixup
* Update comments
* Add layer lr decay and weight decay
* Update comment
* Add test for MAE images_to_tokens
* Disable BN update
* Add BN before classification head
* Format
* Fix BN freezing
* Cleanup
* Use torch.no_grad instead of deactivating gradients manually
* This is required as torch.no_grad doesn't change the model configuration while manual gradient deactivation/activation can have unintended consequences. For example, MAE ViT positional embeddings are parameters with requires_grad=False that should never receive an update. But if we use activate_requires_grad for finetuning we break those parameters.
* Create new stochastic depth instances
* Add mask token to learnable params
* Add sine-cosine positional embedding
* Initialize parameters as in paper
* Fix types
* Format
* adjusted to existing interface
* draft
* remove
* added modifications
* added mae implementation with timm and example
* formatted
* fixed import
* removed
* fixed typing
* addressed comments
* fixed typing and formatted
* addressed comments
* added docstring and formatted
* removed images to tokens method
* Ersi lig 3910 update mae benchmark code (#1468)
* modified imagenette benchmark
* formatted
* edited vitb16 benchmark
* added the posibility to handle images of different sizes
* formatted
* removed comments
* revert
* changed import
* initialize class token
* specified that class token should be used
* chabged architecture
* addressed comments
* formatted
* Masked vision transformer (#1482)
* added hackathon
* changed comments
* formatted
* addressed comments
* fixed typing
* addressed comments
* added pre-norm and fixed arguments
* added masked vision transformer with Torchvision
* weight initialization
* cleanup
* modifies imagenette benchmark
* made mask token optional and adapted benchmarks
* removed unused import
* adapted to dynamic image size
* moved positional embed init to utils
* updated benchmark
* adapted benchmark
* moved mask token to decoder
* revert example
* removed example
* removed file
* inheriting from Module
* reverted dataset paths
* use timm's drop_path_rate
* removed unused import
* removed private method
* changed slicing
* formatted
* path dropout only for fine tune
* formatted
* account for mask token in backbone
* mask token of decoder
* removed appending of mask token in params
* resolved conflicts
* formatted
* adjusted examples
* removed comment
* added test
* added message in case of ImportError
* fixed skipping of test
* removed example
* handling the TIMM dependency
* added note to docs for MAE installation
* added unit tests for MAE with torchvision
* removed unecessary maks token definition
* addressed comments
* moved test to separate file
* added typing
* fixed import
* fixes typing
* fixed typing
* fixed typing
* Ersi lig 4471 cleanup and merge mae branch (#1510)
* renamed test class
* fixed imports
* ficed imports
* fixed import
* fixed imports and decreased batch size
* format
* removed comments
* use function defined in utils
* added docstrings
* added doctrings
* added docstring
* formatted
* formatted
* import Tensor

---------
Co-authored-by: guarin <[email protected]>
Updates the MAE benchmark code to use the new TIMM ViT backbone.
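For readers skimming the thread, a hypothetical end-to-end sketch of the backbone wiring the PR moves to, pieced together from the diff hunks above. Everything not shown in those hunks is an assumption: the `masked_autoencoder_timm` import path, the decoder's default arguments (including the 512-dim decoder width set via `decoder_dim` in the diff), and the omission of `dynamic_img_size`, which is still under discussion.

```python
from timm.models.vision_transformer import vit_base_patch16_224

from lightly.models.modules import masked_autoencoder_timm  # assumed import path

# TIMM ViT-B/16 encoder replacing the torchvision vit_b_16 backbone.
vit = vit_base_patch16_224()
patch_size = vit.patch_embed.patch_size[0]  # 16
sequence_length = vit.patch_embed.num_patches + vit.num_prefix_tokens  # 196 + 1

# Decoder construction as named in the diff; the remaining arguments are
# assumed to have sensible defaults and are omitted here.
decoder = masked_autoencoder_timm.MAEDecoder(
    num_patches=vit.patch_embed.num_patches,
    patch_size=patch_size,
)
```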