
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, DPN, CSPNet, and more


ltss1988/pytorch-image-models

 
 


PyTorch Image Models

Sponsors

A big thank you to my GitHub Sponsors for their support!

In addition to the sponsors at the link above, I've received hardware and/or cloud resources from

I'm fortunate to be able to dedicate significant time and money of my own to supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.

What's New

March 17, 2021

  • Add new ECA-NFNet-L0 (rename nfnet_l0c->eca_nfnet_l0) weights trained by myself.
    • 82.6 top-1 @ 288x288, 82.8 @ 320x320, trained at 224x224
    • Uses SiLU activation, approx 2x faster than dm_nfnet_f0 and 50% faster than nfnet_f0s w/ 1/3 param count
  • Integrate Hugging Face model hub into timm create_model and default_cfg handling for pretrained weight and config sharing (more on this soon!)
  • Merge HardCoRe NAS models contributed by https://github.com/yoniaflalo
  • Merge PyTorch trained EfficientNet-EL and pruned ES/EL variants contributed by DeGirum

March 7, 2021

  • First 0.4.x PyPi release w/ NFNets (& related), ByoB (GPU-Efficient, RepVGG, etc).
  • Change feature extraction for pre-activation nets (NFNets, ResNetV2) to return features before activation.
  • Tested with PyTorch 1.8 release. Updated CI to use 1.8.
  • Benchmarked several arch on RTX 3090, Titan RTX, and V100 across 1.7.1, 1.8, NGC 20.12, and 21.02. Some interesting performance variations to take note of https://gist.github.com/rwightman/bb59f9e245162cee0e38bd66bd8cd77f

Feb 18, 2021

  • Add pretrained weights and model variants for NFNet-F* models from DeepMind Haiku impl.
    • Models are prefixed with dm_. They require SAME padding conv, skipinit enabled, and activation gains applied in act fn.
    • These models are big, expect to run out of GPU memory. With the GELU activation + other options, they are roughly 1/2 the inference speed of my SiLU PyTorch optimized s variants.
    • Original model results are based on pre-processing that is not the same as all other models so you'll see different results in the results csv (once updated).
    • Matching the original pre-processing as closely as possible I get these results:
      • dm_nfnet_f6 - 86.352
      • dm_nfnet_f5 - 86.100
      • dm_nfnet_f4 - 85.834
      • dm_nfnet_f3 - 85.676
      • dm_nfnet_f2 - 85.178
      • dm_nfnet_f1 - 84.696
      • dm_nfnet_f0 - 83.464

Feb 16, 2021

  • Add Adaptive Gradient Clipping (AGC) as per https://arxiv.org/abs/2102.06171. Integrated w/ PyTorch gradient clipping via mode arg that defaults to prev 'norm' mode. For backward arg compat, clip-grad arg must be specified to enable when using train.py.
    • AGC w/ default clipping factor --clip-grad .01 --clip-mode agc
    • PyTorch global norm of 1.0 (old behaviour, always norm), --clip-grad 1.0
    • PyTorch value clipping of 10, --clip-grad 10. --clip-mode value
    • AGC performance is definitely sensitive to the clipping factor. More experimentation needed to determine good values for smaller batch sizes and optimizers besides those in paper. So far I've found .001-.005 is necessary for stable RMSProp training w/ NFNet/NF-ResNet.
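
The clipping rule from the paper can be sketched in plain Python for a single 2-D weight, treating each row as one unit. This is only an illustration of the idea, not the code used by train.py: the function name `agc_clip_rows`, the row-as-unit layout, and the `eps` floor of 1e-3 are assumptions made for this example.

```python
import math


def agc_clip_rows(weights, grads, clip_factor=0.01, eps=1e-3):
    """Unit-wise adaptive gradient clipping over the rows of a 2-D weight.

    A row's gradient is rescaled whenever its norm exceeds
    clip_factor * max(norm of the matching weight row, eps).
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip_factor * w_norm
        if g_norm > max_norm:
            scale = max_norm / g_norm
            g_row = [g * scale for g in g_row]
        clipped.append(list(g_row))
    return clipped


# Hypothetical values: the first row is already within the limit,
# the second row has a gradient far larger than 1% of its weight norm.
weights = [[3.0, 4.0], [0.6, 0.8]]
grads = [[0.003, 0.004], [0.3, 0.4]]
clipped = agc_clip_rows(weights, grads, clip_factor=0.01)
```

Because the threshold scales with the weight norm, the same clipping factor behaves very differently from a fixed global norm, which is one way to read the sensitivity noted above.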

Feb 12, 2021

Feb 10, 2021

  • First Normalization-Free model training experiments done:
    • nf_resnet50 - 80.68 top-1 @ 288x288, 80.31 @ 256x256
    • nf_regnet_b1 - 79.30 @ 288x288, 78.75 @ 256x256
  • More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks')
  • Refinements to normalizer layer arg handling and normalizer+act layer handling in some models
  • Default AMP mode changed to native PyTorch AMP instead of APEX. Issues not being fixed with APEX. Native works with --channels-last and --torchscript model training, APEX does not.
  • Fix a few bugs introduced since last pypi release

Feb 8, 2021

  • Add several ResNet weights with ECA attention. 26t & 50t trained @ 256, test @ 320. 269d train @ 256, fine-tune @320, test @ 352.
    • ecaresnet26t - 79.88 top-1 @ 320x320, 79.08 @ 256x256
    • ecaresnet50t - 82.35 top-1 @ 320x320, 81.52 @ 256x256
    • ecaresnet269d - 84.93 top-1 @ 352x352, 84.87 @ 320x320
  • Remove separate tiered (t) vs tiered_narrow (tn) ResNet model defs, all tn changed to t and t models removed (seresnext26t_32x4d only model w/ weights that was removed).
  • Support model default_cfgs with separate train vs test resolution test_input_size and remove extra _320 suffix ResNet model defs that were just for test.
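
As a rough illustration of a config that carries separate train and test resolutions, the sketch below picks one or the other from a dict. The helper `resolve_input_size` is hypothetical and the `input_size` key name is an assumption; only `test_input_size` is named above. The sizes mirror the 256 train / 320 test setup described for the ECA ResNet weights.

```python
def resolve_input_size(cfg, use_test_size=False):
    """Pick the (channels, height, width) tuple from a default_cfg-style dict."""
    if use_test_size and "test_input_size" in cfg:
        return cfg["test_input_size"]
    # Fall back to the train resolution when no test size is given.
    return cfg["input_size"]


# Illustrative config: trained at 256x256, tested at 320x320.
example_cfg = {
    "input_size": (3, 256, 256),
    "test_input_size": (3, 320, 320),
}

train_size = resolve_input_size(example_cfg)
test_size = resolve_input_size(example_cfg, use_test_size=True)
```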

Jan 30, 2021

  • Add initial "Normalization Free" NF-RegNet-B* and NF-ResNet model definitions based on paper

Jan 25, 2021

  • Add ResNetV2 Big Transfer (BiT) models w/ ImageNet-1k and 21k weights from https://github.com/google-research/big_transfer
  • Add official R50+ViT-B/16 hybrid models + weights from https://github.com/google-research/vision_transformer
  • ImageNet-21k ViT weights are added w/ model defs and representation layer (pre logits) support
    • NOTE: ImageNet-21k classifier heads were zero'd in original weights, they are only useful for transfer learning
  • Add model defs and weights for DeiT Vision Transformer models from https://github.com/facebookresearch/deit
  • Refactor dataset classes into ImageDataset/IterableImageDataset + dataset specific parser classes
  • Add Tensorflow-Datasets (TFDS) wrapper to allow use of TFDS image classification sets with train script
    • Ex: train.py /data/tfds --dataset tfds/oxford_iiit_pet --val-split test --model resnet50 -b 256 --amp --num-classes 37 --opt adamw --lr 3e-4 --weight-decay .001 --pretrained -j 2
  • Add improved .tar dataset parser that reads images from .tar, folder of .tar files, or .tar within .tar
    • Run validation on full ImageNet-21k directly from tar w/ BiT model: validate.py /data/fall11_whole.tar --model resnetv2_50x1_bitm_in21k --amp
  • Models in this update should be stable w/ possible exception of ViT/BiT, possibility of some regressions with train/val scripts and dataset handling

Jan 3, 2021

  • Add SE-ResNet-152D weights
    • 256x256 val, 0.94 crop top-1 - 83.75
    • 320x320 val, 1.0 crop - 84.36
  • Update results files

Dec 18, 2020

  • Add ResNet-101D, ResNet-152D, and ResNet-200D weights trained @ 256x256
    • 256x256 val, 0.94 crop (top-1) - 101D (82.33), 152D (83.08), 200D (83.25)
    • 288x288 val, 1.0 crop - 101D (82.64), 152D (83.48), 200D (83.76)
    • 320x320 val, 1.0 crop - 101D (83.00), 152D (83.66), 200D (84.01)
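
The crop values quoted with these results can be read as the fraction of the resized image that the center crop keeps. A minimal sketch of that arithmetic follows, assuming a resize-then-center-crop evaluation pipeline; the helper name `resize_size_for_crop` is made up for this example, and the exact rounding in the validation script may differ.

```python
import math


def resize_size_for_crop(img_size, crop_pct):
    """Size to resize to so that a center crop of img_size covers crop_pct of it."""
    return int(math.floor(img_size / crop_pct))


scale_256 = resize_size_for_crop(256, 0.94)  # resize larger, then crop to 256
scale_320 = resize_size_for_crop(320, 1.0)   # 1.0 crop: no border is discarded
```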

Dec 7, 2020

  • Simplify EMA module (ModelEmaV2), compatible with fully torchscripted models
  • Misc fixes for SiLU ONNX export, default_cfg missing from Feature extraction models, Linear layer w/ AMP + torchscript
  • PyPi release @ 0.3.2 (needed by EfficientDet)
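
The idea behind a weight EMA is to keep a shadow copy of the parameters that moves a small step towards the current weights after each update. The sketch below shows that update rule on a flat list of numbers; the class name `SimpleEma` and the decay value are assumptions for illustration, and this is not the ModelEmaV2 code.

```python
class SimpleEma:
    """Keeps an exponential moving average of a flat list of parameters."""

    def __init__(self, params, decay=0.9):
        self.decay = decay
        self.shadow = list(params)  # start from a copy of the current weights

    def update(self, params):
        d = self.decay
        # shadow <- decay * shadow + (1 - decay) * current
        self.shadow = [d * s + (1.0 - d) * p for s, p in zip(self.shadow, params)]


ema = SimpleEma([0.0, 1.0], decay=0.5)
ema.update([1.0, 1.0])
ema.update([1.0, 3.0])
```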

Oct 30, 2020

  • Test with PyTorch 1.7 and fix a small top-n metric view vs reshape issue.
  • Convert newly added 224x224 Vision Transformer weights from official JAX repo. 81.8 top-1 for B/16, 83.1 L/16.
  • Support PyTorch 1.7 optimized, native SiLU (aka Swish) activation. Add mapping to 'silu' name, custom swish will eventually be deprecated.
  • Fix regression for loading pretrained classifier via direct model entrypoint functions. Didn't impact create_model() factory usage.
  • PyPi release @ 0.3.0 version!
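
For reference, SiLU (Swish) is the input multiplied by its sigmoid. A scalar sketch in plain Python, written only to show the formula; it is not numerically hardened for large negative inputs, and the native PyTorch op is what the models actually use.

```python
import math


def silu(x):
    """SiLU (Swish): x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


values = [silu(v) for v in (-1.0, 0.0, 1.0)]
```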

Oct 26, 2020

  • Update Vision Transformer models to be compatible with official code release at https://github.com/google-research/vision_transformer
  • Add Vision Transformer weights (ImageNet-21k pretrain) for 384x384 base and large models converted from official jax impl
    • ViT-B/16 - 84.2
    • ViT-B/32 - 81.7
    • ViT-L/16 - 85.2
    • ViT-L/32 - 81.5

Oct 21, 2020

  • Weights added for Vision Transformer (ViT) models. 77.86 top-1 for 'small' and 79.35 for 'base'. Thanks to Christof for training the base model w/ lots of GPUs.

Oct 13, 2020

  • Initial impl of Vision Transformer models. Both patch and hybrid (CNN backbone) variants. Currently trying to train...
  • Adafactor and AdaHessian (FP32 only, no AMP) optimizers
  • EdgeTPU-M (efficientnet_em) model trained in PyTorch, 79.3 top-1
  • Pip release, doc updates pending a few more changes...

Sept 18, 2020

  • New ResNet 'D' weights. 72.7 (top-1) ResNet-18-D, 77.1 ResNet-34-D, 80.5 ResNet-50-D
  • Added a few untrained defs for other ResNet models (66D, 101D, 152D, 200/200D)

Sept 3, 2020

  • New weights
    • Wide-ResNet50 - 81.5 top-1 (vs 78.5 torchvision)
    • SEResNeXt50-32x4d - 81.3 top-1 (vs 79.1 cadene)
  • Support for native Torch AMP and channels_last memory format added to train/validate scripts (--channels-last, --native-amp vs --apex-amp)
  • Models tested with channels_last on latest NGC 20.08 container. AdaptiveAvgPool in attn layers changed to mean((2,3)) to work around bug with NHWC kernel.

Introduction

PyTorch Image Models (timm) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.

The work of many others is present here. I've tried to make sure all source material is acknowledged via links to github, arxiv papers, etc in the README, documentation, and code docstrings. Please let me know if I missed anything.

Models

All model architecture families include variants with pretrained weights. Some specific model variants have no weights; this is NOT a bug. Help training new or better weights is always appreciated. Here are some example training hparams to get you started.

A full version of the list below with source links can be found in the documentation.

Features

Several (less common) features that I often utilize in my projects are included. Many of these additions are the reason why I maintain my own set of models, instead of using others' via pip.

Results

Model validation results can be found in the documentation and in the results tables.

Getting Started (Documentation)

My current documentation for timm covers the basics.

timmdocs is quickly becoming a much more comprehensive set of documentation for timm. A big thanks to Aman Arora for his efforts creating timmdocs.

paperswithcode is a good resource for browsing the models within timm.

Train, Validation, Inference Scripts

The root folder of the repository contains reference train, validation, and inference scripts that work with the included models and other features of this repository. They are adaptable for other datasets and use cases with a little hacking. See documentation for some basics and training hparams for some train examples that produce SOTA ImageNet results.

Awesome PyTorch Resources

One of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.

Object Detection, Instance and Semantic Segmentation

Computer Vision / Image Augmentation

Knowledge Distillation

Metric Learning

Training / Frameworks

Licenses

Code

The code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL / LGPL conflicts. That said, it is your responsibility to ensure you comply with the license here and the conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything please create an issue.

Pretrained Weights

So far all of the pretrained weights available here are pretrained on ImageNet with a select few that have some additional pretraining (see extra note below). ImageNet was released for non-commercial research purposes only (http://www.image-net.org/download-faq). It's not clear what the implications of that are for the use of pretrained weights from that dataset. Any models I have trained with ImageNet are done for research purposes and one should assume that the original dataset license applies to the weights. It's best to seek legal advice if you intend to use the pretrained weights in a commercial product.

Pretrained on more than ImageNet

Several weights included or referenced here were pretrained with proprietary datasets that I do not have access to. These include the Facebook WSL, SSL, SWSL ResNe(Xt) and the Google Noisy Student EfficientNet models. The Facebook models have an explicit non-commercial license (CC-BY-NC 4.0, https://github.com/facebookresearch/semi-supervised-ImageNet1K-models, https://github.com/facebookresearch/WSL-Images). The Google models do not appear to have any restriction beyond the Apache 2.0 license (and ImageNet concerns). In either case, you should contact Facebook or Google with any questions.

Citing

BibTeX

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}

