This repository provides a PyTorch implementation of POPAR: Patch Order Prediction and Appearance Recovery for Self-supervised Medical Image Analysis.
We propose POPAR (patch order prediction and appearance recovery), a novel vision transformer-based self-supervised learning framework for chest X-ray images. POPAR leverages the benefits of vision transformers and the unique properties of medical imaging, aiming to simultaneously learn patch-wise high-level contextual features by correcting shuffled patch orders and fine-grained features by recovering patch appearance.
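To make the two pretext tasks concrete, the sketch below illustrates how POPAR-style training targets could be constructed. This is not the authors' code: the helper name `make_popar_targets`, the patch size, and the tensor shapes are illustrative assumptions; the permutation serves as the patch-order-prediction target and the original patches as the appearance-recovery target.

```python
# A minimal sketch (not the official implementation) of constructing POPAR-style targets:
# split an image into non-overlapping patches, shuffle their order, and keep
# (a) the permutation as the patch-order-prediction target and
# (b) the original patches as the appearance-recovery target.
import torch

def make_popar_targets(image: torch.Tensor, patch_size: int = 32):
    """image: (C, H, W) tensor; returns (shuffled patches, permutation, original patches)."""
    c, h, w = image.shape
    # Split into non-overlapping patches -> (num_patches, C, patch_size, patch_size)
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c, patch_size, patch_size)

    perm = torch.randperm(patches.size(0))   # patch-order-prediction target
    shuffled = patches[perm]                 # shuffled patch sequence fed to the network
    return shuffled, perm, patches           # `patches` is the appearance-recovery target

# Example: a 448x448 image split into 14x14 = 196 patches of size 32.
img = torch.randn(3, 448, 448)
shuffled, perm, original = make_popar_targets(img)
print(shuffled.shape, perm.shape, original.shape)  # (196, 3, 32, 32) (196,) (196, 3, 32, 32)
```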
POPAR: Patch Order Prediction and Appearance Recovery for Self-supervised Medical Image Analysis
Jiaxuan Pang1, Fatemeh Haghighi1, DongAo Ma1, Nahid Ul Islam1, Mohammad Reza Hosseinzadeh Taher1, Michael B. Gotway2, Jianming Liang1
1 Arizona State University, 2 Mayo Clinic
Published in: Domain Adaptation and Representation Transfer (DART), 2022.
Paper | Supplementary material | Code | [Poster] | [Slides] | Presentation ([YouTube])
- POPAR consistently outperforms all publicly available state-of-the-art transformer-based self-supervised ImageNet-pretrained models.
- Our downgraded POPAR-1 and POPAR-3 outperform or achieve on-par performance with all publicly available state-of-the-art transformer-based self-supervised ImageNet-pretrained models on most target tasks.
- POPAR with the Swin-base backbone, even in its downgraded versions, yields significantly better or on-par performance compared with three self-supervised learning methods using a ResNet-50 backbone in all target tasks.
- POPAR models outperform SimMIM in all target tasks across ViT-base and Swin-base backbones.
- POPAR models outperform fully supervised models pretrained on ImageNet and ChestX-ray14 across architectures.
- Python
- Install PyTorch (pytorch.org)
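After installing, a quick and optional environment check (a generic snippet, not part of the POPAR codebase) confirms the PyTorch version and GPU visibility:

```python
# Optional environment check: print the installed PyTorch version and whether a CUDA GPU is visible.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```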
Our pre-trained ViT and Swin Transformer models can be downloaded as follows (a minimal loading sketch appears after the table):
Method | Backbone | Input Resolution (Shuffled Patches) | AUC on ChestX-ray14 | AUC on CheXpert | AUC on ShenZhen | ACC on RSNA Pneumonia | Model
---|---|---|---|---|---|---|---
POPAR-3 | ViT-B | 224x224 (196) | 79.58±0.13 | 87.86±0.17 | 93.87±0.63 | 73.17±0.46 | download
POPAR | Swin-B | 448x448 (196) | 81.81±0.10 | 88.34±0.50 | 97.33±0.74 | 74.19±0.37 | download
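The sketch below shows one way a downloaded checkpoint might be loaded for fine-tuning. It is not the official fine-tuning script: the checkpoint filename, the `state_dict` key, and the `timm` model name are assumptions; inspect the downloaded file and match the backbone and input resolution to the table above.

```python
# A minimal loading sketch (not the official script). Filename, checkpoint key, and
# timm model name are assumptions; adjust them to the checkpoint you downloaded.
import timm
import torch

checkpoint = torch.load("POPAR_ViT-B_224.pth", map_location="cpu")  # hypothetical filename
state_dict = checkpoint.get("state_dict", checkpoint)               # weights may be nested under a key

# Drop common training-time prefixes (e.g., "module." from DistributedDataParallel), if present.
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}

# Build a ViT-B/16 backbone with a 14-class head (ChestX-ray14); strict=False skips
# pretext-task heads (order prediction / appearance recovery) not used downstream.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=14)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```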
This research has been supported in part by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, and in part by the NIH under Award Number R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work has utilized the GPUs provided in part by the ASU Research Computing and in part by the Extreme Science and Engineering Discovery Environment (XSEDE) funded by the National Science Foundation (NSF) under grant numbers: ACI-1548562, ACI-1928147, and ACI-2005632. The content of this paper is covered by patents pending.
Released under the ASU GitHub Project License.