(back to README.md for other categories)

Overview

Other High-level Vision Tasks
Transfer / X-Supervised / X-Shot / Continual Learning
Low-level Vision Tasks
Reinforcement Learning
- Navigation
- Other RL Tasks
Medical
Other Tasks
Attention Mechanisms in Vision/NLP
- Attention for Vision
- NLP
- Both
- Others

Other High-level Vision Tasks

Point Cloud / 3D

PCT: "PCT: Point Cloud Transformer", arXiv, 2020 (Tsinghua). [Paper][Jittor][PyTorch (uyzhang)]
Point-Transformer: "Point Transformer", arXiv, 2020 (Ulm University). [Paper]
NDT-Transformer: "NDT-Transformer: Large-Scale 3D Point Cloud Localisation using the Normal Distribution Transform Representation", ICRA, 2021 (University of Sheffield). [Paper][PyTorch]
P4Transformer: "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos", CVPR, 2021 (NUS). [Paper]
PTT: "PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds", IROS, 2021 (Northeastern University). [Paper][PyTorch (in construction)]
SnowflakeNet: "SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer", ICCV, 2021 (Tsinghua). [Paper][PyTorch]
PoinTr: "PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers", ICCV, 2021 (Tsinghua). [Paper][PyTorch]
Point-Transformer: "Point Transformer", ICCV, 2021 (Oxford + CUHK). [Paper][PyTorch (lucidrains)]
CT: "Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks", ICCV, 2021 (Samsung). [Paper]
3DVG-Transformer: "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds", ICCV, 2021 (Beihang University). [Paper]
PPT-Net: "Pyramid Point Cloud Transformer for Large-Scale Place Recognition", ICCV, 2021 (Nanjing University of Science and Technology). [Paper]
LTTR: "3D Object Tracking with Transformer", BMVC, 2021 (Northeastern University, China). [Paper][Code (in construction)]
?: "Shape registration in the time of transformers", NeurIPS, 2021 (Sapienza University of Rome). [Paper]
YOGO: "You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module", arXiv, 2021 (Berkeley). [Paper][PyTorch]
DTNet: "Dual Transformer for Point Cloud Analysis", arXiv, 2021 (Southwest University). [Paper]
MLMSPT: "Point Cloud Learning with Transformer", arXiv, 2021 (Southwest University). [Paper]
PQ-Transformer: "PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds", arXiv, 2021 (Tsinghua). [Paper][PyTorch]
PST²: "Spatial-Temporal Transformer for 3D Point Cloud Sequences", WACV, 2022 (Sun Yat-sen University). [Paper]
SCTN: "SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation", AAAI, 2022 (KAUST). [Paper]
AWT-Net: "Adaptive Wavelet Transformer Network for 3D Shape Representation Learning", ICLR, 2022 (NYU). [Paper]
?: "Deep Point Cloud Reconstruction", ICLR, 2022 (KAIST). [Paper]
HiTPR: "HiTPR: Hierarchical Transformer for Place Recognition in Point Cloud", ICRA, 2022 (Nanjing University of Science and Technology). [Paper]
FastPointTransformer: "Fast Point Transformer", CVPR, 2022 (POSTECH). [Paper]
REGTR: "REGTR: End-to-end Point Cloud Correspondences with Transformers", CVPR, 2022 (NUS, Singapore). [Paper][PyTorch]
ShapeFormer: "ShapeFormer: Transformer-based Shape Completion via Sparse Representation", CVPR, 2022 (Shenzhen University). [Paper][Website]
PatchFormer: "PatchFormer: An Efficient Point Transformer with Patch Attention", CVPR, 2022 (Hangzhou Dianzi University). [Paper]
?: "An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation", CVPR, 2022 (NTU + NYCU). [Paper][Code (in construction)]
Point-BERT: "Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling", CVPR, 2022 (Tsinghua). [Paper][PyTorch][Website]
PTTR: "PTTR: Relational 3D Point Cloud Object Tracking with Transformer", CVPR, 2022 (SenseTime). [Paper][PyTorch]
GeoTransformer: "Geometric Transformer for Fast and Robust Point Cloud Registration", CVPR, 2022 (National University of Defense Technology, China). [Paper][PyTorch]
PointCLIP: "PointCLIP: Point Cloud Understanding by CLIP", CVPR, 2022 (Shanghai AI Lab). [Paper][PyTorch]
?: "3D Part Assembly Generation with Instance Encoded Transformer", IROS, 2022 (Tongji University). [Paper]
SeedFormer: "SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer", ECCV, 2022 (Tencent). [Paper][PyTorch]
MeshMAE: "MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis", ECCV, 2022 (JD). [Paper]
PPTr: "Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding", ECCV, 2022 (Tsinghua University). [Paper]
Geodesic-Former: "Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance Segmenter", ECCV, 2022 (VinAI Research, Vietnam). [Paper]
LaplacianMesh-Transformer: "Laplacian Mesh Transformer: Dual Attention and Topology Aware Network for 3D Mesh Classification and Segmentation", ECCV, 2022 (CAS). [Paper]
Point-MixSwap: "Point MixSwap: Attentional Point Cloud Mixing via Swapping Matched Structural Divisions", ECCV, 2022 (NYCU + NTU). [Paper][PyTorch]
PTT: "Real-time 3D Single Object Tracking with Transformer", TMM, 2022 (Northeastern University, China). [Paper][PyTorch]
Point-Transformer-V2: "Point Transformer V2: Grouped Vector Attention and Partition-based Pooling", NeurIPS, 2022 (HKU). [Paper][PyTorch (in construction)]
SPoVT: "SPoVT: Semantic-Prototype Variational Transformer for Dense Point Cloud Semantic Completion", NeurIPS, 2022 (NTU). [Paper]
GSA: "Geodesic Self-Attention for 3D Point Clouds", NeurIPS, 2022 (East China Normal University). [Paper]
P2P: "P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting", NeurIPS, 2022 (Tsinghua University). [Paper][PyTorch][Website]
3DTRL: "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space", NeurIPS, 2022 (Stony Brook). [Paper][PyTorch][Website]
ShapeCrafter: "ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model", NeurIPS, 2022 (Brown). [Paper]
XMFnet: "Cross-modal Learning for Image-Guided Point Cloud Shape Completion", NeurIPS, 2022 (Politecnico di Torino, Italy). [Paper]
LighTN: "LighTN: Light-weight Transformer Network for Performance-overhead Tradeoff in Point Cloud Downsampling", arXiv, 2022 (Beijing Jiaotong University). [Paper]
PMP-Net++: "PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-step Point Moving Paths", arXiv, 2022 (Tsinghua). [Paper]
SnowflakeNet: "Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer", arXiv, 2022 (Tsinghua). [Paper][PyTorch]
3DCTN: "3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification", arXiv, 2022 (University of Waterloo, Canada). [Paper]
VNT-Net: "VNT-Net: Rotational Invariant Vector Neuron Transformers", arXiv, 2022 (Ben-Gurion University of the Negev, Israel). [Paper]
CompleteDT: "CompleteDT: Point Cloud Completion with Dense Augment Inference Transformers", arXiv, 2022 (Beijing Institute of Technology). [Paper]
VN-Transformer: "VN-Transformer: Rotation-Equivariant Attention for Vector Neurons", arXiv, 2022 (Waymo). [Paper]
Voxel-MAE: "Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds", arXiv, 2022 (Chalmers University of Technology, Sweden). [Paper]
MAE3D: "Masked Autoencoders in 3D Point Cloud Representation Learning", arXiv, 2022 (Northwest A&F University, China). [Paper]
PointConvFormer: "PointConvFormer: Revenge of the Point-based Convolution", arXiv, 2022 (Apple). [Paper]
PTTR++: "Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer", arXiv, 2022 (NTU, Singapore). [Paper][PyTorch]
Pix4Point: "Pix4Point: Image Pretrained Transformers for 3D Point Cloud Understanding", arXiv, 2022 (KAUST). [Paper][Code (in construction)]
MVP: "Multiple View Performers for Shape Completion", arXiv, 2022 (Columbia University). [Paper]
Simple3D-Former: "Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?", arXiv, 2022 (UT Austin). [Paper][PyTorch]
3DPCT: "3DPCT: 3D Point Cloud Transformer with Dual Self-attention", arXiv, 2022 (University of Waterloo, Canada). [Paper]
PS-Former: "Point Cloud Recognition with Position-to-Structure Attention Transformers", arXiv, 2022 (UCSD). [Paper]
LCPFormer: "LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers", arXiv, 2022 (Aberystwyth University, UK). [Paper]
PointCLIP-V2: "PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning", arXiv, 2022 (CUHK). [Paper][Code (in construction)]
R²-MLP: "R²-MLP: Round-Roll MLP for Multi-View 3D Object Recognition", arXiv, 2022 (Baidu). [Paper]
PVT3D: "PVT3D: Point Voxel Transformers for Place Recognition from Sparse Lidar Scans", arXiv, 2022 (TUM). [Paper]
PartSLIP: "PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models", arXiv, 2022 (Qualcomm). [Paper]
EPCL: "Frozen CLIP Model is Efficient Point Cloud Backbone", arXiv, 2022 (Shanghai AI Lab). [Paper]
ULIP: "ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding", arXiv, 2022 (Salesforce). [Paper][Website]

[Back to Overview]

Pose Estimation

Human-body:
- HOT-Net: "HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation", ACMMM, 2020 (Kwai). [Paper]
- TransPose: "TransPose: Towards Explainable Human Pose Estimation by Transformer", arXiv, 2020 (Southeast University). [Paper][PyTorch]
- PTF: "Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration", CVPR, 2021 (ETHZ). [Paper][Code (in construction)][Website]
- METRO: "End-to-End Human Pose and Mesh Reconstruction with Transformers", CVPR, 2021 (Microsoft). [Paper][PyTorch]
- PRTR: "Pose Recognition with Cascade Transformers", CVPR, 2021 (UCSD). [Paper][PyTorch]
- Mesh-Graphormer: "Mesh Graphormer", ICCV, 2021 (Microsoft). [Paper][PyTorch]
- THUNDR: "THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers", ICCV, 2021 (Google). [Paper]
- PoseFormer: "3D Human Pose Estimation with Spatial and Temporal Transformers", ICCV, 2021 (UNC). [Paper][PyTorch]
- TransPose: "TransPose: Keypoint Localization via Transformer", ICCV, 2021 (Southeast University, China). [Paper][PyTorch]
- POTR: "Pose Transformers (POTR): Human Motion Prediction With Non-Autoregressive Transformers", ICCVW, 2021 (Idiap). [Paper]
- TransFusion: "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation", BMVC, 2021 (UC Irvine). [Paper][PyTorch]
- HRT: "HRFormer: High-Resolution Transformer for Dense Prediction", NeurIPS, 2021 (CAS). [Paper][PyTorch]
- POET: "End-to-End Trainable Multi-Instance Pose Estimation with Transformers", arXiv, 2021 (EPFL). [Paper]
- Lifting-Transformer: "Lifting Transformer for 3D Human Pose Estimation in Video", arXiv, 2021 (Peking). [Paper]
- TFPose: "TFPose: Direct Human Pose Estimation with Transformers", arXiv, 2021 (The University of Adelaide). [Paper][PyTorch]
- Skeletor: "Skeletor: Skeletal Transformers for Robust Body-Pose Estimation", arXiv, 2021 (University of Surrey). [Paper]
- HandsFormer: "HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction", arXiv, 2021 (Graz University of Technology). [Paper]
- TTP: "Test-Time Personalization with a Transformer for Human Pose Estimation", NeurIPS, 2021 (UCSD). [Paper][PyTorch][Website]
- GraFormer: "GraFormer: Graph Convolution Transformer for 3D Pose Estimation", arXiv, 2021 (CAS). [Paper]
- GCT: "Geometry-Contrastive Transformer for Generalized 3D Pose Transfer", AAAI, 2022 (University of Oulu). [Paper][PyTorch]
- MHFormer: "MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation", CVPR, 2022 (Peking). [Paper][PyTorch]
- PAHMT: "Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation", CVPR, 2022 (NetEase). [Paper]
- TCFormer: "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer", CVPR, 2022 (CUHK). [Paper][PyTorch]
- PETR: "End-to-End Multi-Person Pose Estimation With Transformers", CVPR, 2022 (Hikvision). [Paper][PyTorch]
- GraFormer: "GraFormer: Graph-Oriented Transformer for 3D Pose Estimation", CVPR, 2022 (CAS). [Paper]
- Keypoint-Transformer: "Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation", CVPR, 2022 (Graz University of Technology, Austria). [Paper][PyTorch][Website]
- MPS-Net: "Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video", CVPR, 2022 (Academia Sinica). [Paper][Website]
- Ego-STAN: "Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation", CVPRW, 2022 (University of Waterloo, Canada). [Paper]
- AggPose: "AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation", IJCAI, 2022 (Shenzhen Baoan Women’s and Children’s Hospital). [Paper][Code (in construction)]
- MotionMixer: "MotionMixer: MLP-based 3D Human Body Pose Forecasting", IJCAI, 2022 (Ulm University, Germany). [Paper][Code (in construction)]
- Jointformer: "Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation", ICPR, 2022 (Trinity College Dublin, Ireland). [Paper]
- IVT: "IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation", ACMMM, 2022 (Baidu). [Paper]
- FastMETRO: "Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers", ECCV, 2022 (POSTECH). [Paper][PyTorch][Website]
- PPT: "PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation", ECCV, 2022 (UC Irvine). [Paper][PyTorch]
- Poseur: "Poseur: Direct Human Pose Regression with Transformers", ECCV, 2022 (The University of Adelaide, Australia). [Paper]
- ViTPose: "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation", NeurIPS, 2022 (The University of Sydney). [Paper][PyTorch]
- Swin-Pose: "Swin-Pose: Swin Transformer Based Human Pose Estimation", arXiv, 2022 (UMass Lowell). [Paper]
- HeadPosr: "HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders", arXiv, 2022 (ETHZ). [Paper]
- CrossFormer: "CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation", arXiv, 2022 (Canberra University, Australia). [Paper]
- VTP: "VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation", arXiv, 2022 (Hangzhou Dianzi University). [Paper]
- HeatER: "HeatER: An Efficient and Unified Network for Human Reconstruction via Heatmap-based TransformER", arXiv, 2022 (UCF). [Paper]
- GraphMLP: "GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation", arXiv, 2022 (Peking University). [Paper]
- siMLPe: "Back to MLP: A Simple Baseline for Human Motion Prediction", arXiv, 2022 (INRIA). [Paper][PyTorch]
- Snipper: "Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet", arXiv, 2022 (University of Alberta, Canada). [Paper][PyTorch]
- OTPose: "OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos", arXiv, 2022 (Korea University). [Paper]
- PoseBERT: "PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling", arXiv, 2022 (NAVER). [Paper][PyTorch]
- KOG-Transformer: "K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation", arXiv, 2022 (CAS). [Paper]
- SoMoFormer: "SoMoFormer: Multi-Person Pose Forecasting with Transformers", arXiv, 2022 (Stanford). [Paper]
- DPIT: "DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation", arXiv, 2022 (Shanghai University). [Paper]
- Uplift-Upsample: "Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers", WACV, 2023 (University of Augsburg, Germany). [Paper][TensorFlow]
- TORE: "TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer", arXiv, 2022 (HKU). [Paper]
- MPT: "MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction", arXiv, 2022 (Microsoft). [Paper]
- ViTPose+: "ViTPose+: Vision Transformer Foundation Model for Generic Body Pose Estimation", arXiv, 2022 (The University of Sydney). [Paper][PyTorch]
Hands:
- Hand-Transformer: "Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation", ECCV, 2020 (Kwai). [Paper]
- SCAT: "SCAT: Stride Consistency With Auto-Regressive Regressor and Transformer for Hand Pose Estimation", ICCVW, 2021 (Alibaba). [Paper]
- SeTHPose: "Learning Sequential Contexts using Transformer for 3D Hand Pose Estimation", arXiv, 2022 (Queen's University, Canada). [Paper]
- HTT: "Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos", arXiv, 2022 (HKU). [Paper]
- ?: "Image-free Domain Generalization via CLIP for 3D Hand Pose Estimation", arXiv, 2022 (UNIST, Korea). [Paper]
Others:
- TAPE: "Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry", arXiv, 2020 (Tianjin University). [Paper]
- T6D-Direct: "T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression", GCPR, 2021 (University of Bonn). [Paper]
- 6D-ViT: "6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning", arXiv, 2021 (University of Science and Technology of China). [Paper]
- RayTran: "RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers", ECCV, 2022 (Google). [Paper]
- DProST: "DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation", ECCV, 2022 (Seoul National University). [Paper][PyTorch]
- AFT-VO: "AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation", arXiv, 2022 (University of Surrey, UK). [Paper]
- DPT-VO: "Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry", arXiv, 2022 (Aeronautics Institute of Technology, Brazil). [Paper]
- ?: "Video based Object 6D Pose Estimation using Transformers", arXiv, 2022 (Georgia Tech). [Paper][PyTorch]
- PoET: "PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation", arXiv, 2022 (Infineon Technologies Austria AG). [Paper][PyTorch]
- CRT-6D: "CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers", WACV, 2023 (ICL, UK). [Paper][Code (in construction)]

[Back to Overview]

Tracking

General:
- TransTrack: "TransTrack: Multiple-Object Tracking with Transformer", arXiv, 2020 (HKU + ByteDance). [Paper][PyTorch]
- TransformerTrack: "Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking", CVPR, 2021 (USTC). [Paper][PyTorch]
- TransT: "Transformer Tracking", CVPR, 2021 (Dalian University of Technology). [Paper][PyTorch]
- STARK: "Learning Spatio-Temporal Transformer for Visual Tracking", ICCV, 2021 (Microsoft). [Paper][PyTorch]
- HiFT: "HiFT: Hierarchical Feature Transformer for Aerial Tracking", ICCV, 2021 (Tongji University). [Paper][PyTorch]
- DTT: "High-Performance Discriminative Tracking With Transformers", ICCV, 2021 (CAS). [Paper]
- DualTFR: "Learning Tracking Representations via Dual-Branch Fully Transformer Networks", ICCVW, 2021 (Microsoft). [Paper][PyTorch (in construction)]
- TransCenter: "TransCenter: Transformers with Dense Queries for Multiple-Object Tracking", arXiv, 2021 (INRIA + MIT). [Paper]
- TransMOT: "TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking", arXiv, 2021 (Microsoft). [Paper]
- TREG: "Target Transformed Regression for Accurate Tracking", arXiv, 2021 (Nanjing University). [Paper][Code (in construction)]
- TrTr: "TrTr: Visual Tracking with Transformer", arXiv, 2021 (University of Tokyo). [Paper][PyTorch]
- RelationTrack: "RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation", arXiv, 2021 (Huazhong University of Science and Technology). [Paper]
- SiamTPN: "Siamese Transformer Pyramid Networks for Real-Time UAV Tracking", WACV, 2022 (New York University). [Paper]
- MixFormer: "MixFormer: End-to-End Tracking with Iterative Mixed Attention", CVPR, 2022 (Nanjing University). [Paper][PyTorch]
- ToMP: "Transforming Model Prediction for Tracking", CVPR, 2022 (ETHZ). [Paper][PyTorch]
- GTR: "Global Tracking Transformers", CVPR, 2022 (UT Austin). [Paper][PyTorch]
- UTT: "Unified Transformer Tracker for Object Tracking", CVPR, 2022 (Meta). [Paper][Code (in construction)]
- MeMOT: "MeMOT: Multi-Object Tracking with Memory", CVPR, 2022 (Amazon). [Paper]
- CSwinTT: "Transformer Tracking with Cyclic Shifting Window Attention", CVPR, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
- STNet: "Spiking Transformers for Event-Based Single Object Tracking", CVPR, 2022 (Dalian University of Technology). [Paper]
- TrackFormer: "TrackFormer: Multi-Object Tracking with Transformers", CVPR, 2022 (Facebook). [Paper][PyTorch]
- SparseTT: "SparseTT: Visual Tracking with Sparse Transformers", IJCAI, 2022 (Beihang University). [Paper][Code (in construction)]
- AiATrack: "AiATrack: Attention in Attention for Transformer Visual Tracking", ECCV, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
- MOTR: "MOTR: End-to-End Multiple-Object Tracking with TRansformer", ECCV, 2022 (Megvii). [Paper][PyTorch]
- SwinTrack: "SwinTrack: A Simple and Strong Baseline for Transformer Tracking", NeurIPS, 2022 (South China University of Technology). [Paper][PyTorch]
- ModaMixer: "Divert More Attention to Vision-Language Tracking", NeurIPS, 2022 (Beijing Jiaotong University). [Paper][PyTorch]
- TransMOT: "Transformers for Multi-Object Tracking on Point Clouds", IV, 2022 (Bosch). [Paper]
- TransT-M: "High-Performance Transformer Tracking", arXiv, 2022 (Dalian University of Technology). [Paper]
- HCAT: "Efficient Visual Tracking via Hierarchical Cross-Attention Transformer", arXiv, 2022 (Dalian University of Technology). [Paper]
- ?: "Keypoints Tracking via Transformer Networks", arXiv, 2022 (KAIST). [Paper][PyTorch]
- TranSTAM: "Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking", arXiv, 2022 (Tsinghua University). [Paper][PyTorch]
- TransFiner: "TransFiner: A Full-Scale Refinement Approach for Multiple Object Tracking", arXiv, 2022 (China University of Geosciences). [Paper]
- LPAT: "Local Perception-Aware Transformer for Aerial Tracking", arXiv, 2022 (Tongji University). [Paper][PyTorch]
- TADN: "Transformer-based assignment decision network for multiple object tracking", arXiv, 2022 (National Technical University of Athens, Greece). [Paper][Code (in construction)]
- Strong-TransCenter: "Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations", arXiv, 2022 (Tel-Aviv University). [Paper][PyTorch]
- MQT: "End-to-end Tracking with a Multi-query Transformer", arXiv, 2022 (Oxford). [Paper]
- ProContEXT: "ProContEXT: Exploring Progressive Context Transformer for Tracking", arXiv, 2022 (Alibaba). [Paper]
- ?: "Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer", arXiv, 2022 (Sony). [Paper]
- MOTRv2: "MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors", arXiv, 2022 (Megvii). [Paper][PyTorch]
3D:
- STNet: "3D Siamese Transformer Network for Single Object Tracking on Point Clouds", ECCV, 2022 (Nanjing University of Science and Technology). [Paper][PyTorch]
- CMT: "CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds", ECCV, 2022 (USTC). [Paper]
- InterTrack: "InterTrack: Interaction Transformer for 3D Multi-Object Tracking", arXiv, 2022 (University of Toronto). [Paper]
- GLT-T: "GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds", AAAI, 2023 (Hangzhou Dianzi University). [Paper]

[Back to Overview]

Re-ID

PAT: "Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer", CVPR, 2021 (University of Science and Technology of China). [Paper]
HAT: "HAT: Hierarchical Aggregation Transformers for Person Re-identification", ACMMM, 2021 (Dalian University of Technology). [Paper]
TransReID: "TransReID: Transformer-based Object Re-Identification", ICCV, 2021 (Alibaba). [Paper][PyTorch]
APD: "Transformer Meets Part Model: Adaptive Part Division for Person Re-Identification", ICCVW, 2021 (Meituan). [Paper]
Pirt: "Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification", ACMMM, 2021 (Beihang University). [Paper]
TransMatcher: "Transformer-Based Deep Image Matching for Generalizable Person Re-identification", NeurIPS, 2021 (IIAI). [Paper][PyTorch]
STT: "Spatiotemporal Transformer for Video-based Person Re-identification", arXiv, 2021 (Beihang University). [Paper]
AAformer: "AAformer: Auto-Aligned Transformer for Person Re-Identification", arXiv, 2021 (CAS). [Paper]
TMT: "A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification", arXiv, 2021 (Dalian University of Technology). [Paper]
LA-Transformer: "Person Re-Identification with a Locally Aware Transformer", arXiv, 2021 (University of Maryland Baltimore County). [Paper]
DRL-Net: "Learning Disentangled Representation Implicitly via Transformer for Occluded Person Re-Identification", arXiv, 2021 (Peking University). [Paper]
GiT: "GiT: Graph Interactive Transformer for Vehicle Re-identification", arXiv, 2021 (Huaqiao University). [Paper]
OH-Former: "OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification", arXiv, 2021 (ShanghaiTech University). [Paper]
CMTR: "CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification", arXiv, 2021 (Beijing Jiaotong University). [Paper]
PFD: "Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer", AAAI, 2022 (Peking). [Paper][PyTorch]
NFormer: "NFormer: Robust Person Re-identification with Neighbor Transformer", CVPR, 2022 (University of Amsterdam, Netherlands). [Paper][Code (in construction)]
DCAL: "Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification", CVPR, 2022 (Advanced Micro Devices, China). [Paper]
CMT: "Cross-Modality Transformer for Visible-Infrared Person Re-identification", ECCV, 2022 (USTC). [Paper]
CAViT: "CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification", ECCV, 2022 (CAS). [Paper][PyTorch]
PiT: "Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval", IEEE Transactions on Industrial Informatics, 2022 (Peking). [Paper]
?: "Motion-Aware Transformer For Occluded Person Re-identification", arXiv, 2022 (NetEase, China). [Paper]
PFT: "Short Range Correlation Transformer for Occluded Person Re-Identification", arXiv, 2022 (Nanjing University of Posts and Telecommunications). [Paper]
?: "CLIP-Driven Fine-grained Text-Image Person Re-identification", arXiv, 2022 (Nanjing University of Science and Technology). [Paper]
SeqTR: "Sequential Transformer for End-to-End Person Search", arXiv, 2022 (East China Normal University). [Paper]
CLIP-ReID: "CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels", arXiv, 2022 (East China Normal University). [Paper]
TMGF: "Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification", WACVW, 2023 (Zhejiang University). [Paper][Code (in construction)]
PMT: "Learning Progressive Modality-shared Transformers for Effective Visible-Infrared Person Re-identification", AAAI, 2023 (Jiangsu University). [Paper][Code (in construction)]

[Back to Overview]

Face

General:
- FAU-Transformer: "Facial Action Unit Detection With Transformers", CVPR, 2021 (Rakuten Institute of Technology). [Paper]
- TADeT: "Mitigating Bias in Visual Transformers via Targeted Alignment", BMVC, 2021 (Georgia Tech). [Paper]
- ViT-Face: "Face Transformer for Recognition", arXiv, 2021 (Beijing University of Posts and Telecommunications). [Paper]
- FaceT: "Learning to Cluster Faces via Transformer", arXiv, 2021 (Alibaba). [Paper]
- VidFace: "VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots", arXiv, 2021 (Zhejiang University). [Paper]
- FAA: "Shuffle Transformer with Feature Alignment for Video Face Parsing", arXiv, 2021 (Tencent). [Paper]
- FaRL: "General Facial Representation Learning in a Visual-Linguistic Manner", CVPR, 2022 (Microsoft). [Paper][PyTorch]
- FaceFormer: "FaceFormer: Speech-Driven 3D Facial Animation with Transformers", CVPR, 2022 (HKU). [Paper][PyTorch][Website]
- PhysFormer: "PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer", CVPR, 2022 (University of Oulu, Finland). [Paper][PyTorch]
- VTP: "Sub-word Level Lip Reading With Visual Attention", CVPR, 2022 (Oxford). [Paper]
- Label2Label: "Label2Label: A Language Modeling Framework for Multi-Attribute Learning", ECCV, 2022 (Tsinghua). [Paper][PyTorch]
- FPVT: "Face Pyramid Vision Transformer", BMVC, 2022 (FloppyDisk.AI, Pakistan). [Paper][PyTorch][Website]
- fViT: "Part-based Face Recognition with Vision Transformers", BMVC, 2022 (Queen Mary University of London). [Paper]
- EventFormer: "EventFormer: AU Event Transformer for Facial Action Unit Event Detection", arXiv, 2022 (Peking). [Paper]
- MFT: "Multi-Modal Learning for AU Detection Based on Multi-Head Fused Transformers", arXiv, 2022 (SUNY Binghamton). [Paper]
- VC-TRSF: "Self-supervised Video-centralised Transformer for Video Face Clustering", arXiv, 2022 (ICL). [Paper]
Facial Landmark:
- Clusformer: "Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition", CVPR, 2021 (VinAI Research, Vietnam). [Paper]
- LOTR: "LOTR: Face Landmark Localization Using Localization Transformer", arXiv, 2021 (Sertis, Thailand). [Paper]
- SLPT: "Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning", CVPR, 2022 (University of Technology Sydney). [Paper][PyTorch]
- DTLD: "Towards Accurate Facial Landmark Detection via Cascaded Transformers", CVPR, 2022 (Samsung). [Paper]
- RePFormer: "RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark Detection", arXiv, 2022 (CUHK). [Paper]
Face Low-Level Vision:
- Latent-Transformer: "A Latent Transformer for Disentangled Face Editing in Images and Videos", ICCV, 2021 (Institut Polytechnique de Paris). [Paper][PyTorch]
- TANet: "TANet: A new Paradigm for Global Face Super-resolution via Transformer-CNN Aggregation Network", arXiv, 2021 (Wuhan Institute of Technology). [Paper]
- FAT: "Facial Attribute Transformers for Precise and Robust Makeup Transfer", WACV, 2022 (University of Rochester). [Paper]
- SSAT: "SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal", AAAI, 2022 (Wuhan University). [Paper][PyTorch]
- TransEditor: "TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing", CVPR, 2022 (Shanghai AI Lab). [Paper][PyTorch][Website]
- RestoreFormer: "RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs", CVPR, 2022 (HKU). [Paper]
- HairCLIP: "HairCLIP: Design Your Hair by Text and Reference Image", CVPR, 2022 (USTC). [Paper][PyTorch]
- AnyFace: "AnyFace: Free-style Text-to-Face Synthesis and Manipulation", CVPR, 2022 (CAS). [Paper]
- CodeFormer: "Towards Robust Blind Face Restoration with Codebook Lookup Transformer", NeurIPS, 2022 (NTU, Singapore). [Paper][PyTorch (in construction)][Website]
- Cycle-Text2Face: "Cycle Text2Face: Cycle Text-to-face GAN via Transformers", arXiv, 2022 (Shahed University, Iran). [Paper]
- FaceFormer: "FaceFormer: Scale-aware Blind Face Restoration with Transformers", arXiv, 2022 (Tencent). [Paper]
- text2StyleGAN: "Text-Free Learning of a Natural Language Interface for Pretrained Face Generators", arXiv, 2022 (Toyota Technological Institute, Chicago). [Paper][PyTorch]
- ManiCLIP: "ManiCLIP: Multi-Attribute Face Manipulation from Text", arXiv, 2022 (NTU, Singapore). [Paper]
- FEAT: "FEAT: Face Editing with Attention", arXiv, 2022 (Shenzhen University). [Paper]
Facial Expression:
- TransFER: "TransFER: Learning Relation-aware Facial Expression Representations with Transformers", ICCV, 2021 (CAS). [Paper]
- CVT-Face: "Robust Facial Expression Recognition with Convolutional Visual Transformers", arXiv, 2021 (Hunan University). [Paper]
- MViT: "MViT: Mask Vision Transformer for Facial Expression Recognition in the wild", arXiv, 2021 (University of Science and Technology of China). [Paper]
- ViT-SE: "Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition", arXiv, 2021 (CentraleSupélec, France). [Paper]
- EST: "Expression Snippet Transformer for Robust Video-based Facial Expression Recognition", arXiv, 2021 (China University of Geosciences). [Paper][PyTorch]
- MFEViT: "MFEViT: A Robust Lightweight Transformer-based Network for Multimodal 2D+3D Facial Expression Recognition", arXiv, 2021 (University of Science and Technology of China). [Paper]
- F-PDLS: "Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task", ICASSP, 2022 (KAIST). [Paper]
- ?: "Transformer-based Multimodal Information Fusion for Facial Expression Analysis", arXiv, 2022 (Netease, China). [Paper]
- ?: "Facial Expression Recognition with Swin Transformer", arXiv, 2022 (Dongguk University, Korea). [Paper]
- POSTER: "POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition", arXiv, 2022 (UCF). [Paper]
- STT: "Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild", arXiv, 2022 (Hunan University). [Paper]
- FaceMAE: "FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders", arXiv, 2022 (NUS). [Paper][Code (in construction)]
- TransFA: "TransFA: Transformer-based Representation for Face Attribute Evaluation", arXiv, 2022 (Xidian University). [Paper]
- AU-CVT: "AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition", arXiv, 2022 (Shenzhen Technology University). [Paper][PyTorch]
- ?: "Multi-Task Transformer with uncertainty modelling for Face Based Affective Computing", arXiv, 2022 (Datakalab, France). [Paper]
- APViT: "Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition", arXiv, 2022 (Baidu). [Paper]
Attack-related:
- ?: "Video Transformer for Deepfake Detection with Incremental Learning", ACMMM, 2021 (MBZUAI). [Paper]
- ViTranZFAS: "On the Effectiveness of Vision Transformers for Zero-shot Face Anti-Spoofing", International Joint Conference on Biometrics (IJCB), 2021 (Idiap). [Paper]
- MTSS: "Multi-Teacher Single-Student Visual Transformer with Multi-Level Attention for Face Spoofing Detection", BMVC, 2021 (National Taiwan Ocean University). [Paper]
- TransRPPG: "TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection", arXiv, 2021 (University of Oulu). [Paper]
- CViT: "Deepfake Video Detection Using Convolutional Vision Transformer", arXiv, 2021 (Jimma University). [Paper]
- ViT-Distill: "Deepfake Detection Scheme Based on Vision Transformer and Distillation", arXiv, 2021 (Sookmyung Women’s University). [Paper]
- M2TR: "M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection", arXiv, 2021 (Fudan University). [Paper]
- Cross-ViT: "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", arXiv, 2021 (University of Pisa). [Paper][PyTorch]
- ICT: "Protecting Celebrities from DeepFake with Identity Consistency Transformer", CVPR, 2022 (Microsoft). [Paper][PyTorch]
- GGViT: "GGViT: Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection", ICPR, 2022 (CAS). [Paper]
- ?: "Hybrid Transformer Network for Deepfake Detection", International Conference on Content-Based Multimedia Indexing (CBMI), 2022 (MediaFutures, Norway). [Paper]
- ViTAF: "Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing", ECCV, 2022 (Google). [Paper]
- UIA-ViT: "UIA-ViT: Unsupervised Inconsistency-Aware Method Based on Vision Transformer for Face Forgery Detection", ECCV, 2022 (USTC). [Paper]
- ?: "Multi-Scale Wavelet Transformer for Face Forgery Detection", ACCV, 2022 (Hikvision). [Paper]
- ?: "Self-supervised Transformer for Deepfake Detection", arXiv, 2022 (USTC, China). [Paper]
- ViTransPAD: "ViTransPAD: Video Transformer using convolution and self-attention for Face Presentation Attack Detection", arXiv, 2022 (University of La Rochelle, France). [Paper]
- ?: "Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection", arXiv, 2022 (National Research Council, Italy). [Paper]
- STDT: "Deepfake Video Detection with Spatiotemporal Dropout Transformer", arXiv, 2022 (CAS). [Paper]
- ?: "Deep Convolutional Pooling Transformer for Deepfake Detection", arXiv, 2022 (HKU). [Paper]

[Back to Overview]

Neural Architecture Search

HR-NAS: "HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers", CVPR, 2021 (HKU). [Paper][PyTorch]
CATE: "CATE: Computation-aware Neural Architecture Encoding with Transformers", ICML, 2021 (Michigan State). [Paper]
AutoFormer: "AutoFormer: Searching Transformers for Visual Recognition", ICCV, 2021 (Microsoft). [Paper][PyTorch]
GLiT: "GLiT: Neural Architecture Search for Global and Local Image Transformer", ICCV, 2021 (The University of Sydney + SenseTime). [Paper]
BossNAS: "BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search", ICCV, 2021 (Monash University). [Paper][PyTorch]
ViT-ResNAS: "Searching for Efficient Multi-Stage Vision Transformers", ICCVW, 2021 (MIT). [Paper][PyTorch]
AutoformerV2: "Searching the Search Space of Vision Transformer", NeurIPS, 2021 (Microsoft). [Paper][PyTorch]
TNASP: "TNASP: A Transformer-based NAS Predictor with a Self-evolution Framework", NeurIPS, 2021 (CAS + Kuaishou). [Paper]
PSViT: "PSViT: Better Vision Transformer via Token Pooling and Attention Sharing", arXiv, 2021 (The University of Sydney + SenseTime). [Paper]
As-ViT: "Auto-scaling Vision Transformers without Training", ICLR, 2022 (UT Austin). [Paper][PyTorch]
NASViT: "NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training", ICLR, 2022 (Facebook). [Paper]
TF-TAS: "Training-free Transformer Architecture Search", CVPR, 2022 (Tencent). [Paper]
ViT-Slim: "Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space", CVPR, 2022 (MBZUAI). [Paper][PyTorch]
BurgerFormer: "Searching for BurgerFormer with Micro-Meso-Macro Space Design", ICML, 2022 (CAS). [Paper][Code (in construction)]
UniNet: "UniNet: Unified Architecture Search with Convolution, Transformer, and MLP", ECCV, 2022 (CUHK + SenseTime). [Paper]
ViTAS: "Vision Transformer Architecture Search", ECCV, 2022 (The University of Sydney + SenseTime). [Paper]
VTCAS: "Vision Transformer with Convolutions Architecture Search", arXiv, 2022 (Donghua University). [Paper]
NOAH: "Neural Prompt Search", arXiv, 2022 (NTU, Singapore). [Paper][PyTorch]
FocusFormer: "FocusFormer: Focusing on What We Need via Architecture Sampler", arXiv, 2022 (Monash University, Australia). [Paper]
NAR-Former: "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction", arXiv, 2022 (Xidian University, China). [Paper]

[Back to Overview]

Scene Graph

BGT-Net: "BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation", CVPRW, 2021 (ETHZ). [Paper]
STTran: "Spatial-Temporal Transformer for Dynamic Scene Graph Generation", ICCV, 2021 (Leibniz University Hannover, Germany). [Paper][PyTorch]
SGG-NLS: "Learning to Generate Scene Graph from Natural Language Supervision", ICCV, 2021 (University of Wisconsin-Madison). [Paper][PyTorch]
SGG-Seq2Seq: "Context-Aware Scene Graph Generation With Seq2Seq Transformers", ICCV, 2021 (Layer 6 AI, Canada). [Paper][PyTorch]
RELAX: "Image-Text Alignment using Adaptive Cross-attention with Transformer Encoder for Scene Graphs", BMVC, 2021 (Samsung). [Paper]
Relation-Transformer: "Scenes and Surroundings: Scene Graph Generation using Relation Transformer", arXiv, 2021 (LMU Munich). [Paper]
SGTR: "SGTR: End-to-end Scene Graph Generation with Transformer", CVPR, 2022 (ShanghaiTech). [Paper][Code (in construction)]
GCL: "Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation", CVPR, 2022 (Shandong University). [Paper][PyTorch]
Relationformer: "Relationformer: A Unified Framework for Image-to-Graph Generation", ECCV, 2022 (TUM). [Paper][Code (in construction)]
SVRP: "Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning", ECCV, 2022 (Monash University). [Paper]
RelTR: "RelTR: Relation Transformer for Scene Graph Generation", arXiv, 2022 (Leibniz University Hannover, Germany). [Paper][PyTorch]
SG-Shuffle: "SG-Shuffle: Multi-aspect Shuffle Transformer for Scene Graph Generation", arXiv, 2022 (The University of Sydney). [Paper]
IS-GGT: "Iterative Scene Graph Generation with Generative Transformers", arXiv, 2022 (Oklahoma State University). [Paper]

[Back to Overview]

Transfer / X-Supervised / X-Shot / Continual Learning

Transfer Learning:
- AdaptFormer: "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition", NeurIPS, 2022 (HKU). [Paper][PyTorch][Website]
- Convpass: "Convolutional Bypasses Are Better Vision Transformer Adapters", arXiv, 2022 (Peking University). [Paper][PyTorch]
- FacT: "FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer", AAAI, 2023 (Peking University). [Paper][PyTorch]
Domain Adaptation/Generalization:
- TransDA: "Transformer-Based Source-Free Domain Adaptation", arXiv, 2021 (Harbin Institute of Technology). [Paper][PyTorch]
- TVT: "TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation", arXiv, 2021 (UT Arlington + Kuaishou). [Paper]
- ResTran: "Discovering Spatial Relationships by Transformers for Domain Generalization", arXiv, 2021 (MBZUAI). [Paper]
- WinTR: "Exploiting Both Domain-specific and Invariant Knowledge via a Win-win Transformer for Unsupervised Domain Adaptation", arXiv, 2021 (Beijing Institute of Technology). [Paper]
- CDTrans: "CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation", ICLR, 2022 (Alibaba). [Paper][PyTorch]
- SSRT: "Safe Self-Refinement for Transformer-based Domain Adaptation", CVPR, 2022 (Stony Brook). [Paper]
- DOT: "Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation", ACMMM, 2022 (Beijing Institute of Technology). [Paper]
- GVRT: "Grounding Visual Representations with Texts for Domain Generalization", ECCV, 2022 (LG). [Paper][PyTorch]
- PACMAC: "Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency", NeurIPS, 2022 (Georgia Tech). [Paper][PyTorch]
- BCAT: "Domain Adaptation via Bidirectional Cross-Attention Transformer", arXiv, 2022 (Southern University of Science and Technology). [Paper]
- DoTNet: "Towards Unsupervised Domain Adaptation via Domain-Transformer", arXiv, 2022 (Sun Yat-sen University). [Paper]
- TransDA: "Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation", arXiv, 2022 (Tsinghua). [Paper][Code (in construction)]
- FAMLP: "FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization", arXiv, 2022 (University of Science and Technology of China). [Paper]
- ERM-ViT: "Self-Distilled Vision Transformer for Domain Generalization", arXiv, 2022 (MBZUAI). [Paper][PyTorch]
- MPA: "Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation", arXiv, 2022 (Fudan University). [Paper]
- DePT: "Visual Prompt Tuning for Test-time Domain Adaptation", arXiv, 2022 (Amazon). [Paper]
- LADS: "Using Language to Extend to Unseen Domains", arXiv, 2022 (Berkeley). [Paper]
- FedAPT: "Cross-domain Federated Adaptive Prompt Tuning for CLIP", arXiv, 2022 (Fudan University). [Paper]
- MetaPrompt: "Learning Domain Invariant Prompt for Vision-Language Models", arXiv, 2022 (Tongji University + Microsoft). [Paper]
X-Supervised:
- Semiformer: "Semi-Supervised Vision Transformers", ECCV, 2022 (Fudan University). [Paper][PyTorch]
- SVL-Adapter: "SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models", BMVC, 2022 (UCL). [Paper][Code (in construction)]
- Semi-ViT: "Semi-supervised Vision Transformers at Scale", NeurIPS, 2022 (Amazon). [Paper]
Zero-Shot:
- ViT-ZSL: "Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning", IMVIP, 2021 (University of Exeter, UK). [Paper]
- TransZero: "TransZero: Attribute-guided Transformer for Zero-Shot Learning", AAAI, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
- ?: "Zero-shot Visual Commonsense Immorality Prediction", BMVC, 2022 (Korea University). [Paper][PyTorch]
- TPT: "Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models", NeurIPS, 2022 (NVIDIA). [Paper][PyTorch][Website]
- I2DFormer: "I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification", NeurIPS, 2022 (ETHZ). [Paper]
- HRT: "Hybrid Routing Transformer for Zero-Shot Learning", arXiv, 2022 (Xidian University). [Paper]
- MUST: "Masked Unsupervised Self-training for Zero-shot Image Classification", arXiv, 2022 (Salesforce). [Paper]
- CuPL: "What does a platypus look like? Generating customized prompts for zero-shot image classification", arXiv, 2022 (University of Washington). [Paper][PyTorch]
- VL-Taboo: "VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models", arXiv, 2022 (Goethe University Frankfurt, Germany). [Paper][Code (in construction)]
- CALIP: "CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention", arXiv, 2022 (Peking University). [Paper]
- PromptCompVL: "Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning", arXiv, 2022 (Michigan State). [Paper]
- SuS-X: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models", arXiv, 2022 (Cambridge). [Paper][PyTorch]
- I2MVFormer: "I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification", arXiv, 2022 (ETHZ). [Paper]
X-Shot:
- CrossTransformer: "CrossTransformers: spatially-aware few-shot transfer", NeurIPS, 2020 (DeepMind). [Paper][Tensorflow]
- URT: "A Universal Representation Transformer Layer for Few-Shot Image Classification", ICLR, 2021 (Mila). [Paper][PyTorch]
- TRX: "Temporal-Relational CrossTransformers for Few-Shot Action Recognition", CVPR, 2021 (University of Bristol). [Paper][PyTorch]
- Few-shot-Transformer: "Few-Shot Transformation of Common Actions into Time and Space", arXiv, 2021 (University of Amsterdam). [Paper]
- HCTransformers: "Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning", CVPR, 2022 (Fudan University). [Paper][PyTorch]
- HyperTransformer: "HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning", CVPR, 2022 (Google). [Paper][PyTorch][Website]
- STRM: "Spatio-temporal Relation Modeling for Few-shot Action Recognition", CVPR, 2022 (MBZUAI). [Paper][PyTorch][Website]
- HyperTransformer: "HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning", ICML, 2022 (Google). [Paper]
- CPM: "Compound Prototype Matching for Few-shot Action Recognition", ECCV, 2022 (The University of Tokyo). [Paper]
- SUN: "Self-Promoted Supervision for Few-Shot Transformer", ECCV, 2022 (Harbin Institute of Technology + NUS). [Paper][PyTorch]
- Tip-Adapter: "Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification", ECCV, 2022 (Shanghai AI Lab). [Paper][PyTorch]
- tSF: "tSF: Transformer-Based Semantic Filter for Few-Shot Learning", ECCV, 2022 (Tencent). [Paper]
- TransVLAD: "TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning", ECCV, 2022 (Southern University of Science and Technology, China). [Paper]
- BaseTransformers: "BaseTransformers: Attention over base data-points for One Shot Learning", BMVC, 2022 (Dublin City University, Ireland). [Paper][PyTorch]
- FPTrans: "Feature-Proxy Transformer for Few-Shot Segmentation", NeurIPS, 2022 (Baidu). [Paper][Code (in construction)]
- MM-Former: "Mask Matching Transformer for Few-Shot Segmentation", NeurIPS, 2022 (Picsart). [Paper][PyTorch]
- MG-ViT: "Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning", arXiv, 2022 (University of Electronic Science and Technology of China). [Paper]
- QSFormer: "Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification", arXiv, 2022 (Anhui University). [Paper]
- FS-CT: "Enhancing Few-shot Image Classification with Cosine Transformer", arXiv, 2022 (VinUniversity, Vietnam). [Paper][PyTorch]
- CoCa-CNI: "Exploiting Category Names for Few-Shot Classification with Vision-Language Models", arXiv, 2022 (Google). [Paper]
Continual Learning:
- MEAT: "Meta-attention for ViT-backed Continual Learning", CVPR, 2022 (Zhejiang University). [Paper][Code (in construction)]
- DyTox: "DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion", CVPR, 2022 (Sorbonne Universite, France). [Paper][PyTorch]
- LVT: "Continual Learning With Lifelong Vision Transformer", CVPR, 2022 (The University of Sydney). [Paper]
- L2P: "Learning to Prompt for Continual Learning", CVPR, 2022 (Google). [Paper][Tensorflow]
- ?: "Simpler is Better: off-the-shelf Continual Learning Through Pretrained Backbones", CVPRW, 2022 (Ca' Foscari University, Italy). [Paper][PyTorch]
- ADA: "Continual Learning with Transformers for Image Classification", CVPRW, 2022 (Amazon). [Paper]
- ?: "Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization", CVPRW, 2022 (Ca' Foscari University, Italy). [Paper]
- DualPrompt: "DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning", ECCV, 2022 (Google). [Paper][Tensorflow]
- CVT: "Online Continual Learning with Contrastive Vision Transformer", ECCV, 2022 (The University of Sydney). [Paper]
- IncCLIP: "Generative Negative Text Replay for Continual Vision-Language Pretraining", ECCV, 2022 (ShanghaiTech). [Paper]
- S-Prompts: "S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning", NeurIPS, 2022 (Singapore Management University). [Paper]
- ADA: "Memory Efficient Continual Learning with Transformers", NeurIPS, 2022 (Amazon). [Paper]
- BMU-MoCo: "BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling", NeurIPS, 2022 (Renmin University of China). [Paper]
- CLiMB: "CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks", NeurIPS (Datasets and Benchmarks), 2022 (USC). [Paper][PyTorch]
- COLT: "Transformers Are Better Continual Learners", arXiv, 2022 (Hikvision). [Paper]
- D³Former: "D³Former: Debiased Dual Distilled Transformer for Incremental Learning", arXiv, 2022 (MBZUAI). [Paper][PyTorch]
- Continual-CLIP: "CLIP model is an Efficient Continual Learner", arXiv, 2022 (MBZUAI). [Paper][Code (in construction)]
- GCAB-CFDC: "Gated Class-Attention with Cascaded Feature Drift Compensation for Exemplar-free Continual Learning of Vision Transformers", arXiv, 2022 (University of Pavia, Italy). [Paper][Code (in construction)]
- CODA-Prompt: "CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning", arXiv, 2022 (IBM). [Paper]
- PIVOT: "PIVOT: Prompting for Video Continual Learning", arXiv, 2022 (KAUST). [Paper]
Long-tail/Imbalanced:
- BatchFormer: "BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning", CVPR, 2022 (The University of Sydney). [Paper][PyTorch]
- BatchFormerV2: "BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning", arXiv, 2022 (The University of Sydney). [Paper]
- LPT: "LPT: Long-tailed Prompt Tuning for Image Classification", arXiv, 2022 (Harbin Institute of Technology). [Paper]
- LiVT: "Learning Imbalanced Data with Vision Transformers", arXiv, 2022 (Tsinghua). [Paper][PyTorch (in construction)]
Knowledge Distillation:
- ?: "Knowledge Distillation via the Target-aware Transformer", CVPR, 2022 (Alibaba). [Paper]
- DearKD: "DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers", CVPR, 2022 (JD). [Paper]
- AttnDistill: "Attention Distillation: self-supervised vision transformer students need more guidance", BMVC, 2022 (UAB, Spain). [Paper][PyTorch]
- ViTKD: "ViTKD: Practical Guidelines for ViT feature knowledge distillation", arXiv, 2022 (IDEA). [Paper][PyTorch (in construction)]
- ?: "Adaptive Attention Link-based Regularization for Vision Transformers", arXiv, 2022 (Chung-Ang University, Korea). [Paper]
Clustering:
- VTCC: "Vision Transformer for Contrastive Clustering", arXiv, 2022 (Sun Yat-sen University, China). [Paper]
Novel Category Discovery:
- PromptCAL: "PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery", arXiv, 2022 (MBZUAI). [Paper][Code (in construction)]

[Back to Overview]

Low-level Vision Tasks

Image Restoration

General:
- NLRN: "Non-Local Recurrent Network for Image Restoration", NeurIPS, 2018 (UIUC). [Paper][Tensorflow]
- RNAN: "Residual Non-local Attention Networks for Image Restoration", ICLR, 2019 (Northeastern University). [Paper][PyTorch]
- PANet: "Pyramid Attention Networks for Image Restoration", arXiv, 2020 (UIUC). [Paper][PyTorch]
- IPT: "Pre-Trained Image Processing Transformer", CVPR, 2021 (Huawei). [Paper][PyTorch (in construction)]
- SwinIR: "SwinIR: Image Restoration Using Swin Transformer", ICCVW, 2021 (ETHZ). [Paper][PyTorch]
- SiamTrans: "SiamTrans: Zero-Shot Multi-Frame Image Restoration with Pre-Trained Siamese Transformers", AAAI, 2022 (Huawei). [Paper]
- Uformer: "Uformer: A General U-Shaped Transformer for Image Restoration", CVPR, 2022 (University of Science and Technology of China). [Paper][PyTorch]
- MAXIM: "MAXIM: Multi-Axis MLP for Image Processing", CVPR, 2022 (Google). [Paper][Tensorflow]
- Restormer: "Restormer: Efficient Transformer for High-Resolution Image Restoration", CVPR, 2022 (IIAI, UAE). [Paper][PyTorch]
- TransWeather: "TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions", CVPR, 2022 (JHU). [Paper][PyTorch][Website]
- KiT: "KNN Local Attention for Image Restoration", CVPR, 2022 (Yonsei University). [Paper]
- ELMformer: "ELMformer: Efficient Raw Image Restoration with a Locally Multiplicative Transformer", ACMMM, 2022 (Horizon Robotics). [Paper][Code (in construction)]
- EDT: "On Efficient Transformer-Based Image Pre-training for Low-Level Vision", arXiv, 2022 (CUHK). [Paper][PyTorch]
- ?: "Transform your Smartphone into a DSLR Camera: Learning the ISP in the Wild", arXiv, 2022 (ETHZ). [Paper]
- TMT: "Imaging through the Atmosphere using Turbulence Mitigation Transformer", arXiv, 2022 (Purdue). [Paper][Code (in construction)][Website]
- LRT: "LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images", arXiv, 2022 (HKU). [Paper]
- ART: "Accurate Image Restoration with Attention Retractable Transformer", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][PyTorch]
Super-Resolution:
- SAN: "Second-Order Attention Network for Single Image Super-Resolution", CVPR, 2019 (Tsinghua). [Paper][PyTorch]
- CS-NL: "Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining", CVPR, 2020 (UIUC). [Paper][PyTorch]
- TTSR: "Learning Texture Transformer Network for Image Super-Resolution", CVPR, 2020 (Microsoft). [Paper][PyTorch]
- HAN: "Single Image Super-Resolution via a Holistic Attention Network", ECCV, 2020 (Northeastern University). [Paper][PyTorch]
- NLSN: "Image Super-Resolution With Non-Local Sparse Attention", CVPR, 2021 (UIUC). [Paper]
- ITSRN: "Implicit Transformer Network for Screen Content Image Continuous Super-Resolution", NeurIPS, 2021 (Tianjin University). [Paper][PyTorch]
- FPAN: "Feedback Pyramid Attention Networks for Single Image Super-Resolution", arXiv, 2021 (Nanjing University of Science and Technology). [Paper]
- ESRT: "Efficient Transformer for Single Image Super-Resolution", arXiv, 2021 (Peking University). [Paper]
- Fusformer: "Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution", arXiv, 2021 (University of Electronic Science and Technology of China). [Paper]
- DPT: "Detail-Preserving Transformer for Light Field Image Super-Resolution", AAAI, 2022 (Beijing Institute of Technology). [Paper][PyTorch]
- BSRT: "BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment", CVPRW, 2022 (Megvii). [Paper][PyTorch]
- TATT: "A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution", CVPR, 2022 (The Hong Kong Polytechnic University). [Paper][PyTorch]
- LBNet: "Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer", IJCAI, 2022 (Nanjing University of Posts and Telecommunications). [Paper][PyTorch (in construction)]
- DATSR: "Reference-based Image Super-Resolution with Deformable Attention Transformer", ECCV, 2022 (ETHZ). [Paper][Code (in construction)]
- ELAN: "Efficient Long-Range Attention Network for Image Super-resolution", ECCV, 2022 (The Hong Kong Polytechnic University). [Paper][PyTorch]
- Swin2SR: "Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration", ECCVW, 2022 (University of Wurzburg, Germany). [Paper]
- CAT: "Cross Aggregation Transformer for Image Restoration", NeurIPS, 2022 (Shanghai Jiao Tong University). [Paper][PyTorch]
- Stoformer: "Stochastic Window Transformer for Image Restoration", NeurIPS, 2022 (USTC). [Paper][PyTorch]
- LFT: "Light Field Image Super-Resolution with Transformers", IEEE Signal Processing Letters, 2022 (National University of Defense Technology, China). [Paper][PyTorch]
- ACT: "Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution", arXiv, 2022 (LG). [Paper]
- HIPA: "HIPA: Hierarchical Patch Transformer for Single Image Super Resolution", arXiv, 2022 (CUHK). [Paper]
- CTCNet: "CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution", arXiv, 2022 (Nanjing University of Posts and Telecommunications). [Paper]
- HAT: "Activating More Pixels in Image Super-Resolution Transformer", arXiv, 2022 (University of Macau). [Paper][Code (in construction)]
- ShuffleMixer: "ShuffleMixer: An Efficient ConvNet for Image Super-Resolution", arXiv, 2022 (Nanjing University of Science and Technology). [Paper][PyTorch]
- HST: "HST: Hierarchical Swin Transformer for Compressed Image Super-resolution", ECCVW, 2022 (USTC). [Paper]
- SwinFIR: "SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution", arXiv, 2022 (Samsung). [Paper]
- ITSRN++: "ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution", arXiv, 2022 (Tianjin University). [Paper]
- NGswin: "N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution", arXiv, 2022 (Sogang University, Korea). [Paper]
Others:
- SDNet: "SDNet: multi-branch for single image deraining using swin", arXiv, 2021 (Xinjiang University). [Paper][Code (in construction)]
- ATTSF: "Attention! Stay Focus!", arXiv, 2021 (BridgeAI, Seoul). [Paper][Tensorflow]
- HyLoG-ViT: "Hybrid Local-Global Transformer for Image Dehazing", arXiv, 2021 (Beihang University). [Paper]
- HyperTransformer: "HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening", CVPR, 2022 (JHU). [Paper][PyTorch]
- DeHamer: "Image Dehazing Transformer With Transmission-Aware 3D Position Embedding", CVPR, 2022 (Nankai University). [Paper][Website]
- PTNet: "Learning Parallax Transformer Network for Stereo Image JPEG Artifacts Removal", ACMMM, 2022 (Fudan University). [Paper]
- CharFormer: "CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising", ACMMM, 2022 (Jilin University). [Paper][PyTorch (in construction)]
- TurbNet: "Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model", ECCV, 2022 (Purdue + UT Austin). [Paper][PyTorch]
- Stripformer: "Stripformer: Strip Transformer for Fast Image Deblurring", ECCV, 2022 (NTHU). [Paper]
- DehazeFormer: "Vision Transformers for Single Image Dehazing", arXiv, 2022 (Zhejiang University). [Paper][PyTorch]
- RSTCANet: "Residual Swin Transformer Channel Attention Network for Image Demosaicing", arXiv, 2022 (Tampere University, Finland). [Paper]
- DRT: "DRT: A Lightweight Single Image Deraining Recursive Transformer", arXiv, 2022 (ANU, Australia). [Paper][PyTorch (in construction)]
- DenSformer: "Dense residual Transformer for image denoising", arXiv, 2022 (University of Science and Technology Beijing). [Paper]
- Cubic-Mixer: "UHD Image Deblurring via Multi-scale Cubic-Mixer", arXiv, 2022 (Nanjing University of Science and Technology). [Paper]
- PoCoformer: "Polarized Color Image Denoising using Pocoformer", arXiv, 2022 (The University of Tokyo). [Paper]
- MSP-Former: "MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing", arXiv, 2022 (Jimei University). [Paper]
- ELF: "Magic ELF: Image Deraining Meets Association Learning and Transformer", arXiv, 2022 (Wuhan University). [Paper][PyTorch (in construction)]
- DnSwin: "DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer", arXiv, 2022 (Guangdong University of Technology). [Paper]
- SnowFormer: "SnowFormer: Scale-aware Transformer via Context Interaction for Single Image Desnowing", arXiv, 2022 (Jimei University, China). [Paper]
- DMTNet: "DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer", arXiv, 2022 (Samsung). [Paper]
- LMQFormer: "LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal", arXiv, 2022 (Fuzhou University). [Paper]
- Semi-UFormer: "Semi-UFormer: Semi-supervised Uncertainty-aware Transformer for Image Dehazing", arXiv, 2022 (Nanjing University of Aeronautics and Astronautics). [Paper]
- WITT: "WITT: A Wireless Image Transmission Transformer for Semantic Communications", arXiv, 2022 (Beijing University of Posts and Telecommunications). [Paper][Code (in construction)]
- BiT: "Blur Interpolation Transformer for Real-World Motion from Blur", arXiv, 2022 (The University of Tokyo). [Paper]
- FFTformer: "Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring", arXiv, 2022 (Nanjing University of Science and Technology). [Paper][Code (in construction)]
- SST: "Spatial-Spectral Transformer for Hyperspectral Image Denoising", arXiv, 2022 (Beijing Institute of Technology). [Paper][PyTorch]

[Back to Overview]

Video Restoration

VSR-Transformer: "Video Super-Resolution Transformer", arXiv, 2021 (ETHZ). [Paper][PyTorch]
MANA: "Memory-Augmented Non-Local Attention for Video Super-Resolution", CVPR, 2022 (JD). [Paper]
?: "Bringing Old Films Back to Life", CVPR, 2022 (Microsoft). [Paper][Code (in construction)]
TTVSR: "Learning Trajectory-Aware Transformer for Video Super-Resolution", CVPR, 2022 (Microsoft). [Paper][PyTorch]
Trans-SVSR: "A New Dataset and Transformer for Stereoscopic Video Super-Resolution", CVPR, 2022 (Bahcesehir University, Turkey). [Paper][PyTorch]
STDAN: "STDAN: Deformable Attention Network for Space-Time Video Super-Resolution", CVPRW, 2022 (Tsinghua). [Paper]
VRT: "VRT: A Video Restoration Transformer", arXiv, 2022 (ETHZ). [Paper][PyTorch]
FGST: "Flow-Guided Sparse Transformer for Video Deblurring", ICML, 2022 (Tsinghua). [Paper][Code (in construction)]
RSTT: "RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution", CVPR, 2022 (Microsoft). [Paper][PyTorch]
FTVSR: "Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution", ECCV, 2022 (Microsoft). [Paper][PyTorch]
EFNet: "Event-Based Fusion for Motion Deblurring with Cross-modal Attention", ECCV, 2022 (ETHZ). [Paper]
TempFormer: "TempFormer: Temporally Consistent Transformer for Video Denoising", ECCV, 2022 (Disney). [Paper]
RVRT: "Recurrent Video Restoration Transformer with Guided Deformable Attention", NeurIPS, 2022 (ETHZ). [Paper][PyTorch]
?: "Rethinking Alignment in Video Super-Resolution Transformers", NeurIPS, 2022 (Shanghai AI Lab). [Paper][PyTorch]
VDTR: "VDTR: Video Deblurring with Transformer", arXiv, 2022 (Tsinghua). [Paper][Code (in construction)]
DSCT: "Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer", arXiv, 2022 (Beijing University of Posts and Telecommunications). [Paper]
Group-ShiftNet: "No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers", arXiv, 2022 (CUHK). [Paper][Code (in construction)][Website]

[Back to Overview]

Inpainting / Completion / Outpainting

Contextual-Attention: "Generative Image Inpainting with Contextual Attention", CVPR, 2018 (UIUC). [Paper][Tensorflow]
PEN-Net: "Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting", CVPR, 2019 (Microsoft). [Paper][PyTorch]
Copy-Paste: "Copy-and-Paste Networks for Deep Video Inpainting", ICCV, 2019 (Yonsei University). [Paper][PyTorch]
Onion-Peel: "Onion-Peel Networks for Deep Video Completion", ICCV, 2019 (Yonsei University). [Paper][PyTorch]
STTN: "Learning Joint Spatial-Temporal Transformations for Video Inpainting", ECCV, 2020 (Microsoft). [Paper][PyTorch]
FuseFormer: "FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting", ICCV, 2021 (CUHK + SenseTime). [Paper][PyTorch]
ICT: "High-Fidelity Pluralistic Image Completion with Transformers", ICCV, 2021 (CUHK). [Paper][PyTorch][Website]
DSTT: "Decoupled Spatial-Temporal Transformer for Video Inpainting", arXiv, 2021 (CUHK + SenseTime). [Paper][Code (in construction)]
TFill: "TFill: Image Completion via a Transformer-Based Architecture", arXiv, 2021 (NTU Singapore). [Paper][Code (in construction)]
BAT-Fill: "Diverse Image Inpainting with Bidirectional and Autoregressive Transformers", arXiv, 2021 (NTU Singapore). [Paper]
?: "Image-Adaptive Hint Generation via Vision Transformer for Outpainting", WACV, 2022 (Sogang University, Korea). [Paper]
ZITS: "Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding", CVPR, 2022 (Fudan). [Paper][PyTorch][Website]
MAT: "MAT: Mask-Aware Transformer for Large Hole Image Inpainting", CVPR, 2022 (CUHK). [Paper][PyTorch]
PUT: "Reduce Information Loss in Transformers for Pluralistic Image Inpainting", CVPR, 2022 (Microsoft). [Paper][PyTorch]
DLFormer: "DLFormer: Discrete Latent Transformer for Video Inpainting", CVPR, 2022 (Tencent). [Paper][Code (in construction)]
QueryOTR: "Outpainting by Queries", ECCV, 2022 (University of Liverpool, UK). [Paper][PyTorch (in construction)]
FGT: "Flow-Guided Transformer for Video Inpainting", ECCV, 2022 (USTC). [Paper][PyTorch]
MAE-FAR: "Learning Prior Feature and Attention Enhanced Image Inpainting", ECCV, 2022 (Fudan University). [Paper][PyTorch (in construction)][Website]
?: "Visual Prompting via Image Inpainting", NeurIPS, 2022 (Berkeley). [Paper][PyTorch][Website]
U-Transformer: "Generalised Image Outpainting with U-Transformer", arXiv, 2022 (Xi'an Jiaotong-Liverpool University). [Paper]
SpA-Former: "SpA-Former: Transformer image shadow detection and removal via spatial attention", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][PyTorch]
CRFormer: "CRFormer: A Cross-Region Transformer for Shadow Removal", arXiv, 2022 (Beijing Jiaotong University). [Paper]
DeViT: "DeViT: Deformed Vision Transformers in Video Inpainting", arXiv, 2022 (Kuaishou). [Paper]
ZITS++: "ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors", arXiv, 2022 (Fudan). [Paper]
TPFNet: "TPFNet: A Novel Text In-painting Transformer for Text Removal", arXiv, 2022 (?). [Paper][Code (in construction)]
FlowLens: "FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer", arXiv, 2022 (Zhejiang University). [Paper][Code (in construction)]

[Back to Overview]

Image Generation

IT: "Image Transformer", ICML, 2018 (Google). [Paper][Tensorflow]
PixelSNAIL: "PixelSNAIL: An Improved Autoregressive Generative Model", ICML, 2018 (Berkeley). [Paper][Tensorflow]
BigGAN: "Large Scale GAN Training for High Fidelity Natural Image Synthesis", ICLR, 2019 (DeepMind). [Paper][PyTorch]
SAGAN: "Self-Attention Generative Adversarial Networks", ICML, 2019 (Google). [Paper][Tensorflow]
VQGAN: "Taming Transformers for High-Resolution Image Synthesis", CVPR, 2021 (Heidelberg University). [Paper][PyTorch][Website]
?: "High-Resolution Complex Scene Synthesis with Transformers", CVPRW, 2021 (Heidelberg University). [Paper]
GANsformer: "Generative Adversarial Transformers", ICML, 2021 (Stanford + Facebook). [Paper][Tensorflow]
PixelTransformer: "PixelTransformer: Sample Conditioned Signal Generation", ICML, 2021 (Facebook). [Paper][Website]
HWT: "Handwriting Transformers", ICCV, 2021 (MBZUAI). [Paper][Code (in construction)]
Paint-Transformer: "Paint Transformer: Feed Forward Neural Painting with Stroke Prediction", ICCV, 2021 (Baidu). [Paper][Paddle][PyTorch]
Geometry-Free: "Geometry-Free View Synthesis: Transformers and no 3D Priors", ICCV, 2021 (Heidelberg University). [Paper][PyTorch]
VTGAN: "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers", ICCVW, 2021 (University of Nevada, Reno). [Paper]
ATISS: "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS, 2021 (NVIDIA). [Paper][Website]
GANsformer2: "Compositional Transformers for Scene Generation", NeurIPS, 2021 (Stanford + Facebook). [Paper][Tensorflow]
TransGAN: "TransGAN: Two Transformers Can Make One Strong GAN", NeurIPS, 2021 (UT Austin). [Paper][PyTorch]
HiT: "Improved Transformer for High-Resolution GANs", NeurIPS, 2021 (Google). [Paper][Tensorflow]
iLAT: "The Image Local Autoregressive Transformer", NeurIPS, 2021 (Fudan). [Paper]
TokenGAN: "Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers", NeurIPS, 2021 (Microsoft). [Paper]
SceneFormer: "SceneFormer: Indoor Scene Generation with Transformers", arXiv, 2021 (TUM). [Paper]
SNGAN: "Combining Transformer Generators with Convolutional Discriminators", arXiv, 2021 (Fraunhofer ITWM). [Paper]
Invertible-Attention: "Invertible Attention", arXiv, 2021 (ANU). [Paper]
GPA: "Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation", arXiv, 2021 (Zalando Research, Germany). [Paper][PyTorch (in construction)]
ViTGAN: "ViTGAN: Training GANs with Vision Transformers", ICLR, 2022 (Google). [Paper][PyTorch][PyTorch (wilile26811249)]
ViT-VQGAN: "Vector-quantized Image Modeling with Improved VQGAN", ICLR, 2022 (Google). [Paper]
Style-Transformer: "Style Transformer for Image Inversion and Editing", CVPR, 2022 (East China Normal University). [Paper][PyTorch]
StyleSwin: "StyleSwin: Transformer-based GAN for High-resolution Image Generation", CVPR, 2022 (Microsoft). [Paper][PyTorch]
Styleformer: "Styleformer: Transformer based Generative Adversarial Networks with Style Vector", CVPR, 2022 (Seoul National University). [Paper][PyTorch]
?: "User-Controllable Latent Transformer for StyleGAN Image Layout Editing", Pacific Graphics, 2022 (University of Tsukuba). [Paper][Website]
DynaST: "DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation", ECCV, 2022 (NUS). [Paper][PyTorch]
DoodleFormer: "DoodleFormer: Creative Sketch Drawing with Transformers", ECCV, 2022 (MBZUAI). [Paper][PyTorch][Website]
U-Attention: "Paying U-Attention to Textures: Multi-Stage Hourglass Vision Transformer for Universal Texture Synthesis", arXiv, 2022 (Adobe). [Paper]
MaskGIT: "MaskGIT: Masked Generative Image Transformer", CVPR, 2022 (Google). [Paper][PyTorch (dome272)]
AttnFlow: "Generative Flows with Invertible Attentions", CVPR, 2022 (ETHZ). [Paper]
NÜWA: "NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion", ECCV, 2022 (Microsoft). [Paper][GitHub]
Trans-INR: "Transformers as Meta-Learners for Implicit Neural Representations", ECCV, 2022 (UCSD). [Paper][PyTorch][Website]
ViewFormer: "ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers", ECCV, 2022 (Czech Technical University in Prague). [Paper][Tensorflow]
Unleashing-Transformer: "Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes", ECCV, 2022 (Durham University, UK). [Paper][PyTorch]
CASD: "Cross Attention Based Style Distribution for Controllable Person Image Synthesis", ECCV, 2022 (East China Normal University). [Paper]
VQGAN-CLIP: "VQGAN-CLIP: Open Domain Image Generation and Manipulation Using Natural Language", ECCV, 2022 (EleutherAI). [Paper][PyTorch]
Token-Critic: "Improved Masked Image Generation with Token-Critic", ECCV, 2022 (Google). [Paper]
PromptGen: "Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models", NeurIPS, 2022 (CMU). [Paper][PyTorch]
Contextual-RQ-Transformer: "Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer", NeurIPS, 2022 (POSTECH + Kakao). [Paper]
ViT-Patch: "A Robust Framework of Chromosome Straightening with ViT-Patch GAN", arXiv, 2022 (Xi'an Jiaotong-Liverpool University). [Paper]
?: "Transforming Image Generation from Scene Graphs", arXiv, 2022 (University of Catania, Italy). [Paper]
VisionNeRF: "Vision Transformer for NeRF-Based View Synthesis from a Single Input Image", arXiv, 2022 (Google). [Paper][Website]
NUWA-Infinity: "NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis", arXiv, 2022 (Microsoft). [Paper][GitHub][Website]
Diffusion-ViT: "Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model", arXiv, 2022 (Etsy, NY). [Paper]
?: "Visual Prompt Tuning for Generative Transfer Learning", arXiv, 2022 (Google). [Paper]
SeQ-GAN: "Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis", arXiv, 2022 (Tencent). [Paper][Code (in construction)]
?: "Style-Guided Inference of Transformer for High-resolution Image Synthesis", WACV, 2023 (NCSOFT, Korea). [Paper]
Frido: "Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis", AAAI, 2023 (Microsoft). [Paper][PyTorch]

[Back to Overview]

Video Generation

Subscale: "Scaling Autoregressive Video Models", ICLR, 2020 (Google). [Paper][Website]
ConvTransformer: "ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis", arXiv, 2020 (Southeast University). [Paper]
OCVT: "Generative Video Transformer: Can Objects be the Words?", ICML, 2021 (Rutgers University). [Paper]
AIST++: "Learn to Dance with AIST++: Music Conditioned 3D Dance Generation", arXiv, 2021 (Google). [Paper][Code][Website]
VideoGPT: "VideoGPT: Video Generation using VQ-VAE and Transformers", arXiv, 2021 (Berkeley). [Paper][PyTorch][Website]
DanceFormer: "DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer", AAAI, 2022 (Huiye Technology, China). [Paper]
VFIformer: "Video Frame Interpolation with Transformer", CVPR, 2022 (CUHK). [Paper][PyTorch]
VFIT: "Video Frame Interpolation Transformer", CVPR, 2022 (McMaster University, Canada). [Paper][PyTorch]
MoTrans: "Motion Transformer for Unsupervised Image Animation", ECCV, 2022 (Alibaba). [Paper][PyTorch]
Transframer: "Transframer: Arbitrary Frame Prediction with Generative Models", arXiv, 2022 (DeepMind). [Paper]
TATS: "Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer", ECCV, 2022 (Maryland). [Paper][Website]
POVT: "Patch-based Object-centric Transformers for Efficient Video Generation", arXiv, 2022 (Berkeley). [Paper][PyTorch][Website]
TAIN: "Cross-Attention Transformer for Video Interpolation", arXiv, 2022 (Duke). [Paper]
TTVFI: "TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation", arXiv, 2022 (Microsoft). [Paper]
TECO: "Temporally Consistent Video Transformer for Long-Term Video Prediction", arXiv, 2022 (Berkeley). [Paper][Jax][Website]
SlotFormer: "SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models", arXiv, 2022 (University of Toronto). [Paper][Website]
MAGVIT: "MAGVIT: Masked Generative Video Transformer", arXiv, 2022 (Google). [Paper][Code (in construction)][Website]

[Back to Overview]

Transfer / Translation / Manipulation

AdaAttN: "AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer", ICCV, 2021 (Baidu). [Paper][Paddle][PyTorch]
StyleCLIP: "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery", ICCV, 2021 (Hebrew University of Jerusalem). [Paper][PyTorch]
StyTr2: "StyTr^2: Unbiased Image Style Transfer with Transformers", CVPR, 2022 (CAS). [Paper][PyTorch]
InstaFormer: "InstaFormer: Instance-Aware Image-to-Image Translation with Transformer", CVPR, 2022 (Korea University). [Paper]
ManiTrans: "ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation", CVPR, 2022 (Huawei). [Paper][Website]
QS-Attn: "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation", CVPR, 2022 (Shanghai Key Laboratory). [Paper][PyTorch]
ASSET: "ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions", SIGGRAPH, 2022 (Adobe). [Paper][PyTorch][Website]
SCAM: "SCAM! Transferring humans between images with Semantic Cross Attention Modulation", ECCV, 2022 (Univ Gustave Eiffel, France). [Paper][PyTorch][Website]
TargetCLIP: "Image-Based CLIP-Guided Essence Transfer", ECCV, 2022 (Tel Aviv). [Paper][PyTorch]
FFCLIP: "One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations", NeurIPS, 2022 (Tencent). [Paper][Code (in construction)]
STTR: "Fine-Grained Image Style Transfer with Visual Transformers", ACCV, 2022 (The University of Tokyo). [Paper][PyTorch (in construction)]
Splice: "Splicing ViT Features for Semantic Appearance Transfer", arXiv, 2022 (Weizmann Institute of Science, Israel). [Paper][PyTorch][Website]
UVCGAN: "UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation", arXiv, 2022 (Brookhaven National Laboratory, NY). [Paper]
ITTR: "ITTR: Unpaired Image-to-Image Translation with Transformers", arXiv, 2022 (Kuaishou). [Paper]
CLIPasso: "CLIPasso: Semantically-Aware Object Sketching", arXiv, 2022 (EPFL). [Paper][PyTorch][Website]
CTrGAN: "CTrGAN: Cycle Transformers GAN for Gait Transfer", arXiv, 2022 (Ariel University, Israel). [Paper]
PI-Trans: "PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation", arXiv, 2022 (University of Trento, Italy). [Paper][PyTorch (in construction)]
CSLA: "Bridging CLIP and StyleGAN through Latent Alignment for Image Editing", arXiv, 2022 (Kuaishou). [Paper]
CLIP-PAE: "CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Image Manipulation", arXiv, 2022 (University of Cambridge). [Paper]
S2WAT: "S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention", arXiv, 2022 (Sichuan Normal University). [Paper]

[Back to Overview]

Other Low-Level Tasks

Colorization:
- ColTran: "Colorization Transformer", ICLR, 2021 (Google). [Paper][Tensorflow]
- ViT-I-GAN: "ViT-Inception-GAN for Image Colourising", arXiv, 2021 (D.Y Patil College of Engineering, India). [Paper]
- CT²: "CT²: Colorization Transformer via Color Tokens", ECCV, 2022 (Peking University). [Paper][PyTorch]
- L-CoDer: "L-CoDer: Language-based Colorization with Color-object Decoupling Transformer", ECCV, 2022 (Beijing University of Posts and Telecommunications). [Paper]
- ColorFormer: "ColorFormer: Image Colorization via Color Memory assisted Hybrid-attention Transformer", ECCV, 2022 (Tencent). [Paper]
- UniColor: "UniColor: A Unified Framework for Multi-Modal Colorization with Transformer", SIGGRAPH Asia, 2022 (CUHK). [Paper][Website]
- iColoriT: "iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer", arXiv, 2022 (KAIST). [Paper]
Enhancement:
- PanFormer: "PanFormer: a Transformer Based Model for Pan-sharpening", ICME, 2022 (Beihang University). [Paper][PyTorch]
- URSCT-UIE: "Reinforced Swin-Convs Transformer for Underwater Image Enhancement", arXiv, 2022 (Ningbo University). [Paper]
- IAT: "Illumination Adaptive Transformer", arXiv, 2022 (The University of Tokyo). [Paper][PyTorch]
- SPGAT: "Structural Prior Guided Generative Adversarial Transformers for Low-Light Image Enhancement", arXiv, 2022 (The Hong Kong Polytechnic University). [Paper]
- SSTF: "End-to-end Transformer for Compressed Video Quality Enhancement", arXiv, 2022 (Nanjing University of Information Science and Technology). [Paper]
HDR:
- CA-ViT: "Ghost-free High Dynamic Range Imaging with Context-aware Transformer", ECCV, 2022 (Megvii). [Paper][PyTorch]
- Selective-TransHDR: "Selective TransHDR: Transformer-Based Selective HDR Imaging Using Ghost Region Mask", ECCV, 2022 (Sogang University, Korea). [Paper]
- Text2Light: "Text2Light: Zero-Shot Text-Driven HDR Panorama Generation", SIGGRAPH Asia, 2022 (NTU, Singapore). [Paper][PyTorch][Website]
Harmonization:
- HT: "Image Harmonization With Transformer", ICCV, 2021 (Ocean University of China). [Paper]
Compression:
- ?: "Towards End-to-End Image Compression and Analysis with Transformers", AAAI, 2022 (Harbin Institute of Technology). [Paper][PyTorch]
- Entroformer: "Entroformer: A Transformer-based Entropy Model for Learned Image Compression", ICLR, 2022 (Alibaba). [Paper]
- STF: "The Devil Is in the Details: Window-based Attention for Image Compression", CVPR, 2022 (CAS). [Paper][PyTorch]
- Contextformer: "Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression", ECCV, 2022 (TUM). [Paper]
- VCT: "VCT: A Video Compression Transformer", NeurIPS, 2022 (Google). [Paper]
Matting:
- MatteFormer: "MatteFormer: Transformer-Based Image Matting via Prior-Tokens", CVPR, 2022 (SNU + NAVER). [Paper][PyTorch]
- TransMatting: "TransMatting: Enhancing Transparent Objects Matting with Transformers", ECCV, 2022 (CAS). [Paper][Code (in construction)]
- VMFormer: "VMFormer: End-to-End Video Matting with Transformer", arXiv, 2022 (PicsArt). [Paper][PyTorch][Website]
Reconstruction:
- ET-Net: "Event-Based Video Reconstruction Using Transformer", ICCV, 2021 (University of Science and Technology of China). [Paper][PyTorch]
- GradViT: "GradViT: Gradient Inversion of Vision Transformers", CVPR, 2022 (NVIDIA). [Paper][Website]
- MST: "Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction", CVPR, 2022 (Tsinghua). [Paper][PyTorch]
- MST++: "MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction", CVPRW, 2022 (Tsinghua). [Paper][PyTorch]
- CST: "Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction", ECCV, 2022 (Tsinghua). [Paper][PyTorch]
- DAUHST: "Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging", NeurIPS, 2022 (Tsinghua). [Paper][PyTorch]
- S²-Transformer: "S²-Transformer for Mask-Aware Hyperspectral Image Reconstruction", arXiv, 2022 (Rochester Institute of Technology). [Paper]
Radiance Fields:
- NeXT: "NeXT: Towards High Quality Neural Radiance Fields via Multi-Skip Transformer", ECCV, 2022 (Tsinghua University). [Paper][JAX]
- TransNeRF: "Generalizable Neural Radiance Fields for Novel View Synthesis with Transformer", arXiv, 2022 (UBC). [Paper]
3D:
- MNSRNet: "MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution", CVPR, 2022 (Shenzhen University). [Paper]
Others:
- TransMEF: "TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning", AAAI, 2022 (Fudan). [Paper]
- MS-Unet: "Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer", CVPR, 2022 (Megvii). [Paper][Code (in construction)]
- TransCL: "TransCL: Transformer Makes Strong and Flexible Compressive Learning", TPAMI, 2022 (Peking University). [Paper][Code (in construction)]
- GAP-CSCoT: "Spectral Compressive Imaging Reconstruction Using Convolution and Spectral Contextual Transformer", arXiv, 2022 (CAS). [Paper]
- MatFormer: "MatFormer: A Generative Model for Procedural Materials", arXiv, 2022 (Adobe). [Paper]
- FishFormer: "FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification with Efficacy Domain Exploration", arXiv, 2022 (Beijing Jiaotong University). [Paper]
- STFormer: "Spatial-Temporal Transformer for Video Snapshot Compressive Imaging", arXiv, 2022 (CAS). [Paper][PyTorch]

[Back to Overview]

Reinforcement Learning

Navigation

VTNet: "VTNet: Visual Transformer Network for Object Goal Navigation", ICLR, 2021 (ANU). [Paper]
MaAST: "MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation", ICRA, 2021 (SRI). [Paper]
TransFuser: "Multi-Modal Fusion Transformer for End-to-End Autonomous Driving", CVPR, 2021 (MPI). [Paper][PyTorch]
CMTP: "Topological Planning With Transformers for Vision-and-Language Navigation", CVPR, 2021 (Stanford). [Paper]
VLN-BERT: "VLN-BERT: A Recurrent Vision-and-Language BERT for Navigation", CVPR, 2021 (ANU). [Paper][PyTorch]
E.T.: "Episodic Transformer for Vision-and-Language Navigation", ICCV, 2021 (Google). [Paper][PyTorch]
HAMT: "History Aware Multimodal Transformer for Vision-and-Language Navigation", NeurIPS, 2021 (INRIA). [Paper][PyTorch][Website]
SOAT: "SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation", NeurIPS, 2021 (Georgia Tech). [Paper]
OMT: "Object Memory Transformer for Object Goal Navigation", ICRA, 2022 (AIST, Japan). [Paper]
ADAPT: "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts", CVPR, 2022 (Huawei). [Paper]
DUET: "Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation", CVPR, 2022 (INRIA). [Paper][Website]
LSA: "Local Slot Attention for Vision-and-Language Navigation", ICMR, 2022 (Fudan). [Paper]
?: "Learning from Unlabeled 3D Environments for Vision-and-Language Navigation", ECCV, 2022 (INRIA). [Paper][Website]
MTVM: "Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation", ECCV, 2022 (ByteDance). [Paper][PyTorch]
DDL: "Learning Disentanglement with Decoupled Labels for Vision-Language Navigation", ECCV, 2022 (Beijing Institute of Technology). [Paper][PyTorch]
Sim2Sim: "Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments", ECCV, 2022 (Oregon State University). [Paper][PyTorch][Website]
AVLEN: "AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments", NeurIPS, 2022 (UC Riverside). [Paper]
ZSON: "ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings", NeurIPS, 2022 (Georgia Tech). [Paper]
WS-MGMap: "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation", NeurIPS, 2022 (South China University of Technology). [Paper][PyTorch (in construction)]
CLIP-Nav: "CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation", CoRLW, 2022 (Amazon). [Paper]
TransFuser: "TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving", arXiv, 2022 (MPI). [Paper]
TD-STP: "Target-Driven Structured Transformer Planner for Vision-Language Navigation", arXiv, 2022 (Beihang University). [Paper][Code (in construction)]
DAVIS: "Anticipating the Unseen Discrepancy for Vision and Language Navigation", arXiv, 2022 (UCSB). [Paper]
LOViS: "LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation", arXiv, 2022 (Michigan State). [Paper]
IVLN: "Iterative Vision-and-Language Navigation", arXiv, 2022 (Oregon State University). [Paper]
BEVBert: "BEVBert: Topo-Metric Map Pre-training for Language-guided Navigation", arXiv, 2022 (CAS). [Paper]

[Back to Overview]

Other RL Tasks

SVEA: "Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation", arXiv, 2021 (UCSD). [Paper][GitHub][Website]
LocoTransformer: "Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers", ICLR, 2022 (UCSD). [Paper][Website]
STAM: "Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes", CVPR, 2022 (McGill University, Canada). [Paper][PyTorch]
CtrlFormer: "CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer", ICML, 2022 (HKU). [Paper][PyTorch][Website]
PromptDT: "Prompting Decision Transformer for Few-Shot Policy Generalization", ICML, 2022 (CMU). [Paper][Website]
StARformer: "StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning", ECCV, 2022 (Stony Brook). [Paper][PyTorch]
RAD: "Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels", arXiv, 2022 (UBC, Canada). [Paper]
MWM: "Masked World Models for Visual Control", arXiv, 2022 (Berkeley). [Paper][Tensorflow][Website]
IRIS: "Transformers are Sample Efficient World Models", arXiv, 2022 (University of Geneva, Switzerland). [Paper][PyTorch]
InstructRL: "Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models", arXiv, 2022 (Google). [Paper]

[Back to Overview]

Medical

Medical Segmentation

Cross-Transformer: "The entire network structure of Crossmodal Transformer", ICBSIP, 2021 (Capital Medical University). [Paper]
Segtran: "Medical Image Segmentation using Squeeze-and-Expansion Transformers", IJCAI, 2021 (A*STAR). [Paper]
i-ViT: "Instance-based Vision Transformer for Subtyping of Papillary Renal Cell Carcinoma in Histopathological Image", MICCAI, 2021 (Xi'an Jiaotong University). [Paper][PyTorch][Website]
UTNet: "UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation", MICCAI, 2021 (Rutgers). [Paper]
MCTrans: "Multi-Compound Transformer for Accurate Biomedical Image Segmentation", MICCAI, 2021 (HKU + CUHK). [Paper][Code (in construction)]
Polyformer: "Few-Shot Domain Adaptation with Polymorphic Transformers", MICCAI, 2021 (A*STAR). [Paper][PyTorch]
BA-Transformer: "Boundary-aware Transformers for Skin Lesion Segmentation", MICCAI, 2021 (Xiamen University). [Paper][PyTorch]
GT-U-Net: "GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation", MICCAIW, 2021 (Hangzhou Dianzi University). [Paper][PyTorch]
STN: "Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation", ISBI, 2021 (Institut Polytechnique de Paris). [Paper]
T-AutoML: "T-AutoML: Automated Machine Learning for Lesion Segmentation Using Transformers in 3D Medical Imaging", ICCV, 2021 (NVIDIA). [Paper]
MedT: "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation", arXiv, 2021 (Johns Hopkins). [Paper][PyTorch]
Convolution-Free: "Convolution-Free Medical Image Segmentation using Transformers", arXiv, 2021 (Harvard). [Paper]
CoTr: "CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation", arXiv, 2021 (Northwestern Polytechnical University). [Paper][PyTorch]
TransBTS: "TransBTS: Multimodal Brain Tumor Segmentation Using Transformer", arXiv, 2021 (University of Science and Technology Beijing). [Paper][PyTorch]
SpecTr: "SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation", arXiv, 2021 (East China Normal University). [Paper][Code (in construction)]
U-Transformer: "U-Net Transformer: Self and Cross Attention for Medical Image Segmentation", arXiv, 2021 (CEDRIC). [Paper]
TransUNet: "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation", arXiv, 2021 (Johns Hopkins). [Paper][PyTorch]
PMTrans: "Pyramid Medical Transformer for Medical Image Segmentation", arXiv, 2021 (Washington University in St. Louis). [Paper]
PBT-Net: "Anatomy-Guided Parallel Bottleneck Transformer Network for Automated Evaluation of Root Canal Therapy", arXiv, 2021 (Hangzhou Dianzi University). [Paper]
Swin-Unet: "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation", arXiv, 2021 (Huawei). [Paper][Code (in construction)]
MBT-Net: "A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation", arXiv, 2021 (Southern University of Science and Technology). [Paper]
WAD: "More than Encoder: Introducing Transformer Decoder to Upsample", arXiv, 2021 (South China University of Technology). [Paper]
LeViT-UNet: "LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation", arXiv, 2021 (Wuhan Institute of Technology). [Paper]
?: "Evaluating Transformer based Semantic Segmentation Networks for Pathological Image Segmentation", arXiv, 2021 (Vanderbilt University). [Paper]
nnFormer: "nnFormer: Interleaved Transformer for Volumetric Segmentation", arXiv, 2021 (HKU + Xiamen University). [Paper][PyTorch]
MISSFormer: "MISSFormer: An Effective Medical Image Segmentation Transformer", arXiv, 2021 (Beijing University of Posts and Telecommunications). [Paper]
TUnet: "Transformer-Unet: Raw Image Processing with Unet", arXiv, 2021 (Beijing Zoezen Robot + Beihang University). [Paper]
BiTr-Unet: "BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation", arXiv, 2021 (New York University). [Paper]
?: "Transformer Assisted Convolutional Network for Cell Instance Segmentation", arXiv, 2021 (IIT Dhanbad). [Paper]
?: "Combining CNNs With Transformer for Multimodal 3D MRI Brain Tumor Segmentation With Self-Supervised Pretraining", arXiv, 2021 (Ukrainian Catholic University). [Paper]
UNETR: "UNETR: Transformers for 3D Medical Image Segmentation", WACV, 2022 (NVIDIA). [Paper][PyTorch]
AFTer-UNet: "AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation", WACV, 2022 (UC Irvine). [Paper]
UCTransNet: "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer", AAAI, 2022 (Northeastern University, China). [Paper][PyTorch]
Swin-UNETR: "Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis", CVPR, 2022 (NVIDIA). [Paper][PyTorch]
?: "Transformer-based out-of-distribution detection for clinically safe segmentation", Medical Imaging with Deep Learning (MIDL), 2022 (King’s College London). [Paper]
ScaleFormer: "ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation", IJCAI, 2022 (Zhejiang University). [Paper][Code (in construction)]
FCBFormer: "FCN-Transformer Feature Fusion for Polyp Segmentation", Annual Conference on Medical Image Understanding and Analysis (MIUA), 2022 (University of Central Lancashire, UK). [Paper][PyTorch]
VDFormer: "View-Disentangled Transformer for Brain Lesion Detection", ISBI, 2022 (CUHK). [Paper][PyTorch]
TFCNs: "TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation", International Conference on Artificial Neural Networks (ICANN), 2022 (Xiamen University). [Paper][PyTorch (in construction)]
MIL: "Transformer based multiple instance learning for weakly supervised histopathology image segmentation", MICCAI, 2022 (Beihang University). [Paper]
mmFormer: "mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation", MICCAI, 2022 (CAS). [Paper][PyTorch]
Patcher: "Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation", MICCAI, 2022 (Pennsylvania State University). [Paper]
NestedFormer: "NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation", MICCAI, 2022 (Tianjin University). [Paper][Code (in construction)]
TransDeepLab: "TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation", MICCAIW, 2022 (RWTH Aachen University, Germany). [Paper][PyTorch]
Video-TransUNet: "Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation", International Conference on Machine Vision (ICMV), 2022 (University of Bristol, UK). [Paper]
CASTformer: "Class-Aware Adversarial Transformers for Medical Image Segmentation", NeurIPS, 2022 (Yale). [Paper]
Tempera: "Tempera: Spatial Transformer Feature Pyramid Network for Cardiac MRI Segmentation", arXiv, 2022 (ICL). [Paper]
UTNetV2: "A Multi-scale Transformer for Medical Image Segmentation: Architectures, Model Efficiency, and Benchmarks", arXiv, 2022 (Rutgers). [Paper]
UNesT: "Characterizing Renal Structures with 3D Block Aggregate Transformers", arXiv, 2022 (Vanderbilt University, Tennessee). [Paper]
PHTrans: "PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation", arXiv, 2022 (Beijing University of Posts and Telecommunications). [Paper]
UNeXt: "UNeXt: MLP-based Rapid Medical Image Segmentation Network", arXiv, 2022 (JHU). [Paper][PyTorch]
TransFusion: "TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers", arXiv, 2022 (Rutgers). [Paper]
UNetFormer: "UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation", arXiv, 2022 (NVIDIA). [Paper][GitHub]
3D-Shuffle-Mixer: "3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume", arXiv, 2022 (Xi'an Jiaotong University). [Paper]
?: "Continual Hippocampus Segmentation with Transformers", arXiv, 2022 (Technical University of Darmstadt, Germany). [Paper]
TranSiam: "TranSiam: Fusing Multimodal Visual Features Using Transformer for Medical Image Segmentation", arXiv, 2022 (Tianjin University). [Paper]
ColonFormer: "ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation", arXiv, 2022 (Hanoi University of Science and Technology). [Paper]
?: "Transformer based Generative Adversarial Network for Liver Segmentation", arXiv, 2022 (Northwestern University). [Paper]
FCT: "The Fully Convolutional Transformer for Medical Image Segmentation", arXiv, 2022 (University of Glasgow, UK). [Paper]
XBound-Former: "XBound-Former: Toward Cross-scale Boundary Modeling in Transformers", arXiv, 2022 (Xiamen University). [Paper][PyTorch]
Polyp-PVT: "Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers", arXiv, 2022 (IIAI). [Paper][PyTorch]
SeATrans: "SeATrans: Learning Segmentation-Assisted diagnosis model via Transformer", arXiv, 2022 (Baidu). [Paper]
TransResU-Net: "TransResU-Net: Transformer based ResU-Net for Real-Time Colonoscopy Polyp Segmentation", arXiv, 2022 (Indira Gandhi National Open University). [Paper][Code (in construction)]
LViT: "LViT: Language meets Vision Transformer in Medical Image Segmentation", arXiv, 2022 (Alibaba). [Paper][Code (in construction)]
APFormer: "The Lighter The Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning", arXiv, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
?: "Transformer based Models for Unsupervised Anomaly Segmentation in Brain MR Images", arXiv, 2022 (University of Rennes, France). [Paper][Tensorflow]
CKD-TransBTS: "CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation", arXiv, 2022 (South China University of Technology). [Paper]
HiFormer: "HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation", arXiv, 2022 (Iran University of Science and Technology). [Paper][PyTorch]
?: "Contextual Attention Network: Transformer Meets U-Net", arXiv, 2022 (RWTH Aachen University). [Paper][PyTorch]
HRSTNet: "High-Resolution Swin Transformer for Automatic Medical Image Segmentation", arXiv, 2022 (Xi'an University of Posts and Telecommunications). [Paper][Code (in construction)]
TransNorm: "TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model", arXiv, 2022 (Aachen University, Germany). [Paper][PyTorch]
?: "When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation", arXiv, 2022 (Oxford). [Paper][Code (in construction)]
CM-MLP: "CM-MLP: Cascade Multi-scale MLP with Axial Context Relation Encoder for Edge Segmentation of Medical Image", arXiv, 2022 (Zhengzhou University). [Paper]
CATS: "CATS: Complementary CNN and Transformer Encoders for Segmentation", arXiv, 2022 (Vanderbilt University, Nashville). [Paper]
TFusion: "TFusion: Transformer based N-to-One Multimodal Fusion Block", arXiv, 2022 (South China University of Technology). [Paper]
AutoPET: "AutoPET Challenge: Combining nn-Unet with Swin UNETR Augmented by Maximum Intensity Projection Classifier", arXiv, 2022 (University Hospital Essen, Germany). [Paper]
SPAN: "Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers", arXiv, 2022 (Berkeley). [Paper]
TMSS: "TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction", arXiv, 2022 (MBZUAI). [Paper]
CR-Swin2-VT: "Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation", arXiv, 2022 (Monash University). [Paper][PyTorch]
3DUX-Net: "3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation", arXiv, 2022 (Vanderbilt University). [Paper][PyTorch]
FocalUNETR: "FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images", arXiv, 2022 (Wayne State University, Detroit). [Paper]
LAPFormer: "LAPFormer: A Light and Accurate Polyp Segmentation Transformer", arXiv, 2022 (Sun*, Hanoi). [Paper]
FINE: "Memory transformers for full context and high-resolution 3D Medical Segmentation", arXiv, 2022 (National Conservatory of Arts and Crafts, France). [Paper]
ConvTransSeg: "ConvTransSeg: A Multi-resolution Convolution-Transformer Network for Medical Image Segmentation", arXiv, 2022 (University of Nottingham, UK). [Paper]
CS-Unet: "Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation", arXiv, 2022 (University of Glasgow, UK). [Paper]
UNETR++: "UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation", arXiv, 2022 (MBZUAI). [Paper][PyTorch]

[Back to Overview]

Medical Classification

COVID19T: "A Transformer-Based Framework for Automatic COVID19 Diagnosis in Chest CTs", ICCVW, 2021 (?). [Paper][PyTorch]
TransMIL: "TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification", NeurIPS, 2021 (Tsinghua University). [Paper][PyTorch]
TransMed: "TransMed: Transformers Advance Multi-modal Medical Image Classification", arXiv, 2021 (Northeastern University). [Paper]
CXR-ViT: "Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification", arXiv, 2021 (KAIST). [Paper]
ViT-TSA: "Shoulder Implant X-Ray Manufacturer Classification: Exploring with Vision Transformer", arXiv, 2021 (Queen’s University). [Paper]
GasHis-Transformer: "GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification", arXiv, 2021 (Northeastern University). [Paper]
POCFormer: "POCFormer: A Lightweight Transformer Architecture for Detection of COVID-19 Using Point of Care Ultrasound", arXiv, 2021 (The Ohio State University). [Paper]
COVID-ViT: "COVID-VIT: Classification of COVID-19 from CT chest images based on vision transformer models", arXiv, 2021 (Middlesex University, UK). [Paper][PyTorch]
EEG-ConvTransformer: "EEG-ConvTransformer for Single-Trial EEG based Visual Stimuli Classification", arXiv, 2021 (IIT Ropar). [Paper]
CCAT: "Visual Transformer with Statistical Test for COVID-19 Classification", arXiv, 2021 (NCKU). [Paper]
M3T: "M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer", CVPR, 2022 (Yonsei University). [Paper]
?: "A comparative study between vision transformers and CNNs in digital pathology", CVPRW, 2022 (Roche, Switzerland). [Paper]
SCT: "Context-Aware Transformers For Spinal Cancer Detection and Radiological Grading", MICCAI, 2022 (Oxford). [Paper]
KAT: "Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification", MICCAI, 2022 (Beihang University). [Paper][PyTorch]
SEViT: "Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification", MICCAI, 2022 (MBZUAI). [Paper][PyTorch]
MF-ViT: "Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis", MICCAIW, 2022 (Rutgers University). [Paper][PyTorch]
SB-SSL: "SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI", MICCAIW, 2022 (University of Surrey, UK). [Paper]
RadioTransformer: "RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-guided Disease Classification", ECCV, 2022 (Stony Brook). [Paper][Tensorflow (in construction)]
ScoreNet: "ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Classification", arXiv, 2022 (EPFL). [Paper]
LA-MIL: "Local Attention Graph-based Transformer for Multi-target Genetic Alteration Prediction", arXiv, 2022 (TUM). [Paper]
HoVer-Trans: "HoVer-Trans: Anatomy-aware HoVer-Transformer for ROI-free Breast Cancer Diagnosis in Ultrasound Images", arXiv, 2022 (South China University of Technology). [Paper]
GTP: "A graph-transformer for whole slide image classification", arXiv, 2022 (Boston University). [Paper]
?: "Zero-Shot and Few-Shot Learning for Lung Cancer Multi-Label Classification using Vision Transformer", arXiv, 2022 (Harvard). [Paper]
SwinCheX: "SwinCheX: Multi-label classification on chest X-ray images with transformers", arXiv, 2022 (Sharif University of Technology, Iran). [Paper]
SGT: "Rectify ViT Shortcut Learning by Visual Saliency", arXiv, 2022 (Northwestern Polytechnical University, China). [Paper]
IPMN-ViT: "Neural Transformers for Intraductal Papillary Mucosal Neoplasms (IPMN) Classification in MRI images", arXiv, 2022 (University of Catania, Italy). [Paper]
?: "Multi-Label Retinal Disease Classification using Transformers", arXiv, 2022 (Khalifa University, UAE). [Paper][PyTorch]
TractoFormer: "TractoFormer: A Novel Fiber-level Whole Brain Tractography Analysis Framework Using Spectral Embedding and Vision Transformers", arXiv, 2022 (Harvard). [Paper]
BrainFormer: "BrainFormer: A Hybrid CNN-Transformer Model for Brain fMRI Data Classification", arXiv, 2022 (Chinese PLA General Hospital). [Paper]
SI-ViT: "Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification", arXiv, 2022 (Beihang University). [Paper][PyTorch]

[Back to Overview]

Medical Detection

COTR: "COTR: Convolution in Transformer Network for End to End Polyp Detection", arXiv, 2021 (Fuzhou University). [Paper]
TR-Net: "Transformer Network for Significant Stenosis Detection in CCTA of Coronary Arteries", arXiv, 2021 (Harbin Institute of Technology). [Paper]
CAE-Transformer: "CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans", arXiv, 2021 (Concordia University, Canada). [Paper]
DATR: "DATR: Domain-adaptive transformer for multi-domain landmark detection", arXiv, 2022 (CAS). [Paper]
SATr: "SATr: Slice Attention with Transformer for Universal Lesion Detection", arXiv, 2022 (CAS). [Paper]
Focused-Decoder: "Focused Decoding Enables 3D Anatomical Detection by Transformers", arXiv, 2022 (TUM). [Paper][PyTorch]

[Back to Overview]

Medical Reconstruction

T²Net: "Task Transformer Network for Joint MRI Reconstruction and Super-Resolution", MICCAI, 2021 (Harbin Institute of Technology). [Paper][PyTorch]
FIT: "Fourier Image Transformer", arXiv, 2021 (MPI). [Paper][PyTorch]
SLATER: "Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers", arXiv, 2021 (Bilkent University). [Paper]
MTrans: "MTrans: Multi-Modal Transformer for Accelerated MR Imaging", arXiv, 2021 (Harbin Institute of Technology). [Paper][PyTorch]
SDAUT: "Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI", MICCAI, 2022 (ICL). [Paper]
?: "Adaptively Re-weighting Multi-Loss Untrained Transformer for Sparse-View Cone-Beam CT Reconstruction", arXiv, 2022 (Zhejiang Lab). [Paper]
K-Space-Transformer: "K-Space Transformer for Fast MRI Reconstruction with Implicit Representation", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][Code (in construction)][Website]
McSTRA: "Multi-head Cascaded Swin Transformers with Attention to k-space Sampling Pattern for Accelerated MRI Reconstruction", arXiv, 2022 (Monash University, Australia). [Paper]
?: "Colonoscopy Landmark Detection using Vision Transformers", arXiv, 2022 (Intuitive Surgical, CA). [Paper]

[Back to Overview]

Medical Low-Level Vision

Eformer: "Eformer: Edge Enhancement based Transformer for Medical Image Denoising", ICCV, 2021 (BITS Pilani, India). [Paper]
PTNet: "PTNet: A High-Resolution Infant MRI Synthesizer Based on Transformer", arXiv, 2021 (Columbia). [Paper]
ResViT: "ResViT: Residual vision transformers for multi-modal medical image synthesis", arXiv, 2021 (Bilkent University, Turkey). [Paper]
CyTran: "CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation", arXiv, 2021 (University Politehnica of Bucharest, Romania). [Paper][PyTorch]
McMRSR: "Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution", CVPR, 2022 (Yantai University, China). [Paper][PyTorch]
RPLHR-CT: "RPLHR-CT Dataset and Transformer Baseline for Volumetric Super-Resolution from CT Scans", MICCAI, 2022 (Infervision Medical Technology, China). [Paper][Code (in construction)]
W-G2L-ART: "Wide Range MRI Artifact Removal with Transformers", BMVC, 2022 (KTH). [Paper]
RFormer: "RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark", arXiv, 2022 (Tsinghua). [Paper]
CTformer: "CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising", arXiv, 2022 (UMass Lowell). [Paper][PyTorch]
Cohf-T: "Cross-Modality High-Frequency Transformer for MR Image Super-Resolution", arXiv, 2022 (Xidian University). [Paper]
SIST: "Low-Dose CT Denoising via Sinogram Inner-Structure Transformer", arXiv, 2022 (?). [Paper]
Spach-Transformer: "Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising", arXiv, 2022 (Harvard). [Paper]
ConvFormer: "ConvFormer: Combining CNN and Transformer for Medical Image Segmentation", arXiv, 2022 (University of Notre Dame). [Paper]

[Back to Overview]

Medical Vision-Language

CGT: "Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation", CVPR, 2022 (University of Technology Sydney). [Paper]
MCGN: "A Medical Semantic-Assisted Transformer for Radiographic Report Generation", MICCAI, 2022 (University of Sydney). [Paper]
M3AE: "Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training", MICCAI, 2022 (CUHK). [Paper][PyTorch]
BioViL: "Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing", ECCV, 2022 (Microsoft). [Paper][Code]
MGCA: "Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning", NeurIPS, 2022 (HKU). [Paper]
MedCLIP: "MedCLIP: Contrastive Learning from Unpaired Medical Images and Text", EMNLP, 2022 (UIUC). [Paper][PyTorch]
MDBERT: "Hierarchical BERT for Medical Document Understanding", arXiv, 2022 (IQVIA, NC). [Paper]
Surgical-VQA: "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer", arXiv, 2022 (NUS). [Paper][PyTorch (in construction)]
SwinMLP-TranCAP: "Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches", arXiv, 2022 (CUHK). [Paper][PyTorch]
SAT: "Medical Image Captioning via Generative Pretrained Transformers", arXiv, 2022 (Philips Innovation Labs Rus, Russia). [Paper]
RepsNet: "RepsNet: Combining Vision with Language for Automated Medical Reports", arXiv, 2022 (Google). [Paper][Website]
MF²-MVQA: "MF²-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering", arXiv, 2022 (University of Science and Technology Beijing). [Paper]
RoentGen: "RoentGen: Vision-Language Foundation Model for Chest X-ray Generation", arXiv, 2022 (Stanford). [Paper]

[Back to Overview]

Medical Others

LAT: "Lesion-Aware Transformers for Diabetic Retinopathy Grading", CVPR, 2021 (USTC). [Paper]
UVT: "Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation", MICCAI, 2021 (ICL). [Paper][PyTorch]
?: "Surgical Instruction Generation with Transformers", MICCAI, 2021 (Bournemouth University, UK). [Paper]
AlignTransformer: "AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation", MICCAI, 2021 (Peking University). [Paper]
MCAT: "Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images", ICCV, 2021 (Harvard). [Paper][PyTorch]
?: "Is it Time to Replace CNNs with Transformers for Medical Images?", ICCVW, 2021 (KTH, Sweden). [Paper]
HAT-Net: "HAT-Net: A Hierarchical Transformer Graph Neural Network for Grading of Colorectal Cancer Histology Images", BMVC, 2021 (Beijing University of Posts and Telecommunications). [Paper]
?: "Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training", NeurIPS, 2021 (KAIST). [Paper]
ViT-Path: "Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology", NeurIPSW, 2021 (Microsoft). [Paper]
Global-Local-Transformer: "Global-Local Transformer for Brain Age Estimation", IEEE Transactions on Medical Imaging, 2021 (Harvard). [Paper][PyTorch]
CE-TFE: "Deep Transformers for Fast Small Intestine Grounding in Capsule Endoscope Video", arXiv, 2021 (Sun Yat-Sen University). [Paper]
DeepProg: "DeepProg: A Transformer-based Framework for Predicting Disease Prognosis", arXiv, 2021 (University of Oulu). [Paper]
Medical-Transformer: "Medical Transformer: Universal Brain Encoder for 3D MRI Analysis", arXiv, 2021 (Korea University). [Paper]
RATCHET: "RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting", arXiv, 2021 (ICL). [Paper]
C2FViT: "Affine Medical Image Registration with Coarse-to-Fine Vision Transformer", CVPR, 2022 (HKUST). [Paper][Code (in construction)]
HIPT: "Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning", CVPR, 2022 (Harvard). [Paper]
SiT: "Surface Analysis with Vision Transformers", CVPRW, 2022 (King’s College London, UK). [Paper][PyTorch]
SiT: "Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis", Medical Imaging with Deep Learning (MIDL), 2022 (King’s College London, UK). [Paper]
ViT-V-Net: "ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration", ICML, 2022 (JHU). [Paper][PyTorch]
HybridStereoNet: "Deep Laparoscopic Stereo Matching with Transformers", MICCAI, 2022 (Monash University, Australia). [Paper][PyTorch]
BabyNet: "BabyNet: Residual Transformer Module for Birth Weight Prediction on Fetal Ultrasound Video", MICCAI, 2022 (Sano Centre for Computational Medicine, Poland). [Paper][PyTorch]
TLT: "Transformer Lesion Tracker", MICCAI, 2022 (InferVision Medical Technology, China). [Paper]
XMorpher: "XMorpher: Full Transformer for Deformable Medical Image Registration via Cross Attention", MICCAI, 2022 (Southeast University, China). [Paper][PyTorch]
SVoRT: "SVoRT: Iterative Transformer for Slice-to-Volume Registration in Fetal Brain MRI", MICCAI, 2022 (MIT). [Paper]
GaitForeMer: "GaitForeMer: Self-Supervised Pre-Training of Transformers via Human Motion Forecasting for Few-Shot Gait Impairment Severity Estimation", MICCAI, 2022 (Stanford). [Paper][PyTorch]
LKU-Net: "U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?", MICCAIW, 2022 (University of Birmingham, UK). [Paper]
LVOT: "Shifted Windows Transformers for Medical Image Quality Assessment", MICCAIW, 2022 (Istanbul Technical University, Turkey). [Paper]
MINiT: "Multiple Instance Neuroimage Transformer", MICCAIW, 2022 (Stanford). [Paper][Code (in construction)]
BrainNetTF: "Brain Network Transformer", NeurIPS, 2022 (Emory University). [Paper][PyTorch]
SiT: "Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces", arXiv, 2022 (King’s College London, UK). [Paper][PyTorch]
TransMorph: "TransMorph: Transformer for unsupervised medical image registration", arXiv, 2022 (JHU). [Paper]
SymTrans: "Symmetric Transformer-based Network for Unsupervised Image Registration", arXiv, 2022 (Jilin University). [Paper]
MMT: "One Model to Synthesize Them All: Multi-contrast Multi-scale Transformer for Missing Data Imputation", arXiv, 2022 (JHU). [Paper]
EG-ViT: "Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning", arXiv, 2022 (Northwestern Polytechnical University). [Paper]
CSM: "Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection", arXiv, 2022 (University of Adelaide, Australia). [Paper]
CASHformer: "CASHformer: Cognition Aware SHape Transformer for Longitudinal Analysis", arXiv, 2022 (TUM). [Paper]
ARST: "ARST: Auto-Regressive Surgical Transformer for Phase Recognition from Laparoscopic Videos", arXiv, 2022 (Shanghai Jiao Tong University). [Paper]
SSiT: "SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading", arXiv, 2022 (Southern University of Science and Technology, China). [Paper][Code (in construction)]

[Back to Overview]

Other Tasks

Active Learning:
- TJLS: "Visual Transformer for Task-aware Active Learning", arXiv, 2021 (ICL). [Paper][PyTorch]
Agriculture:
- PlantXViT: "Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT", arXiv, 2022 (Indian Institute of Information Technology). [Paper]
Animation-related:
- AnT: "The Animation Transformer: Visual Correspondence via Segment Matching", ICCV, 2021 (Cadmium). [Paper]
- AniFormer: "AniFormer: Data-driven 3D Animation with Transformer", BMVC, 2021 (University of Oulu, Finland). [Paper][PyTorch]
Biology:
- ?: "A State-of-the-art Survey of Object Detection Techniques in Microorganism Image Analysis: from Traditional Image Processing and Classical Machine Learning to Current Deep Convolutional Neural Networks and Potential Visual Transformers", arXiv, 2021 (Northeastern University). [Paper]
Brain Score:
- CrossViT: "Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4", CVPRW, 2022 (MIT). [Paper][PyTorch]
Camera-related:
- CTRL-C: "CTRL-C: Camera calibration TRansformer with Line-Classification", ICCV, 2021 (Kakao + Kookmin University). [Paper][PyTorch]
- MS-Transformer: "Learning Multi-Scene Absolute Pose Regression with Transformers", ICCV, 2021 (Bar-Ilan University, Israel). [Paper][PyTorch]
- GTCaR: "GTCaR: Graph Transformer for Camera Re-localization", ECCV, 2022 (Magic Leap). [Paper]
Character/Text Recognition:
- BTTR: "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer", arXiv, 2021 (Peking). [Paper]
- TrOCR: "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models", arXiv, 2021 (Microsoft). [Paper][PyTorch]
- ?: "Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks", arXiv, 2021 (Salesforce). [Paper]
- T³: "TrueType Transformer: Character and Font Style Recognition in Outline Format", Document Analysis Systems (DAS), 2022 (Kyushu University). [Paper]
- ?: "Transformer-based HTR for Historical Documents", ComHum, 2022 (University of Zurich, Switzerland). [Paper]
- ?: "SVG Vector Font Generation for Chinese Characters with Transformer", ICIP, 2022 (The University of Tokyo). [Paper]
- LP-Transformer: "Forensic License Plate Recognition with Compression-Informed Transformers", ICIP, 2022 (University of Erlangen-Nurnberg, Germany). [Paper]
- CoMER: "CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition", ECCV, 2022 (Peking University). [Paper][PyTorch]
- MATRN: "Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features", ECCV, 2022 (KAIST). [Paper][PyTorch]
- CONSENT: "CONSENT: Context Sensitive Transformer for Bold Words Classification", arXiv, 2022 (Amazon). [Paper]
Curriculum Learning:
- SSTN: "Spatial Transformer Networks for Curriculum Learning", arXiv, 2021 (TU Kaiserslautern, Germany). [Paper]
Defect Classification:
- MSHViT: "Multi-Scale Hybrid Vision Transformer and Sinkhorn Tokenizer for Sewer Defect Classification", CVPRW, 2022 (Aalborg University, Denmark). [Paper]
- DefT: "Defect Transformer: An Efficient Hybrid Transformer Architecture for Surface Defect Detection", arXiv, 2022 (Nanjing University of Aeronautics and Astronautics). [Paper]
Digital Holography:
- ?: "Convolutional Neural Network (CNN) vs Visual Transformer (ViT) for Digital Holography", ICCCR, 2022 (UBFC, France). [Paper]
Disentangled representation:
- VCT: "Visual Concepts Tokenization", NeurIPS, 2022 (Microsoft). [Paper][PyTorch]
E-Commerce:
- WebShop: "WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents", NeurIPS, 2022 (Princeton). [Paper][PyTorch][Website]
Event data:
- EvT: "Event Transformer: A sparse-aware solution for efficient event data processing", arXiv, 2022 (Universidad de Zaragoza, Spain). [Paper][PyTorch]
- ETB: "Event Transformer", arXiv, 2022 (Nanjing University). [Paper]
- RVT: "Recurrent Vision Transformers for Object Detection with Event Cameras", arXiv, 2022 (University of Zurich). [Paper]
Fashion:
- Kaleido-BERT: "Kaleido-BERT: Vision-Language Pre-training on Fashion Domain", CVPR, 2021 (Alibaba). [Paper][Tensorflow]
- CIT: "Cloth Interactive Transformer for Virtual Try-On", arXiv, 2021 (University of Trento). [Paper][Code (in construction)]
- ClothFormer: "ClothFormer: Taming Video Virtual Try-on in All Module", CVPR, 2022 (iQIYI). [Paper][Website]
- FashionVLP: "FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback", CVPR, 2022 (Amazon). [Paper]
- FashionViL: "FashionViL: Fashion-Focused Vision-and-Language Representation Learning", ECCV, 2022 (University of Surrey, UK). [Paper][PyTorch]
- OutfitTransformer: "OutfitTransformer: Learning Outfit Representations for Fashion Recommendation", arXiv, 2022 (Amazon). [Paper]
- Fashionformer: "Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition", ECCV, 2022 (Peking). [Paper][PyTorch]
- MVLT: "Masked Vision-Language Transformer in Fashion", Machine Intelligence Research, 2023 (Alibaba). [Paper][PyTorch]
Feature Matching:
- SuperGlue: "SuperGlue: Learning Feature Matching with Graph Neural Networks", CVPR, 2020 (Magic Leap). [Paper][PyTorch]
- LoFTR: "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR, 2021 (Zhejiang University). [Paper][PyTorch][Website]
- COTR: "COTR: Correspondence Transformer for Matching Across Images", ICCV, 2021 (UBC). [Paper]
- CATs: "CATs: Cost Aggregation Transformers for Visual Correspondence", NeurIPS, 2021 (Yonsei University + Korea University). [Paper][PyTorch][Website]
- TransforMatcher: "TransforMatcher: Match-to-Match Attention for Semantic Correspondence", CVPR, 2022 (POSTECH). [Paper]
- ASpanFormer: "ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer", ECCV, 2022 (HKUST). [Paper][Website]
- CATs++: "CATs++: Boosting Cost Aggregation with Convolutions and Transformers", arXiv, 2022 (Korea University). [Paper]
- LoFTR-TensorRT: "Local Feature Matching with Transformers for low-end devices", arXiv, 2022 (?). [Paper][PyTorch]
- MatchFormer: "MatchFormer: Interleaving Attention in Transformers for Feature Matching", arXiv, 2022 (Karlsruhe Institute of Technology, Germany). [Paper]
- OpenGlue: "OpenGlue: Open Source Graph Neural Net Based Pipeline for Image Matching", arXiv, 2022 (Ukrainian Catholic University). [Paper][PyTorch]
Fine-grained:
- ViT-FGVC: "Exploring Vision Transformers for Fine-grained Classification", CVPRW, 2021 (Universidad de Valladolid). [Paper]
- FFVT: "Feature Fusion Vision Transformer for Fine-Grained Visual Categorization", BMVC, 2021 (Griffith University, Australia). [Paper][PyTorch]
- TPSKG: "Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition", arXiv, 2021 (Beihang University). [Paper]
- AFTrans: "A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition", arXiv, 2021 (Peking University). [Paper]
- TransFG: "TransFG: A Transformer Architecture for Fine-grained Recognition", AAAI, 2022 (Johns Hopkins). [Paper][PyTorch]
- DynamicMLP: "Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information", CVPR, 2022 (Megvii). [Paper][PyTorch]
- SIM-Trans: "SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization", ACMMM, 2022 (Peking University). [Paper][PyTorch]
- MetaFormer: "MetaFormer: A Unified Meta Framework for Fine-Grained Recognition", arXiv, 2022 (ByteDance). [Paper][PyTorch]
- ViT-FOD: "ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator", arXiv, 2022 (Shandong University). [Paper]
Gait:
- Gait-TR: "Spatial Transformer Network on Skeleton-based Gait Recognition", arXiv, 2022 (South China University of Technology). [Paper]
Gaze:
- GazeTR: "Gaze Estimation using Transformer", arXiv, 2021 (Beihang University). [Paper][PyTorch]
- HGTTR: "End-to-End Human-Gaze-Target Detection with Transformers", CVPR, 2022 (Shanghai Jiao Tong). [Paper]
- MGTR: "MGTR: End-to-End Mutual Gaze Detection with Transformer", ACCV, 2022 (Nankai University). [Paper][PyTorch]
- GLC: "In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation", arXiv, 2022 (Georgia Tech). [Paper][Website]
Geo-Localization:
- EgoTR: "Cross-view Geo-localization with Evolving Transformer", arXiv, 2021 (Shenzhen University). [Paper]
- TransGeo: "TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization", CVPR, 2022 (UCF). [Paper][PyTorch]
- GAMa: "GAMa: Cross-view Video Geo-localization", ECCV, 2022 (UCF). [Paper][Code (in construction)]
- TransLocator: "Where in the World is this Image? Transformer-based Geo-localization in the Wild", ECCV, 2022 (JHU). [Paper]
- TransGCNN: "Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization", arXiv, 2022 (Southeast University, China). [Paper]
- MGTL: "Mutual Generative Transformer Learning for Cross-view Geo-localization", arXiv, 2022 (University of Electronic Science and Technology of China). [Paper]
Homography Estimation:
- LocalTrans: "LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation", ICCV, 2021 (Tsinghua). [Paper]
Image Registration:
- AiR: "Attention for Image Registration (AiR): an unsupervised Transformer approach", arXiv, 2021 (INRIA). [Paper]
Image Retrieval:
- RRT: "Instance-level Image Retrieval using Reranking Transformers", ICCV, 2021 (University of Virginia). [Paper][PyTorch]
- SwinFGHash: "SwinFGHash: Fine-grained Image Retrieval via Transformer-based Hashing Network", BMVC, 2021 (Tsinghua). [Paper]
- ViT-Retrieval: "Investigating the Vision Transformer Model for Image Retrieval Tasks", arXiv, 2021 (Democritus University of Thrace). [Paper]
- IRT: "Training Vision Transformers for Image Retrieval", arXiv, 2021 (Facebook + INRIA). [Paper]
- TransHash: "TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval", arXiv, 2021 (Shanghai Jiao Tong University). [Paper]
- VTS: "Vision Transformer Hashing for Image Retrieval", arXiv, 2021 (IIIT-Allahabad). [Paper]
- GTZSR: "Zero-Shot Sketch Based Image Retrieval using Graph Transformer", arXiv, 2022 (IIT Bombay). [Paper]
- EViT: "EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing", arXiv, 2022 (Jinan University). [Paper][PyTorch (in construction)]
- ?: "Transformers and CNNs both Beat Humans on SBIR", arXiv, 2022 (University of Mons, Belgium). [Paper]
- ?: "A Light Touch Approach to Teaching Transformers Multi-view Geometry", arXiv, 2022 (Oxford). [Paper]
- DToP: "Boosting vision transformers for image retrieval", WACV, 2023 (Dealicious, Korea). [Paper][Code (in construction)]
Layout Generation:
- VTN: "Variational Transformer Networks for Layout Generation", CVPR, 2021 (Google). [Paper]
- LayoutTransformer: "LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity", CVPR, 2021 (NTU). [Paper][PyTorch]
- LayoutTransformer: "LayoutTransformer: Layout Generation and Completion with Self-attention", ICCV, 2021 (Amazon). [Paper][Website]
- LGT-Net: "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network", CVPR, 2022 (East China Normal University). [Paper][PyTorch]
- CADTransformer: "CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings", CVPR, 2022 (UT Austin). [Paper]
- GAT-CADNet: "GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings", CVPR, 2022 (TUM + Alibaba). [Paper]
- LayoutBERT: "LayoutBERT: Masked Language Layout Model for Object Insertion", CVPRW, 2022 (Adobe). [Paper]
- ICVT: "Geometry Aligned Variational Transformer for Image-conditioned Layout Generation", ACMMM, 2022 (Alibaba). [Paper]
- BLT: "BLT: Bidirectional Layout Transformer for Controllable Layout Generation", ECCV, 2022 (Google). [Paper][Tensorflow][Website]
- ATEK: "ATEK: Augmenting Transformers with Expert Knowledge for Indoor Layout Synthesis", arXiv, 2022 (New Jersey Institute of Technology). [Paper]
- ?: "Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades", arXiv, 2022 (Simon Fraser). [Paper]
- UniLayout: "UniLayout: Taming Unified Sequence-to-Sequence Transformers for Graphic Layout Generation", arXiv, 2022 (Microsoft). [Paper]
Livestock Monitoring:
- STARFormer: "Livestock Monitoring with Transformer", BMVC, 2021 (IIT Dhanbad). [Paper]
Metric Learning:
- Hyp-ViT: "Hyperbolic Vision Transformers: Combining Improvements in Metric Learning", CVPR, 2022 (University of Trento, Italy). [Paper][PyTorch]
- BGFormer: "Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach", arXiv, 2022 (Anhui University). [Paper]
Multi-Input:
- MixViT: "Adapting Multi-Input Multi-Output schemes to Vision Transformers", CVPRW, 2022 (Sorbonne Universite, France). [Paper]
Multi-label:
- C-Tran: "General Multi-label Image Classification with Transformers", CVPR, 2021 (University of Virginia). [Paper]
- TDRG: "Transformer-Based Dual Relation Graph for Multi-Label Image Recognition", ICCV, 2021 (Tencent). [Paper]
- MlTr: "MlTr: Multi-label Classification with Transformer", arXiv, 2021 (KuaiShou). [Paper]
- GATN: "Graph Attention Transformer Network for Multi-Label Image Classification", arXiv, 2022 (Southeast University, China). [Paper]
Multi-task:
- MulT: "MulT: An End-to-End Multitask Learning Transformer", CVPR, 2022 (EPFL). [Paper]
Open Set:
- OSR-ViT: "Open Set Recognition using Vision Transformer with an Additional Detection Head", arXiv, 2022 (Vanderbilt University, Tennessee). [Paper]
Out-Of-Distribution:
- OODformer: "OODformer: Out-Of-Distribution Detection Transformer", BMVC, 2021 (LMU Munich). [Paper][PyTorch]
- MCM: "Delving into Out-of-Distribution Detection with Vision-Language Representations", NeurIPS, 2022 (UW-Madison). [Paper]
Pedestrian Intention:
- IntFormer: "IntFormer: Predicting pedestrian intention with the aid of the Transformer architecture", arXiv, 2021 (Universidad de Alcala). [Paper]
Physics Simulation:
- TIE: "Transformer with Implicit Edges for Particle-based Physics Simulation", ECCV, 2022 (NTU, Singapore). [Paper][PyTorch][Website]
Place Recognition:
- SVT-Net: "SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers", AAAI, 2022 (Renmin University of China). [Paper]
- TransVPR: "TransVPR: Transformer-based place recognition with multi-level attention aggregation", CVPR, 2022 (Xi'an Jiaotong). [Paper]
- OverlapTransformer: "OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition", IROS, 2022 (HAOMO.AI, China). [Paper][PyTorch]
- SeqOT: "SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data", arXiv, 2022 (National University of Defense Technology, China). [Paper][PyTorch]
Remote Sensing/Hyperspectral/Satellite:
- DCFAM: "Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images", arXiv, 2021 (Wuhan University). [Paper]
- WiCNet: "Looking Outside the Window: Wider-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images", arXiv, 2021 (University of Trento). [Paper]
- ?: "Vision Transformers For Weeds and Crops Classification Of High Resolution UAV Images", arXiv, 2021 (University of Orleans, France). [Paper]
- Satellite-ViT: "Manipulation Detection in Satellite Images Using Vision Transformer", arXiv, 2021 (Purdue). [Paper]
- ?: "Self-supervised Vision Transformers for Joint SAR-optical Representation Learning", IGARSS, 2022 (German Aerospace Center). [Paper]
- VBFusion: "Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing", SPIE Remote Sensing, 2022 (Technische Universitat Berlin, Germany). [Paper][PyTorch]
- SatMAE: "SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery", NeurIPS, 2022 (Stanford). [Paper]
- ANDT: "Anomaly Detection in Aerial Videos with Transformers", IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2022 (TUM). [Paper]
- RNGDet: "RNGDet: Road Network Graph Detection by Transformer in Aerial Images", arXiv, 2022 (HKUST). [Paper]
- FSRA: "A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization", arXiv, 2022 (China Jiliang University). [Paper][PyTorch]
- ?: "Multiscale Convolutional Transformer with Center Mask Pretraining for Hyperspectral Image Classification", arXiv, 2022 (Shenzhen University). [Paper]
- ?: "Deep Hyperspectral Unmixing using Transformer Network", arXiv, 2022 (Jalpaiguri Engineering College, India). [Paper]
- SiamixFormer: "SiamixFormer: A Siamese Transformer Network For Building Detection And Change Detection From Bi-Temporal Remote Sensing Images", arXiv, 2022 (Tarbiat Modares University, Iran). [Paper]
- DAHiTrA: "DAHiTrA: Damage Assessment Using a Novel Hierarchical Transformer Architecture", arXiv, 2022 (Simon Fraser University, Canada). [Paper]
- RVSA: "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model", arXiv, 2022 (Wuhan University + The University of Sydney). [Paper]
- SatViT: "Transfer Learning with Pretrained Remote Sensing Transformers", arXiv, 2022 (?). [Paper][PyTorch]
- FTN: "Fully Transformer Network for Change Detection of Remote Sensing Images", arXiv, 2022 (Dalian University of Technology). [Paper]
- MCTNet: "MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images", arXiv, 2022 (Tsinghua University). [Paper]
- ?: "Transformers For Recognition In Overhead Imagery: A Reality Check", arXiv, 2022 (Duke University). [Paper]
Robotics:
- TF-Grasp: "When Transformer Meets Robotic Grasping: Exploits Context for Efficient Grasp Detection", arXiv, 2022 (University of Science and Technology of China). [Paper][Code (in construction)]
- BeT: "Behavior Transformers: Cloning k modes with one stone", arXiv, 2022 (NYU). [Paper][PyTorch]
- Perceiver-Actor: "Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation", Conference on Robot Learning (CoRL), 2022 (NVIDIA). [Paper][Website]
- PACT: "PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training", arXiv, 2022 (Microsoft). [Paper]
- ?: "A Strong Transfer Baseline for RGB-D Fusion in Vision Transformers", arXiv, 2022 (University of Groningen, The Netherlands). [Paper]
- ?: "Grounding Language with Visual Affordances over Unstructured Data", arXiv, 2022 (University of Freiburg, Germany). [Paper][Website]
- VIMA: "VIMA: General Robot Manipulation with Multimodal Prompts", arXiv, 2022 (NVIDIA). [Paper][PyTorch][Website]
Scene Decomposition:
- SRT: "Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations", CVPR, 2022 (Google). [Paper][PyTorch (stelzner)][Website]
- OSRT: "Object Scene Representation Transformer", NeurIPS, 2022 (Google). [Paper][Website]
- Prompter: "Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following", arXiv, 2022 (Hitachi). [Paper]
Scene Text Recognition:
- ViTSTR: "Vision Transformer for Fast and Efficient Scene Text Recognition", ICDAR, 2021 (University of the Philippines). [Paper]
- STKM: "Self-attention based Text Knowledge Mining for Text Detection", CVPR, 2021 (?). [Paper][Code (in construction)]
- I2C2W: "I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition", arXiv, 2021 (NTU Singapore). [Paper]
- CornerTransformer: "Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition", ECCV, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
- CUTE: "Contextual Text Block Detection towards Scene Text Understanding", ECCV, 2022 (NTU Singapore). [Paper][Website]
- PARSeq: "Scene Text Recognition with Permuted Autoregressive Sequence Models", ECCV, 2022 (University of the Philippines). [Paper][PyTorch]
- PTIE: "Pure Transformer with Integrated Experts for Scene Text Recognition", ECCV, 2022 (NTU Singapore). [Paper]
- MGP-STR: "Multi-Granularity Prediction for Scene Text Recognition", ECCV, 2022 (Alibaba). [Paper]
- VLAMD: "Vision-Language Adaptive Mutual Decoder for OOV-STR", ECCVW, 2022 (iFLYTEK, China). [Paper]
- MVLT: "Masked Vision-Language Transformers for Scene Text Recognition", BMVC, 2022 (Westone Information Industry Inc., China). [Paper][PyTorch]
Spike:
- Spikformer: "Spikformer: When Spiking Neural Network Meets Transformer", arXiv, 2022 (Peking). [Paper]
Stereo:
- STTR: "Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers", ICCV, 2021 (Johns Hopkins). [Paper][PyTorch]
- PS-Transformer: "PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism", BMVC, 2021 (National Institute of Informatics, JAPAN). [Paper][PyTorch]
- ChiTransformer: "ChiTransformer: Towards Reliable Stereo from Cues", CVPR, 2022 (GSU). [Paper]
- TransMVSNet: "TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers", CVPR, 2022 (Megvii). [Paper][Code (in construction)]
- MVSTER: "MVSTER: Epipolar Transformer for Efficient Multi-View Stereo", ECCV, 2022 (CAS). [Paper][PyTorch]
- CEST: "Context-Enhanced Stereo Transformer", ECCV, 2022 (CAS). [Paper][PyTorch]
- WT-MVSNet: "WT-MVSNet: Window-based Transformers for Multi-view Stereo", NeurIPS, 2022 (Tsinghua University). [Paper]
- MVSFormer: "MVSFormer: Learning Robust Image Representations via Transformers and Temperature-based Depth for Multi-View Stereo", arXiv, 2022 (Fudan University). [Paper]
Time Series:
- MissFormer: "MissFormer: (In-)attention-based handling of missing observations for trajectory filtering and prediction", arXiv, 2021 (Fraunhofer IOSB, Germany). [Paper]
Traffic:
- NEAT: "NEAT: Neural Attention Fields for End-to-End Autonomous Driving", ICCV, 2021 (MPI). [Paper][PyTorch]
- ViTAL: "Novelty Detection and Analysis of Traffic Scenario Infrastructures in the Latent Space of a Vision Transformer-Based Triplet Autoencoder", IV, 2021 (Technische Hochschule Ingolstadt). [Paper]
- ?: "Predicting Vehicles Trajectories in Urban Scenarios with Transformer Networks and Augmented Information", IVS, 2021 (Universidad de Alcala). [Paper]
- ?: "Translating Images into Maps", ICRA, 2022 (University of Surrey, UK). [Paper][PyTorch (in construction)]
- Crossview-Transformer: "Cross-view Transformers for real-time Map-view Semantic Segmentation", CVPR, 2022 (UT Austin). [Paper][PyTorch]
- ViT-BEVSeg: "ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation", IJCNN, 2022 (Maynooth University, Ireland). [Paper][Code (in construction)]
- MSF3DDETR: "MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving", ICPRW, 2022 (University of Coimbra, Portugal). [Paper]
- TransLPC: "Transformers for Object Detection in Large Point Clouds", ITSC, 2022 (Bosch). [Paper]
- PicT: "PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification", ACMMM, 2022 (Chongqing University). [Paper][PyTorch (in construction)]
- BEVFormer: "BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers", ECCV, 2022 (Shanghai AI Lab). [Paper][PyTorch]
- JPerceiver: "JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes", ECCV, 2022 (The University of Sydney). [Paper][PyTorch]
- V2X-ViT: "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer", ECCV, 2022 (UCLA). [Paper]
- ?: "Can Transformer Attention Spread Give Insights Into Uncertainty of Detected and Tracked Objects?", IROSW, 2022 (Bosch). [Paper]
- MTR: "Motion Transformer with Global Intention Localization and Local Movement Refinement", NeurIPS, 2022 (MPI). [Paper][Code (in construction)]
- PlanT: "PlanT: Explainable Planning Transformers via Object-Level Representations", Conference on Robot Learning (CoRL), 2022 (TUM). [Paper][PyTorch][Website]
- BEVSegFormer: "BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs", arXiv, 2022 (Nullmax, China). [Paper]
- ParkPredict+: "ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer", arXiv, 2022 (Berkeley). [Paper]
- GKT: "Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer", arXiv, 2022 (Huazhong University of Science and Technology). [Paper][Code (in construction)]
- CoBEVT: "CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers", arXiv, 2022 (UCLA). [Paper]
- ?: "Pyramid Transformer for Traffic Sign Detection", arXiv, 2022 (Iran University of Science and Technology). [Paper]
- UniFormer: "UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View", arXiv, 2022 (Zhejiang University). [Paper]
- STrajNet: "STrajNet: Occupancy Flow Prediction via Multi-modal Swin Transformer", arXiv, 2022 (NTU, Singapore). [Paper]
- MTPP: "Multi-modal Transformer Path Prediction for Autonomous Vehicle", arXiv, 2022 (National Central University). [Paper]
- MapTR: "MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction", arXiv, 2022 (Horizon Robotics). [Paper][Code (in construction)]
- DCT: "A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View", arXiv, 2022 (Gwangju Institute of Science and Technology). [Paper]
- C-ViT: "Traffic Accident Risk Forecasting using Contextual Vision Transformers", arXiv, 2022 (University of Technology Sydney). [Paper]
- BEVFormer-v2: "BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision", arXiv, 2022 (Tsinghua University). [Paper]
Trajectory Prediction:
- mmTransformer: "Multimodal Motion Prediction with Stacked Transformers", CVPR, 2021 (CUHK + SenseTime). [Paper][Code (in construction)][Website]
- AgentFormer: "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting", ICCV, 2021 (CMU). [Paper][PyTorch][Website]
- S2TNet: "S2TNet: Spatio-Temporal Transformer Networks for Trajectory Prediction in Autonomous Driving", ACML, 2021 (Xi'an Jiaotong University). [Paper][PyTorch]
- MRT: "Multi-Person 3D Motion Prediction with Multi-Range Transformers", NeurIPS, 2021 (UCSD + Berkeley). [Paper][PyTorch][Website]
- ?: "Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction", ICLR, 2022 (MILA). [Paper]
- Scene-Transformer: "Scene Transformer: A unified architecture for predicting multiple agent trajectories", ICLR, 2022 (Google). [Paper]
- ST-MR: "Graph-based Spatial Transformer with Memory Replay for Multi-Future Pedestrian Trajectory Prediction", CVPR, 2022 (University of New South Wales, Australia). [Paper][Tensorflow]
- HiVT: "HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction", CVPR, 2022 (CUHK). [Paper]
- EF-Transformer: "Entry-Flipped Transformer for Inference and Prediction of Participant Behavior", ECCV, 2022 (NTU, Singapore). [Paper]
- Social-SSL: "Social-SSL: Self-Supervised Cross-Sequence Representation Learning Based on Transformers for Multi-Agent Trajectory Prediction", ECCV, 2022 (NYCU). [Paper][PyTorch]
- LatentFormer: "LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction", arXiv, 2022 (Huawei). [Paper]
- PreTR: "PreTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer", arXiv, 2022 (Stellantis, France). [Paper]
- Wayformer: "Wayformer: Motion Forecasting via Simple & Efficient Attention Networks", arXiv, 2022 (Waymo). [Paper]
- LaTTe: "LaTTe: Language Trajectory TransformEr", arXiv, 2022 (TUM). [Paper][Tensorflow]
- SoMoFormer: "SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction", arXiv, 2022 (Hangzhou Dianzi University). [Paper]
- ViewBirdiformer: "ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view", arXiv, 2022 (Kyoto University). [Paper]
- PedFormer: "PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning", arXiv, 2022 (Huawei). [Paper]
- TAMFormer: "TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction", arXiv, 2022 (University of Padova, Italy). [Paper]
Visual Counting:
- CC-AV: "Audio-Visual Transformer Based Crowd Counting", ICCVW, 2021 (University of Kansas). [Paper]
- TransCrowd: "TransCrowd: Weakly-Supervised Crowd Counting with Transformer", arXiv, 2021 (Huazhong University of Science and Technology). [Paper][PyTorch]
- TAM-RTM: "Boosting Crowd Counting with Transformers", arXiv, 2021 (ETHZ). [Paper]
- CCTrans: "CCTrans: Simplifying and Improving Crowd Counting with Transformer", arXiv, 2021 (Meituan). [Paper]
- MAN: "Boosting Crowd Counting via Multifaceted Attention", CVPR, 2022 (Xi'an Jiaotong). [Paper][PyTorch]
- CLTR: "An End-to-End Transformer Model for Crowd Localization", ECCV, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch][Website]
- SAANet: "Scene-Adaptive Attention Network for Crowd Counting", arXiv, 2022 (Xi'an Jiaotong). [Paper]
- JCTNet: "Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting", arXiv, 2022 (Chongqing University). [Paper]
- CrowdMLP: "CrowdMLP: Weakly-Supervised Crowd Counting via Multi-Granularity MLP", arXiv, 2022 (University of Guelph, Canada). [Paper]
- CounTR: "CounTR: Transformer-based Generalised Visual Counting", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][Website]
Visual Quality Assessment:
- TRIQ: "Transformer for Image Quality Assessment", arXiv, 2020 (NORCE). [Paper][Tensorflow-Keras]
- IQT: "Perceptual Image Quality Assessment with Transformers", CVPRW, 2021 (LG). [Paper][Code (in construction)]
- MUSIQ: "MUSIQ: Multi-scale Image Quality Transformer", ICCV, 2021 (Google). [Paper]
- TranSLA: "Saliency-Guided Transformer Network Combined With Local Embedding for No-Reference Image Quality Assessment", ICCVW, 2021 (Hikvision). [Paper]
- TReS: "No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency", WACV, 2022 (CMU). [Paper]
- IQA-Conformer: "Conformer and Blind Noisy Students for Improved Image Quality Assessment", CVPRW, 2022 (University of Wurzburg, Germany). [Paper][PyTorch]
- SwinIQA: "SwinIQA: Learned Swin Distance for Compressed Image Quality Assessment", CVPRW, 2022 (USTC, China). [Paper]
- DCVQE: "DCVQE: A Hierarchical Transformer for Video Quality Assessment", ACCV, 2022 (Weibo). [Paper]
- MCAS-IQA: "Visual Mechanisms Inspired Efficient Transformers for Image and Video Quality Assessment", arXiv, 2022 (Norwegian Research Centre, Norway). [Paper]
- MSTRIQ: "MSTRIQ: No Reference Image Quality Assessment Based on Swin Transformer with Multi-Stage Fusion", arXiv, 2022 (ByteDance). [Paper]
- DisCoVQA: "DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment", arXiv, 2022 (NTU, Singapore). [Paper]
Visual Reasoning:
- SAViR-T: "SAViR-T: Spatially Attentive Visual Reasoning with Transformers", arXiv, 2022 (Rutgers University). [Paper]
3D Human Texture Estimation:
- Texformer: "3D Human Texture Estimation from a Single Image with Transformers", ICCV, 2021 (NTU, Singapore). [Paper][PyTorch][Website]
3D Motion Synthesis:
- ACTOR: "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV, 2021 (Univ Gustave Eiffel). [Paper][PyTorch][Website]
- RTVAE: "Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis", CVPRW, 2022 (Amazon). [Paper]
- MotionCLIP: "MotionCLIP: Exposing Human Motion Generation to CLIP Space", ECCV, 2022 (Tel Aviv). [Paper]
- CLIP-Actor: "CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes", ECCV, 2022 (POSTECH). [Paper][PyTorch][Website]
- PoseGPT: "PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting", ECCV, 2022 (NAVER). [Paper]
- TEMOS: "TEMOS: Generating diverse human motions from textual descriptions", ECCV, 2022 (MPI). [Paper][PyTorch][Website]
- TM2T: "TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts", ECCV, 2022 (University of Alberta, Canada). [Paper][PyTorch][Website]
- HUMANISE: "HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes", NeurIPS, 2022 (Beijing Institute of Technology). [Paper][GitHub][Website]
- ActFormer: "ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation", arXiv, 2022 (SenseTime). [Paper]
- ?: "Diverse Dance Synthesis via Keyframes with Transformer Controllers", arXiv, 2022 (Beihang University). [Paper]
- MARIONET: "NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System", arXiv, 2022 (Wuhan University). [Paper]
- OhMG: "OhMG: Zero-shot Open-vocabulary Human Motion Generation", arXiv, 2022 (Sun Yat-Sen University). [Paper]
- Action-GPT: "Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation", arXiv, 2022 (IIIT Hyderabad). [Paper][Website]
- Optimus: "Transformer-Based Learned Optimization", arXiv, 2022 (Google). [Paper]
3D Object Recognition:
- MVT: "MVT: Multi-view Vision Transformer for 3D Object Recognition", BMVC, 2021 (Baidu). [Paper]
3D Reconstruction:
- PlaneTR: "PlaneTR: Structure-Guided Transformers for 3D Plane Recovery", ICCV, 2021 (Wuhan University). [Paper][PyTorch]
- CO3D: "Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction", ICCV, 2021 (Facebook). [Paper][PyTorch]
- VolT: "Multi-view 3D Reconstruction with Transformer", ICCV, 2021 (University of British Columbia). [Paper]
- 3D-RETR: "3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers", BMVC, 2021 (ETHZ). [Paper][PyTorch]
- TransformerFusion: "TransformerFusion: Monocular RGB Scene Reconstruction using Transformers", NeurIPS, 2021 (TUM). [Paper][Website]
- LegoFormer: "LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction", arXiv, 2021 (TUM + Google). [Paper]
- PlaneFormers: "PlaneFormers: From Sparse View Planes to 3D Reconstruction", ECCV, 2022 (UMich). [Paper][PyTorch][Website]
- 3D-C2FT: "3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction", arXiv, 2022 (Korea Institute of Science and Technology). [Paper]
3D Scene:
- OpenScene: "OpenScene: 3D Scene Understanding with Open Vocabularies", arXiv, 2022 (Google). [Paper][Website]
- ?: "Language-driven Open-Vocabulary 3D Scene Understanding", arXiv, 2022 (ByteDance). [Paper]
360 Scene:
- ?: "Improving 360 Monocular Depth Estimation via Non-local Dense Prediction Transformer and Joint Supervised and Self-supervised Learning", AAAI, 2022 (Seoul National University). [Paper][PyTorch]
- PAVER: "Panoramic Vision Transformer for Saliency Detection in 360° Videos", ECCV, 2022 (Seoul National University). [Paper]
- PanoFormer: "PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation", ECCV, 2022 (Beijing Jiaotong University). [Paper]
- CoVisPose: "CoVisPose: Co-Visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360° Indoor Panoramas", ECCV, 2022 (Zillow). [Paper]
- SPH: "Spherical Transformer", arXiv, 2022 (Chung-Ang University, Korea). [Paper]
Others:
- ?: "Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search", ECCV, 2022 (Intellifusion, China). [Paper]
- ?: "Strong Gravitational Lensing Parameter Estimation with Vision Transformer", ECCVW, 2022 (CMU). [Paper][PyTorch]
- Transformer-DR: "Transformer-based dimensionality reduction", arXiv, 2022 (Chongqing Normal University, China). [Paper]
- ?: "mm-Wave Radar Hand Shape Classification Using Deformable Transformers", arXiv, 2022 (Intel). [Paper]
- ?: "Fully-attentive and interpretable: vision and video vision transformers for pain detection", NeurIPSW, 2022 (Utrecht University, Netherlands). [Paper][Code (in construction)]

[Back to Overview]

Attention Mechanisms in Vision/NLP

Attention for Vision

AA: "Attention Augmented Convolutional Networks", ICCV, 2019 (Google). [Paper][PyTorch (Unofficial)][Tensorflow (Unofficial)]
LR-Net: "Local Relation Networks for Image Recognition", ICCV, 2019 (Microsoft). [Paper][PyTorch (Unofficial)]
CCNet: "CCNet: Criss-Cross Attention for Semantic Segmentation", ICCV, 2019 (& TPAMI 2020) (Horizon). [Paper][PyTorch]
GCNet: "Global Context Networks", ICCVW, 2019 (& TPAMI 2020) (Microsoft). [Paper][PyTorch]
SASA: "Stand-Alone Self-Attention in Vision Models", NeurIPS, 2019 (Google). [Paper][PyTorch-1 (Unofficial)][PyTorch-2 (Unofficial)]
- key message: attention modules are more efficient than convolutions and provide comparable accuracy
Axial-Transformer: "Axial Attention in Multidimensional Transformers", arXiv, 2019 (Google). [Paper][PyTorch (Unofficial)]
Attention-CNN: "On the Relationship between Self-Attention and Convolutional Layers", ICLR, 2020 (EPFL). [Paper][PyTorch][Website]
SAN: "Exploring Self-attention for Image Recognition", CVPR, 2020 (CUHK + Intel). [Paper][PyTorch]
BA-Transform: "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms", CVPR, 2020 (ByteDance). [Paper]
Axial-DeepLab: "Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation", ECCV, 2020 (Google). [Paper][PyTorch]
GSA: "Global Self-Attention Networks for Image Recognition", arXiv, 2020 (Google). [Paper][PyTorch (Unofficial)]
EA: "Efficient Attention: Attention with Linear Complexities", WACV, 2021 (SenseTime). [Paper][PyTorch]
LambdaNetworks: "LambdaNetworks: Modeling long-range Interactions without Attention", ICLR, 2021 (Google). [Paper][PyTorch-1 (Unofficial)][PyTorch-2 (Unofficial)]
GSA-Nets: "Group Equivariant Stand-Alone Self-Attention For Vision", ICLR, 2021 (EPFL). [Paper]
Hamburger: "Is Attention Better Than Matrix Decomposition?", ICLR, 2021 (Peking). [Paper][PyTorch (Unofficial)]
HaloNet: "Scaling Local Self-Attention For Parameter Efficient Visual Backbones", CVPR, 2021 (Google). [Paper]
BoTNet: "Bottleneck Transformers for Visual Recognition", CVPR, 2021 (Google). [Paper]
SSAN: "SSAN: Separable Self-Attention Network for Video Representation Learning", CVPR, 2021 (Microsoft). [Paper]
CoTNet: "Contextual Transformer Networks for Visual Recognition", CVPRW, 2021 (JD). [Paper][PyTorch]
Involution: "Involution: Inverting the Inherence of Convolution for Visual Recognition", CVPR, 2021 (HKUST). [Paper][PyTorch]
Perceiver: "Perceiver: General Perception with Iterative Attention", ICML, 2021 (DeepMind). [Paper][PyTorch (lucidrains)]
SNL: "Unifying Nonlocal Blocks for Neural Networks", ICCV, 2021 (Peking + ByteDance). [Paper]
External-Attention: "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks", arXiv, 2021 (Tsinghua). [Paper]
Container: "Container: Context Aggregation Network", arXiv, 2021 (AI2). [Paper]
X-volution: "X-volution: On the unification of convolution and self-attention", arXiv, 2021 (Huawei Hisilicon). [Paper]
Invertible-Attention: "Invertible Attention", arXiv, 2021 (ANU). [Paper]
VOLO: "VOLO: Vision Outlooker for Visual Recognition", arXiv, 2021 (Sea AI Lab + NUS, Singapore). [Paper][PyTorch]
LESA: "Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms", arXiv, 2021 (Johns Hopkins). [Paper]
PS-Attention: "Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention", AAAI, 2022 (Baidu). [Paper][Paddle]
QuadTree: "QuadTree Attention for Vision Transformers", ICLR, 2022 (Simon Fraser + Alibaba). [Paper][PyTorch]
QnA: "Learned Queries for Efficient Local Attention", CVPR, 2022 (Tel-Aviv). [Paper][Jax]
?: "Fair Comparison between Efficient Attentions", CVPRW, 2022 (Kyungpook National University, Korea). [Paper][PyTorch]
KVT: "KVT: k-NN Attention for Boosting Vision Transformers", ECCV, 2022 (Alibaba). [Paper][PyTorch]
Hydra: "Hydra Attention: Efficient Attention with Many Heads", ECCVW, 2022 (Meta). [Paper]
HiP: "Hierarchical Perceiver", arXiv, 2022 (DeepMind). [Paper]
AttendNeXt: "Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers", arXiv, 2022 (University of Waterloo, Canada). [Paper]

[Back to Overview]

Attention for NLP

T-DMCA: "Generating Wikipedia by Summarizing Long Sequences", ICLR, 2018 (Google). [Paper]
LSRA: "Lite Transformer with Long-Short Range Attention", ICLR, 2020 (MIT). [Paper][PyTorch]
ETC: "ETC: Encoding Long and Structured Inputs in Transformers", EMNLP, 2020 (Google). [Paper][Tensorflow]
BlockBERT: "Blockwise Self-Attention for Long Document Understanding", EMNLP Findings, 2020 (Facebook). [Paper][GitHub]
Clustered-Attention: "Fast Transformers with Clustered Attention", NeurIPS, 2020 (Idiap). [Paper][PyTorch][Website]
BigBird: "Big Bird: Transformers for Longer Sequences", NeurIPS, 2020 (Google). [Paper][Tensorflow]
Longformer: "Longformer: The Long-Document Transformer", arXiv, 2020 (AI2). [Paper][PyTorch]
Linformer: "Linformer: Self-Attention with Linear Complexity", arXiv, 2020 (Facebook). [Paper][PyTorch (Unofficial)]
Nystromformer: "Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention", AAAI, 2021 (UW-Madison). [Paper][PyTorch]
RFA: "Random Feature Attention", ICLR, 2021 (DeepMind). [Paper]
Performer: "Rethinking Attention with Performers", ICLR, 2021 (Google). [Paper][Code][Blog]
DeLighT: "DeLighT: Deep and Light-weight Transformer", ICLR, 2021 (UW). [Paper]
Synthesizer: "Synthesizer: Rethinking Self-Attention for Transformer Models", ICML, 2021 (Google). [Paper][Tensorflow][PyTorch (leaderj1001)]
Poolingformer: "Poolingformer: Long Document Modeling with Pooling Attention", ICML, 2021 (Microsoft). [Paper]
Hi-Transformer: "Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling", ACL, 2021 (Tsinghua). [Paper]
Smart-Bird: "Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer", arXiv, 2021 (Tsinghua). [Paper]
Fastformer: "Fastformer: Additive Attention is All You Need", arXiv, 2021 (Tsinghua). [Paper]
∞-former: "∞-former: Infinite Memory Transformer", arXiv, 2021 (Instituto de Telecomunicações, Portugal). [Paper]
cosFormer: "cosFormer: Rethinking Softmax In Attention", ICLR, 2022 (SenseTime). [Paper][PyTorch (davidsvy)]
MGK: "Improving Transformers with Probabilistic Attention Keys", ICML, 2022 (UCLA). [Paper]

[Back to Overview]

Attention for Both

Sparse-Transformer: "Generating Long Sequences with Sparse Transformers", arXiv, 2019 (OpenAI). [Paper][Tensorflow][Blog]
Reformer: "Reformer: The Efficient Transformer", ICLR, 2020 (Google). [Paper][Tensorflow][Blog]
Sinkhorn-Transformer: "Sparse Sinkhorn Attention", ICML, 2020 (Google). [Paper][PyTorch (Unofficial)]
Linear-Transformer: "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", ICML, 2020 (Idiap). [Paper][PyTorch][Website]
SMYRF: "SMYRF: Efficient Attention using Asymmetric Clustering", NeurIPS, 2020 (UT Austin + Google). [Paper][PyTorch]
Routing-Transformer: "Efficient Content-Based Sparse Attention with Routing Transformers", TACL, 2021 (Google). [Paper][Tensorflow][PyTorch (Unofficial)][Slides]
LRA: "Long Range Arena: A Benchmark for Efficient Transformers", ICLR, 2021 (Google). [Paper][Tensorflow]
OmniNet: "OmniNet: Omnidirectional Representations from Transformers", ICML, 2021 (Google). [Paper]
Evolving-Attention: "Evolving Attention with Residual Convolutions", ICML, 2021 (Peking + Microsoft). [Paper]
H-Transformer-1D: "H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences", ACL, 2021 (Google). [Paper]
Combiner: "Combiner: Full Attention Transformer with Sparse Computation Cost", NeurIPS, 2021 (Google). [Paper]
Centroid-Transformer: "Centroid Transformers: Learning to Abstract with Attention", arXiv, 2021 (UT Austin). [Paper]
AFT: "An Attention Free Transformer", arXiv, 2021 (Apple). [Paper]
Luna: "Luna: Linear Unified Nested Attention", arXiv, 2021 (USC + CMU + Facebook). [Paper]
Transformer-LS: "Long-Short Transformer: Efficient Transformers for Language and Vision", arXiv, 2021 (NVIDIA). [Paper]
PoNet: "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences", ICLR, 2022 (Alibaba). [Paper]
Paramixer: "Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention", CVPR, 2022 (Norwegian University of Science and Technology, Norway). [Paper]
ContextPool: "Efficient Representation Learning via Adaptive Context Pooling", ICML, 2022 (Apple). [Paper]
LARA: "Linear Complexity Randomized Self-attention Mechanism", ICML, 2022 (Bytedance). [Paper]
Flowformer: "Flowformer: Linearizing Transformers with Conservation Flows", ICML, 2022 (Tsinghua University). [Paper][PyTorch]
MRA: "Multi Resolution Analysis (MRA) for Approximate Self-Attention", ICML, 2022 (University of Wisconsin, Madison). [Paper][PyTorch]
EcoFormer: "EcoFormer: Energy-Saving Attention with Linear Complexity", NeurIPS, 2022 (Monash University). [Paper][PyTorch]
SBM-Transformer: "Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost", NeurIPS, 2022 (LG). [Paper][PyTorch]
?: "Horizontal and Vertical Attention in Transformers", arXiv, 2022 (University of Technology Sydney). [Paper]
MRL: "MRL: Learning to Mix with Attention and Convolutions", arXiv, 2022 (Sony). [Paper]

[Back to Overview]

Attention for Others

Informer: "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting", AAAI, 2021 (Beihang University). [Paper][PyTorch]
Attention-Rank-Collapse: "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth", ICML, 2021 (Google + EPFL). [Paper][PyTorch]
?: "Choose a Transformer: Fourier or Galerkin", NeurIPS, 2021 (Washington University, St. Louis). [Paper]
NPT: "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning", arXiv, 2021 (Oxford). [Paper]
FEDformer: "FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting", ICML, 2022 (Alibaba). [Paper][PyTorch]
?: "Generalizable Memory-driven Transformer for Multivariate Long Sequence Time-series Forecasting", arXiv, 2022 (University of Technology Sydney). [Paper]

[Back to Overview]
