(back to README.md for other categories)

Overview

Other High-level Vision Tasks

Point Cloud / 3D

  • PCT: "PCT: Point Cloud Transformer", arXiv, 2020 (Tsinghua). [Paper][Jittor][PyTorch (uyzhang)]
  • Point-Transformer: "Point Transformer", arXiv, 2020 (Ulm University). [Paper]
  • NDT-Transformer: "NDT-Transformer: Large-Scale 3D Point Cloud Localisation using the Normal Distribution Transform Representation", ICRA, 2021 (University of Sheffield). [Paper][PyTorch]
  • P4Transformer: "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos", CVPR, 2021 (NUS). [Paper]
  • PTT: "PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds", IROS, 2021 (Northeastern University). [Paper][PyTorch (in construction)]
  • SnowflakeNet: "SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer", ICCV, 2021 (Tsinghua). [Paper][PyTorch]
  • PoinTr: "PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers", ICCV, 2021 (Tsinghua). [Paper][PyTorch]
  • Point-Transformer: "Point Transformer", ICCV, 2021 (Oxford + CUHK). [Paper][PyTorch (lucidrains)]
  • CT: "Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks", ICCV, 2021 (Samsung). [Paper]
  • 3DVG-Transformer: "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds", ICCV, 2021 (Beihang University). [Paper]
  • PPT-Net: "Pyramid Point Cloud Transformer for Large-Scale Place Recognition", ICCV, 2021 (Nanjing University of Science and Technology). [Paper]
  • LTTR: "3D Object Tracking with Transformer", BMVC, 2021 (Northeastern University, China). [Paper][Code (in construction)]
  • ?: "Shape registration in the time of transformers", NeurIPS, 2021 (Sapienza University of Rome). [Paper]
  • YOGO: "You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module", arXiv, 2021 (Berkeley). [Paper][PyTorch]
  • DTNet: "Dual Transformer for Point Cloud Analysis", arXiv, 2021 (Southwest University). [Paper]
  • MLMSPT: "Point Cloud Learning with Transformer", arXiv, 2021 (Southwest University). [Paper]
  • PQ-Transformer: "PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds", arXiv, 2021 (Tsinghua). [Paper][PyTorch]
  • PST2: "Spatial-Temporal Transformer for 3D Point Cloud Sequences", WACV, 2022 (Sun Yat-sen University). [Paper]
  • SCTN: "SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation", AAAI, 2022 (KAUST). [Paper]
  • AWT-Net: "Adaptive Wavelet Transformer Network for 3D Shape Representation Learning", ICLR, 2022 (NYU). [Paper]
  • ?: "Deep Point Cloud Reconstruction", ICLR, 2022 (KAIST). [Paper]
  • HiTPR: "HiTPR: Hierarchical Transformer for Place Recognition in Point Cloud", ICRA, 2022 (Nanjing University of Science and Technology). [Paper]
  • FastPointTransformer: "Fast Point Transformer", CVPR, 2022 (POSTECH). [Paper]
  • REGTR: "REGTR: End-to-end Point Cloud Correspondences with Transformers", CVPR, 2022 (NUS, Singapore). [Paper][PyTorch]
  • ShapeFormer: "ShapeFormer: Transformer-based Shape Completion via Sparse Representation", CVPR, 2022 (Shenzhen University). [Paper][Website]
  • PatchFormer: "PatchFormer: An Efficient Point Transformer with Patch Attention", CVPR, 2022 (Hangzhou Dianzi University). [Paper]
  • ?: "An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation", CVPR, 2022 (NTU + NYCU). [Paper][Code (in construction)]
  • Point-BERT: "Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling", CVPR, 2022 (Tsinghua). [Paper][PyTorch][Website]
  • PTTR: "PTTR: Relational 3D Point Cloud Object Tracking with Transformer", CVPR, 2022 (Sensetime). [Paper][PyTorch]
  • GeoTransformer: "Geometric Transformer for Fast and Robust Point Cloud Registration", CVPR, 2022 (National University of Defense Technology, China). [Paper][PyTorch]
  • PointCLIP: "PointCLIP: Point Cloud Understanding by CLIP", CVPR, 2022 (Shanghai AI Lab). [Paper][PyTorch]
  • ?: "3D Part Assembly Generation with Instance Encoded Transformer", IROS, 2022 (Tongji University). [Paper]
  • SeedFormer: "SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer", ECCV, 2022 (Tencent). [Paper][PyTorch]
  • MeshMAE: "MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis", ECCV, 2022 (JD). [Paper]
  • PPTr: "Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding", ECCV, 2022 (Tsinghua University). [Paper]
  • Geodesic-Former: "Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance Segmenter", ECCV, 2022 (VinAI Research, Vietnam). [Paper]
  • LaplacianMesh-Transformer: "Laplacian Mesh Transformer: Dual Attention and Topology Aware Network for 3D Mesh Classification and Segmentation", ECCV, 2022 (CAS). [Paper]
  • Point-MixSwap: "Point MixSwap: Attentional Point Cloud Mixing via Swapping Matched Structural Divisions", ECCV, 2022 (NYCU + NTU). [Paper][PyTorch]
  • PTT: "Real-time 3D Single Object Tracking with Transformer", TMM, 2022 (Northeastern University, China). [Paper][PyTorch]
  • Point-Transformer-V2: "Point Transformer V2: Grouped Vector Attention and Partition-based Pooling", NeurIPS, 2022 (HKU). [Paper][PyTorch (in construction)]
  • SPoVT: "SPoVT: Semantic-Prototype Variational Transformer for Dense Point Cloud Semantic Completion", NeurIPS, 2022 (NTU). [Paper]
  • GSA: "Geodesic Self-Attention for 3D Point Clouds", NeurIPS, 2022 (East China Normal University). [Paper]
  • P2P: "P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting", NeurIPS, 2022 (Tsinghua University). [Paper][PyTorch][Website]
  • 3DTRL: "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space", NeurIPS, 2022 (Stony Brook). [Paper][PyTorch][Website]
  • ShapeCrafter: "ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model", NeurIPS, 2022 (Brown). [Paper]
  • XMFnet: "Cross-modal Learning for Image-Guided Point Cloud Shape Completion", NeurIPS, 2022 (Politecnico di Torino, Italy). [Paper]
  • LighTN: "LighTN: Light-weight Transformer Network for Performance-overhead Tradeoff in Point Cloud Downsampling", arXiv, 2022 (Beijing Jiaotong University). [Paper]
  • PMP-Net++: "PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-step Point Moving Paths", arXiv, 2022 (Tsinghua). [Paper]
  • SnowflakeNet: "Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer", arXiv, 2022 (Tsinghua). [Paper][PyTorch]
  • 3DCTN: "3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification", arXiv, 2022 (University of Waterloo, Canada). [Paper]
  • VNT-Net: "VNT-Net: Rotational Invariant Vector Neuron Transformers", arXiv, 2022 (Ben-Gurion University of the Negev, Israel). [Paper]
  • CompleteDT: "CompleteDT: Point Cloud Completion with Dense Augment Inference Transformers", arXiv, 2022 (Beijing Institute of Technology). [Paper]
  • VN-Transformer: "VN-Transformer: Rotation-Equivariant Attention for Vector Neurons", arXiv, 2022 (Waymo). [Paper]
  • Voxel-MAE: "Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds", arXiv, 2022 (Chalmers University of Technology, Sweden). [Paper]
  • MAE3D: "Masked Autoencoders in 3D Point Cloud Representation Learning", arXiv, 2022 (Northwest A&F University, China). [Paper]
  • PointConvFormer: "PointConvFormer: Revenge of the Point-based Convolution", arXiv, 2022 (Apple). [Paper]
  • PTTR++: "Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer", arXiv, 2022 (NTU, Singapore). [Paper][PyTorch]
  • Pix4Point: "Pix4Point: Image Pretrained Transformers for 3D Point Cloud Understanding", arXiv, 2022 (KAUST). [Paper][Code (in construction)]
  • MVP: "Multiple View Performers for Shape Completion", arXiv, 2022 (Columbia University). [Paper]
  • Simple3D-Former: "Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?", arXiv, 2022 (UT Austin). [Paper][PyTorch]
  • 3DPCT: "3DPCT: 3D Point Cloud Transformer with Dual Self-attention", arXiv, 2022 (University of Waterloo, Canada). [Paper]
  • PS-Former: "Point Cloud Recognition with Position-to-Structure Attention Transformers", arXiv, 2022 (UCSD). [Paper]
  • LCPFormer: "LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers", arXiv, 2022 (Aberystwyth University, UK). [Paper]
  • PointCLIP-V2: "PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning", arXiv, 2022 (CUHK). [Paper][Code (in construction)]
  • R2-MLP: "R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition", arXiv, 2022 (Baidu). [Paper]
  • PVT3D: "PVT3D: Point Voxel Transformers for Place Recognition from Sparse Lidar Scans", arXiv, 2022 (TUM). [Paper]
  • PartSLIP: "PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models", arXiv, 2022 (Qualcomm). [Paper]
  • EPCL: "Frozen CLIP Model is Efficient Point Cloud Backbone", arXiv, 2022 (Shanghai AI Lab). [Paper]
  • ULIP: "ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding", arXiv, 2022 (Salesforce). [Paper][Website]

[Back to Overview]

Pose Estimation

  • Human-body:
    • HOT-Net: "HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation", ACMMM, 2020 (Kwai). [Paper]
    • TransPose: "TransPose: Towards Explainable Human Pose Estimation by Transformer", arXiv, 2020 (Southeast University). [Paper][PyTorch]
    • PTF: "Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration", CVPR, 2021 (ETHZ). [Paper][Code (in construction)][Website]
    • METRO: "End-to-End Human Pose and Mesh Reconstruction with Transformers", CVPR, 2021 (Microsoft). [Paper][PyTorch]
    • PRTR: "Pose Recognition with Cascade Transformers", CVPR, 2021 (UCSD). [Paper][PyTorch]
    • Mesh-Graphormer: "Mesh Graphormer", ICCV, 2021 (Microsoft). [Paper][PyTorch]
    • THUNDR: "THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers", ICCV, 2021 (Google). [Paper]
    • PoseFormer: "3D Human Pose Estimation with Spatial and Temporal Transformers", ICCV, 2021 (UNC). [Paper][PyTorch]
    • TransPose: "TransPose: Keypoint Localization via Transformer", ICCV, 2021 (Southeast University, China). [Paper][PyTorch]
    • POTR: "Pose Transformers (POTR): Human Motion Prediction With Non-Autoregressive Transformers", ICCVW, 2021 (Idiap). [Paper]
    • TransFusion: "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation", BMVC, 2021 (UC Irvine). [Paper][PyTorch]
    • HRT: "HRFormer: High-Resolution Transformer for Dense Prediction", NeurIPS, 2021 (CAS). [Paper][PyTorch]
    • POET: "End-to-End Trainable Multi-Instance Pose Estimation with Transformers", arXiv, 2021 (EPFL). [Paper]
    • Lifting-Transformer: "Lifting Transformer for 3D Human Pose Estimation in Video", arXiv, 2021 (Peking). [Paper]
    • TFPose: "TFPose: Direct Human Pose Estimation with Transformers", arXiv, 2021 (The University of Adelaide). [Paper][PyTorch]
    • Skeletor: "Skeletor: Skeletal Transformers for Robust Body-Pose Estimation", arXiv, 2021 (University of Surrey). [Paper]
    • HandsFormer: "HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction", arXiv, 2021 (Graz University of Technology). [Paper]
    • TTP: "Test-Time Personalization with a Transformer for Human Pose Estimation", NeurIPS, 2021 (UCSD). [Paper][PyTorch][Website]
    • GraFormer: "GraFormer: Graph Convolution Transformer for 3D Pose Estimation", arXiv, 2021 (CAS). [Paper]
    • GCT: "Geometry-Contrastive Transformer for Generalized 3D Pose Transfer", AAAI, 2022 (University of Oulu). [Paper][PyTorch]
    • MHFormer: "MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation", CVPR, 2022 (Peking). [Paper][PyTorch]
    • PAHMT: "Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation", CVPR, 2022 (NetEase). [Paper]
    • TCFormer: "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer", CVPR, 2022 (CUHK). [Paper][PyTorch]
    • PETR: "End-to-End Multi-Person Pose Estimation With Transformers", CVPR, 2022 (Hikvision). [Paper][PyTorch]
    • GraFormer: "GraFormer: Graph-Oriented Transformer for 3D Pose Estimation", CVPR, 2022 (CAS). [Paper]
    • Keypoint-Transformer: "Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation", CVPR, 2022 (Graz University of Technology, Austria). [Paper][PyTorch][Website]
    • MPS-Net: "Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video", CVPR, 2022 (Academia Sinica). [Paper][Website]
    • Ego-STAN: "Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation", CVPRW, 2022 (University of Waterloo, Canada). [Paper]
    • AggPose: "AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation", IJCAI, 2022 (Shenzhen Baoan Women’s and Children’s Hospital). [Paper][Code (in construction)]
    • MotionMixer: "MotionMixer: MLP-based 3D Human Body Pose Forecasting", IJCAI, 2022 (Ulm University, Germany). [Paper][Code (in construction)]
    • Jointformer: "Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation", ICPR, 2022 (Trinity College Dublin, Ireland). [Paper]
    • IVT: "IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation", ACMMM, 2022 (Baidu). [Paper]
    • FastMETRO: "Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers", ECCV, 2022 (POSTECH). [Paper][PyTorch][Website]
    • PPT: "PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation", ECCV, 2022 (UC Irvine). [Paper][PyTorch]
    • Poseur: "Poseur: Direct Human Pose Regression with Transformers", ECCV, 2022 (The University of Adelaide, Australia). [Paper]
    • ViTPose: "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation", NeurIPS, 2022 (The University of Sydney). [Paper][PyTorch]
    • Swin-Pose: "Swin-Pose: Swin Transformer Based Human Pose Estimation", arXiv, 2022 (UMass Lowell). [Paper]
    • HeadPosr: "HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders", arXiv, 2022 (ETHZ). [Paper]
    • CrossFormer: "CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation", arXiv, 2022 (Canberra University, Australia). [Paper]
    • VTP: "VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation", arXiv, 2022 (Hangzhou Dianzi University). [Paper]
    • HeatER: "HeatER: An Efficient and Unified Network for Human Reconstruction via Heatmap-based TransformER", arXiv, 2022 (UCF). [Paper]
    • GraphMLP: "GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation", arXiv, 2022 (Peking University). [Paper]
    • siMLPe: "Back to MLP: A Simple Baseline for Human Motion Prediction", arXiv, 2022 (INRIA). [Paper][PyTorch]
    • Snipper: "Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet", arXiv, 2022 (University of Alberta, Canada). [Paper][PyTorch]
    • OTPose: "OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos", arXiv, 2022 (Korea University). [Paper]
    • PoseBERT: "PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling", arXiv, 2022 (NAVER). [Paper][PyTorch]
    • KOG-Transformer: "K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation", arXiv, 2022 (CAS). [Paper]
    • SoMoFormer: "SoMoFormer: Multi-Person Pose Forecasting with Transformers", arXiv, 2022 (Stanford). [Paper]
    • DPIT: "DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation", arXiv, 2022 (Shanghai University). [Paper]
    • Uplift-Upsample: "Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers", WACV, 2023 (University of Augsburg, Germany). [Paper][TensorFlow]
    • TORE: "TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer", arXiv, 2022 (HKU). [Paper]
    • MPT: "MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction", arXiv, 2022 (Microsoft). [Paper]
    • ViTPose+: "ViTPose+: Vision Transformer Foundation Model for Generic Body Pose Estimation", arXiv, 2022 (The University of Sydney). [Paper][PyTorch]
  • Hands:
    • Hand-Transformer: "Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation", ECCV, 2020 (Kwai). [Paper]
    • SCAT: "SCAT: Stride Consistency With Auto-Regressive Regressor and Transformer for Hand Pose Estimation", ICCVW, 2021 (Alibaba). [Paper]
    • SeTHPose: "Learning Sequential Contexts using Transformer for 3D Hand Pose Estimation", arXiv, 2022 (Queen's University, Canada). [Paper]
    • HTT: "Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos", arXiv, 2022 (HKU). [Paper]
    • ?: "Image-free Domain Generalization via CLIP for 3D Hand Pose Estimation", arXiv, 2022 (UNIST, Korea). [Paper]
  • Others:
    • TAPE: "Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry", arXiv, 2020 (Tianjin University). [Paper]
    • T6D-Direct: "T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression", GCPR, 2021 (University of Bonn). [Paper]
    • 6D-ViT: "6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning", arXiv, 2021 (University of Science and Technology of China). [Paper]
    • RayTran: "RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers", ECCV, 2022 (Google). [Paper]
    • DProST: "DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation", ECCV, 2022 (Seoul National University). [Paper][PyTorch]
    • AFT-VO: "AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation", arXiv, 2022 (University of Surrey, UK). [Paper]
    • DPT-VO: "Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry", arXiv, 2022 (Aeronautics Institute of Technology, Brazil). [Paper]
    • ?: "Video based Object 6D Pose Estimation using Transformers", arXiv, 2022 (Georgia Tech). [Paper][PyTorch]
    • PoET: "PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation", arXiv, 2022 (Infineon Technologies Austria AG). [Paper][PyTorch]
    • CRT-6D: "CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers", WACV, 2023 (ICL, UK). [Paper][Code (in construction)]

[Back to Overview]

Tracking

  • General:
    • TransTrack: "TransTrack: Multiple-Object Tracking with Transformer", arXiv, 2020 (HKU + ByteDance). [Paper][PyTorch]
    • TransformerTrack: "Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking", CVPR, 2021 (USTC). [Paper][PyTorch]
    • TransT: "Transformer Tracking", CVPR, 2021 (Dalian University of Technology). [Paper][PyTorch]
    • STARK: "Learning Spatio-Temporal Transformer for Visual Tracking", ICCV, 2021 (Microsoft). [Paper][PyTorch]
    • HiFT: "HiFT: Hierarchical Feature Transformer for Aerial Tracking", ICCV, 2021 (Tongji University). [Paper][PyTorch]
    • DTT: "High-Performance Discriminative Tracking With Transformers", ICCV, 2021 (CAS). [Paper]
    • DualTFR: "Learning Tracking Representations via Dual-Branch Fully Transformer Networks", ICCVW, 2021 (Microsoft). [Paper][PyTorch (in construction)]
    • TransCenter: "TransCenter: Transformers with Dense Queries for Multiple-Object Tracking", arXiv, 2021 (INRIA + MIT). [Paper]
    • TransMOT: "TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking", arXiv, 2021 (Microsoft). [Paper]
    • TREG: "Target Transformed Regression for Accurate Tracking", arXiv, 2021 (Nanjing University). [Paper][Code (in construction)]
    • TrTr: "TrTr: Visual Tracking with Transformer", arXiv, 2021 (University of Tokyo). [Paper][PyTorch]
    • RelationTrack: "RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation", arXiv, 2021 (Huazhong University of Science and Technology). [Paper]
    • SiamTPN: "Siamese Transformer Pyramid Networks for Real-Time UAV Tracking", WACV, 2022 (New York University). [Paper]
    • MixFormer: "MixFormer: End-to-End Tracking with Iterative Mixed Attention", CVPR, 2022 (Nanjing University). [Paper][PyTorch]
    • ToMP: "Transforming Model Prediction for Tracking", CVPR, 2022 (ETHZ). [Paper][PyTorch]
    • GTR: "Global Tracking Transformers", CVPR, 2022 (UT Austin). [Paper][PyTorch]
    • UTT: "Unified Transformer Tracker for Object Tracking", CVPR, 2022 (Meta). [Paper][Code (in construction)]
    • MeMOT: "MeMOT: Multi-Object Tracking with Memory", CVPR, 2022 (Amazon). [Paper]
    • CSwinTT: "Transformer Tracking with Cyclic Shifting Window Attention", CVPR, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
    • STNet: "Spiking Transformers for Event-Based Single Object Tracking", CVPR, 2022 (Dalian University of Technology). [Paper]
    • TrackFormer: "TrackFormer: Multi-Object Tracking with Transformers", CVPR, 2022 (Facebook). [Paper][PyTorch]
    • SparseTT: "SparseTT: Visual Tracking with Sparse Transformers", IJCAI, 2022 (Beihang University). [Paper][Code (in construction)]
    • AiATrack: "AiATrack: Attention in Attention for Transformer Visual Tracking", ECCV, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
    • MOTR: "MOTR: End-to-End Multiple-Object Tracking with TRansformer", ECCV, 2022 (Megvii). [Paper][PyTorch]
    • SwinTrack: "SwinTrack: A Simple and Strong Baseline for Transformer Tracking", NeurIPS, 2022 (South China University of Technology). [Paper][PyTorch]
    • ModaMixer: "Divert More Attention to Vision-Language Tracking", NeurIPS, 2022 (Beijing Jiaotong University). [Paper][PyTorch]
    • TransMOT: "Transformers for Multi-Object Tracking on Point Clouds", IV, 2022 (Bosch). [Paper]
    • TransT-M: "High-Performance Transformer Tracking", arXiv, 2022 (Dalian University of Technology). [Paper]
    • HCAT: "Efficient Visual Tracking via Hierarchical Cross-Attention Transformer", arXiv, 2022 (Dalian University of Technology). [Paper]
    • ?: "Keypoints Tracking via Transformer Networks", arXiv, 2022 (KAIST). [Paper][PyTorch]
    • TranSTAM: "Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking", arXiv, 2022 (Tsinghua University). [Paper][PyTorch]
    • TransFiner: "TransFiner: A Full-Scale Refinement Approach for Multiple Object Tracking", arXiv, 2022 (China University of Geosciences). [Paper]
    • LPAT: "Local Perception-Aware Transformer for Aerial Tracking", arXiv, 2022 (Tongji University). [Paper][PyTorch]
    • TADN: "Transformer-based assignment decision network for multiple object tracking", arXiv, 2022 (National Technical University of Athens, Greece). [Paper][Code (in construction)]
    • Strong-TransCenter: "Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations", arXiv, 2022 (Tel-Aviv University). [Paper][PyTorch]
    • MQT: "End-to-end Tracking with a Multi-query Transformer", arXiv, 2022 (Oxford). [Paper]
    • ProContEXT: "ProContEXT: Exploring Progressive Context Transformer for Tracking", arXiv, 2022 (Alibaba). [Paper]
    • ?: "Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer", arXiv, 2022 (Sony). [Paper]
    • MOTRv2: "MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors", arXiv, 2022 (Megvii). [Paper][PyTorch]
  • 3D:
    • STNet: "3D Siamese Transformer Network for Single Object Tracking on Point Clouds", ECCV, 2022 (Nanjing University of Science and Technology). [Paper][PyTorch]
    • CMT: "CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds", ECCV, 2022 (USTC). [Paper]
    • InterTrack: "InterTrack: Interaction Transformer for 3D Multi-Object Tracking", arXiv, 2022 (University of Toronto). [Paper]
    • GLT-T: "GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds", AAAI, 2023 (Hangzhou Dianzi University). [Paper]

[Back to Overview]

Re-ID

  • PAT: "Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer", CVPR, 2021 (University of Science and Technology of China). [Paper]
  • HAT: "HAT: Hierarchical Aggregation Transformers for Person Re-identification", ACMMM, 2021 (Dalian University of Technology). [Paper]
  • TransReID: "TransReID: Transformer-based Object Re-Identification", ICCV, 2021 (Alibaba). [Paper][PyTorch]
  • APD: "Transformer Meets Part Model: Adaptive Part Division for Person Re-Identification", ICCVW, 2021 (Meituan). [Paper]
  • Pirt: "Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification", ACMMM, 2021 (Beihang University). [Paper]
  • TransMatcher: "Transformer-Based Deep Image Matching for Generalizable Person Re-identification", NeurIPS, 2021 (IIAI). [Paper][PyTorch]
  • STT: "Spatiotemporal Transformer for Video-based Person Re-identification", arXiv, 2021 (Beihang University). [Paper]
  • AAformer: "AAformer: Auto-Aligned Transformer for Person Re-Identification", arXiv, 2021 (CAS). [Paper]
  • TMT: "A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification", arXiv, 2021 (Dalian University of Technology). [Paper]
  • LA-Transformer: "Person Re-Identification with a Locally Aware Transformer", arXiv, 2021 (University of Maryland Baltimore County). [Paper]
  • DRL-Net: "Learning Disentangled Representation Implicitly via Transformer for Occluded Person Re-Identification", arXiv, 2021 (Peking University). [Paper]
  • GiT: "GiT: Graph Interactive Transformer for Vehicle Re-identification", arXiv, 2021 (Huaqiao University). [Paper]
  • OH-Former: "OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification", arXiv, 2021 (ShanghaiTech University). [Paper]
  • CMTR: "CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification", arXiv, 2021 (Beijing Jiaotong University). [Paper]
  • PFD: "Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer", AAAI, 2022 (Peking). [Paper][PyTorch]
  • NFormer: "NFormer: Robust Person Re-identification with Neighbor Transformer", CVPR, 2022 (University of Amsterdam, Netherlands). [Paper][Code (in construction)]
  • DCAL: "Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification", CVPR, 2022 (Advanced Micro Devices, China). [Paper]
  • CMT: "Cross-Modality Transformer for Visible-Infrared Person Re-identification", ECCV, 2022 (USTC). [Paper]
  • CAViT: "CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification", ECCV, 2022 (CAS). [Paper][PyTorch]
  • PiT: "Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval", IEEE Transactions on Industrial Informatics, 2022 (Peking). [Paper]
  • ?: "Motion-Aware Transformer For Occluded Person Re-identification", arXiv, 2022 (NetEase, China). [Paper]
  • PFT: "Short Range Correlation Transformer for Occluded Person Re-Identification", arXiv, 2022 (Nanjing University of Posts and Telecommunications). [Paper]
  • ?: "CLIP-Driven Fine-grained Text-Image Person Re-identification", arXiv, 2022 (Nanjing University of Science and Technology). [Paper]
  • SeqTR: "Sequential Transformer for End-to-End Person Search", arXiv, 2022 (East China Normal University). [Paper]
  • CLIP-ReID: "CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels", arXiv, 2022 (East China Normal University). [Paper]
  • TMGF: "Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification", WACVW, 2023 (Zhejiang University). [Paper][Code (in construction)]
  • PMT: "Learning Progressive Modality-shared Transformers for Effective Visible-Infrared Person Re-identification", AAAI, 2023 (Jiangsu University). [Paper][Code (in construction)]

[Back to Overview]

Face

  • General:
    • FAU-Transformer: "Facial Action Unit Detection With Transformers", CVPR, 2021 (Rakuten Institute of Technology). [Paper]
    • TADeT: "Mitigating Bias in Visual Transformers via Targeted Alignment", BMVC, 2021 (Georgia Tech). [Paper]
    • ViT-Face: "Face Transformer for Recognition", arXiv, 2021 (Beijing University of Posts and Telecommunications). [Paper]
    • FaceT: "Learning to Cluster Faces via Transformer", arXiv, 2021 (Alibaba). [Paper]
    • VidFace: "VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots", arXiv, 2021 (Zhejiang University). [Paper]
    • FAA: "Shuffle Transformer with Feature Alignment for Video Face Parsing", arXiv, 2021 (Tencent). [Paper]
    • FaRL: "General Facial Representation Learning in a Visual-Linguistic Manner", CVPR, 2022 (Microsoft). [Paper][PyTorch]
    • FaceFormer: "FaceFormer: Speech-Driven 3D Facial Animation with Transformers", CVPR, 2022 (HKU). [Paper][PyTorch][Website]
    • PhysFormer: "PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer", CVPR, 2022 (University of Oulu, Finland). [Paper][PyTorch]
    • VTP: "Sub-word Level Lip Reading With Visual Attention", CVPR, 2022 (Oxford). [Paper]
    • Label2Label: "Label2Label: A Language Modeling Framework for Multi-Attribute Learning", ECCV, 2022 (Tsinghua). [Paper][PyTorch]
    • FPVT: "Face Pyramid Vision Transformer", BMVC, 2022 (FloppyDisk.AI, Pakistan). [Paper][PyTorch][Website]
    • fViT: "Part-based Face Recognition with Vision Transformers", BMVC, 2022 (Queen Mary University of London). [Paper]
    • EventFormer: "EventFormer: AU Event Transformer for Facial Action Unit Event Detection", arXiv, 2022 (Peking). [Paper]
    • MFT: "Multi-Modal Learning for AU Detection Based on Multi-Head Fused Transformers", arXiv, 2022 (SUNY Binghamton). [Paper]
    • VC-TRSF: "Self-supervised Video-centralised Transformer for Video Face Clustering", arXiv, 2022 (ICL). [Paper]
  • Facial Landmark:
    • Clusformer: "Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition", CVPR, 2021 (VinAI Research, Vietnam). [Paper]
    • LOTR: "LOTR: Face Landmark Localization Using Localization Transformer", arXiv, 2021 (Sertis, Thailand). [Paper]
    • SLPT: "Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning", CVPR, 2022 (University of Technology Sydney). [Paper][PyTorch]
    • DTLD: "Towards Accurate Facial Landmark Detection via Cascaded Transformers", CVPR, 2022 (Samsung). [Paper]
    • RePFormer: "RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark Detection", arXiv, 2022 (CUHK). [Paper]
  • Face Low-Level Vision:
    • Latent-Transformer: "A Latent Transformer for Disentangled Face Editing in Images and Videos", ICCV, 2021 (Institut Polytechnique de Paris). [Paper][PyTorch]
    • TANet: "TANet: A new Paradigm for Global Face Super-resolution via Transformer-CNN Aggregation Network", arXiv, 2021 (Wuhan Institute of Technology). [Paper]
    • FAT: "Facial Attribute Transformers for Precise and Robust Makeup Transfer", WACV, 2022 (University of Rochester). [Paper]
    • SSAT: "SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal", AAAI, 2022 (Wuhan University). [Paper][PyTorch]
    • TransEditor: "TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing", CVPR, 2022 (Shanghai AI Lab). [Paper][PyTorch][Website]
    • RestoreFormer: "RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs", CVPR, 2022 (HKU). [Paper]
    • HairCLIP: "HairCLIP: Design Your Hair by Text and Reference Image", CVPR, 2022 (USTC). [Paper][PyTorch]
    • AnyFace: "AnyFace: Free-style Text-to-Face Synthesis and Manipulation", CVPR, 2022 (CAS). [Paper]
    • CodeFormer: "Towards Robust Blind Face Restoration with Codebook Lookup Transformer", NeurIPS, 2022 (NTU, Singapore). [Paper][PyTorch (in construction)][Website]
    • Cycle-Text2Face: "Cycle Text2Face: Cycle Text-to-face GAN via Transformers", arXiv, 2022 (Shahed University, Iran). [Paper]
    • FaceFormer: "FaceFormer: Scale-aware Blind Face Restoration with Transformers", arXiv, 2022 (Tencent). [Paper]
    • text2StyleGAN: "Text-Free Learning of a Natural Language Interface for Pretrained Face Generators", arXiv, 2022 (Toyota Technological Institute, Chicago). [Paper][PyTorch]
    • ManiCLIP: "ManiCLIP: Multi-Attribute Face Manipulation from Text", arXiv, 2022 (NTU, Singapore). [Paper]
    • FEAT: "FEAT: Face Editing with Attention", arXiv, 2022 (Shenzhen University). [Paper]
  • Facial Expression:
    • TransFER: "TransFER: Learning Relation-aware Facial Expression Representations with Transformers", ICCV, 2021 (CAS). [Paper]
    • CVT-Face: "Robust Facial Expression Recognition with Convolutional Visual Transformers", arXiv, 2021 (Hunan University). [Paper]
    • MViT: "MViT: Mask Vision Transformer for Facial Expression Recognition in the wild", arXiv, 2021 (University of Science and Technology of China). [Paper]
    • ViT-SE: "Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition", arXiv, 2021 (CentraleSupélec, France). [Paper]
    • EST: "Expression Snippet Transformer for Robust Video-based Facial Expression Recognition", arXiv, 2021 (China University of Geosciences). [Paper][PyTorch]
    • MFEViT: "MFEViT: A Robust Lightweight Transformer-based Network for Multimodal 2D+3D Facial Expression Recognition", arXiv, 2021 (University of Science and Technology of China). [Paper]
    • F-PDLS: "Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task", ICASSP, 2022 (KAIST). [Paper]
    • ?: "Transformer-based Multimodal Information Fusion for Facial Expression Analysis", arXiv, 2022 (Netease, China). [Paper]
    • ?: "Facial Expression Recognition with Swin Transformer", arXiv, 2022 (Dongguk University, Korea). [Paper]
    • POSTER: "POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition", arXiv, 2022 (UCF). [Paper]
    • STT: "Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild", arXiv, 2022 (Hunan University). [Paper]
    • FaceMAE: "FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders", arXiv, 2022 (NUS). [Paper][Code (in construction)]
    • TransFA: "TransFA: Transformer-based Representation for Face Attribute Evaluation", arXiv, 2022 (Xidian University). [Paper]
    • AU-CVT: "AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition", arXiv, 2022 (Shenzhen Technology University). [Paper][PyTorch]
    • ?: "Multi-Task Transformer with uncertainty modelling for Face Based Affective Computing", arXiv, 2022 (Datakalab, France). [Paper]
    • APViT: "Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition", arXiv, 2022 (Baidu). [Paper]
  • Attack-related:
    • ?: "Video Transformer for Deepfake Detection with Incremental Learning", ACMMM, 2021 (MBZUAI). [Paper]
    • ViTranZFAS: "On the Effectiveness of Vision Transformers for Zero-shot Face Anti-Spoofing", International Joint Conference on Biometrics (IJCB), 2021 (Idiap). [Paper]
    • MTSS: "Multi-Teacher Single-Student Visual Transformer with Multi-Level Attention for Face Spoofing Detection", BMVC, 2021 (National Taiwan Ocean University). [Paper]
    • TransRPPG: "TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection", arXiv, 2021 (University of Oulu). [Paper]
    • CViT: "Deepfake Video Detection Using Convolutional Vision Transformer", arXiv, 2021 (Jimma University). [Paper]
    • ViT-Distill: "Deepfake Detection Scheme Based on Vision Transformer and Distillation", arXiv, 2021 (Sookmyung Women’s University). [Paper]
    • M2TR: "M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection", arXiv, 2021 (Fudan University). [Paper]
    • Cross-ViT: "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", arXiv, 2021 (University of Pisa). [Paper][PyTorch]
    • ICT: "Protecting Celebrities from DeepFake with Identity Consistency Transformer", CVPR, 2022 (Microsoft). [Paper][PyTorch]
    • GGViT: "GGViT: Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection", ICPR, 2022 (CAS). [Paper]
    • ?: "Hybrid Transformer Network for Deepfake Detection", International Conference on Content-Based Multimedia Indexing (CBMI), 2022 (MediaFutures, Norway). [Paper]
    • ViTAF: "Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing", ECCV, 2022 (Google). [Paper]
    • UIA-ViT: "UIA-ViT: Unsupervised Inconsistency-Aware Method Based on Vision Transformer for Face Forgery Detection", ECCV, 2022 (USTC). [Paper]
    • ?: "Multi-Scale Wavelet Transformer for Face Forgery Detection", ACCV, 2022 (Hikvision). [Paper]
    • ?: "Self-supervised Transformer for Deepfake Detection", arXiv, 2022 (USTC, China). [Paper]
    • ViTransPAD: "ViTransPAD: Video Transformer using convolution and self-attention for Face Presentation Attack Detection", arXiv, 2022 (University of La Rochelle, France). [Paper]
    • ?: "Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection", arXiv, 2022 (National Research Council, Italy). [Paper]
    • STDT: "Deepfake Video Detection with Spatiotemporal Dropout Transformer", arXiv, 2022 (CAS). [Paper]
    • ?: "Deep Convolutional Pooling Transformer for Deepfake Detection", arXiv, 2022 (HKU). [Paper]

[Back to Overview]

Neural Architecture Search

  • HR-NAS: "HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers", CVPR, 2021 (HKU). [Paper][PyTorch]
  • CATE: "CATE: Computation-aware Neural Architecture Encoding with Transformers", ICML, 2021 (Michigan State). [Paper]
  • AutoFormer: "AutoFormer: Searching Transformers for Visual Recognition", ICCV, 2021 (Microsoft). [Paper][PyTorch]
  • GLiT: "GLiT: Neural Architecture Search for Global and Local Image Transformer", ICCV, 2021 (The University of Sydney + SenseTime). [Paper]
  • BossNAS: "BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search", ICCV, 2021 (Monash University). [Paper][PyTorch]
  • ViT-ResNAS: "Searching for Efficient Multi-Stage Vision Transformers", ICCVW, 2021 (MIT). [Paper][PyTorch]
  • AutoformerV2: "Searching the Search Space of Vision Transformer", NeurIPS, 2021 (Microsoft). [Paper][PyTorch]
  • TNASP: "TNASP: A Transformer-based NAS Predictor with a Self-evolution Framework", NeurIPS, 2021 (CAS + Kuaishou). [Paper]
  • PSViT: "PSViT: Better Vision Transformer via Token Pooling and Attention Sharing", arXiv, 2021 (The University of Sydney + SenseTime). [Paper]
  • As-ViT: "Auto-scaling Vision Transformers without Training", ICLR, 2022 (UT Austin). [Paper][PyTorch]
  • NASViT: "NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training", ICLR, 2022 (Facebook). [Paper]
  • TF-TAS: "Training-free Transformer Architecture Search", CVPR, 2022 (Tencent). [Paper]
  • ViT-Slim: "Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space", CVPR, 2022 (MBZUAI). [Paper][PyTorch]
  • BurgerFormer: "Searching for BurgerFormer with Micro-Meso-Macro Space Design", ICML, 2022 (CAS). [Paper][Code (in construction)]
  • UniNet: "UniNet: Unified Architecture Search with Convolution, Transformer, and MLP", ECCV, 2022 (CUHK + SenseTime). [Paper]
  • ViTAS: "Vision Transformer Architecture Search", ECCV, 2022 (The University of Sydney + SenseTime). [Paper]
  • VTCAS: "Vision Transformer with Convolutions Architecture Search", arXiv, 2022 (Donghua University). [Paper]
  • NOAH: "Neural Prompt Search", arXiv, 2022 (NTU, Singapore). [Paper][PyTorch]
  • FocusFormer: "FocusFormer: Focusing on What We Need via Architecture Sampler", arXiv, 2022 (Monash University, Australia). [Paper]
  • NAR-Former: "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction", arXiv, 2022 (Xidian University, China). [Paper]

[Back to Overview]

Scene Graph

  • BGT-Net: "BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation", CVPRW, 2021 (ETHZ). [Paper]
  • STTran: "Spatial-Temporal Transformer for Dynamic Scene Graph Generation", ICCV, 2021 (Leibniz University Hannover, Germany). [Paper][PyTorch]
  • SGG-NLS: "Learning to Generate Scene Graph from Natural Language Supervision", ICCV, 2021 (University of Wisconsin-Madison). [Paper][PyTorch]
  • SGG-Seq2Seq: "Context-Aware Scene Graph Generation With Seq2Seq Transformers", ICCV, 2021 (Layer 6 AI, Canada). [Paper][PyTorch]
  • RELAX: "Image-Text Alignment using Adaptive Cross-attention with Transformer Encoder for Scene Graphs", BMVC, 2021 (Samsung). [Paper]
  • Relation-Transformer: "Scenes and Surroundings: Scene Graph Generation using Relation Transformer", arXiv, 2021 (LMU Munich). [Paper]
  • SGTR: "SGTR: End-to-end Scene Graph Generation with Transformer", CVPR, 2022 (ShanghaiTech). [Paper][Code (in construction)]
  • GCL: "Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation", CVPR, 2022 (Shandong University). [Paper][PyTorch]
  • Relationformer: "Relationformer: A Unified Framework for Image-to-Graph Generation", ECCV, 2022 (TUM). [Paper][Code (in construction)]
  • SVRP: "Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning", ECCV, 2022 (Monash University). [Paper]
  • RelTR: "RelTR: Relation Transformer for Scene Graph Generation", arXiv, 2022 (Leibniz University Hannover, Germany). [Paper][PyTorch]
  • SG-Shuffle: "SG-Shuffle: Multi-aspect Shuffle Transformer for Scene Graph Generation", arXiv, 2022 (The University of Sydney). [Paper]
  • IS-GGT: "Iterative Scene Graph Generation with Generative Transformers", arXiv, 2022 (Oklahoma State University). [Paper]

[Back to Overview]

Transfer / X-Supervised / X-Shot / Continual Learning

  • Transfer Learning:
    • AdaptFormer: "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition", NeurIPS, 2022 (HKU). [Paper][PyTorch][Website]
    • Convpass: "Convolutional Bypasses Are Better Vision Transformer Adapters", arXiv, 2022 (Peking University). [Paper][PyTorch]
    • FacT: "FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer", AAAI, 2023 (Peking University). [Paper][PyTorch]
  • Domain Adaptation/Generalization:
    • TransDA: "Transformer-Based Source-Free Domain Adaptation", arXiv, 2021 (Harbin Institute of Technology). [Paper][PyTorch]
    • TVT: "TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation", arXiv, 2021 (UT Arlington + Kuaishou). [Paper]
    • ResTran: "Discovering Spatial Relationships by Transformers for Domain Generalization", arXiv, 2021 (MBZUAI). [Paper]
    • WinTR: "Exploiting Both Domain-specific and Invariant Knowledge via a Win-win Transformer for Unsupervised Domain Adaptation", arXiv, 2021 (Beijing Institute of Technology). [Paper]
    • CDTrans: "CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation", ICLR, 2022 (Alibaba). [Paper][PyTorch]
    • SSRT: "Safe Self-Refinement for Transformer-based Domain Adaptation", CVPR, 2022 (Stony Brook). [Paper]
    • DOT: "Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation", ACMMM, 2022 (Beijing Institute of Technology). [Paper]
    • GVRT: "Grounding Visual Representations with Texts for Domain Generalization", ECCV, 2022 (LG). [Paper][PyTorch]
    • PACMAC: "Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency", NeurIPS, 2022 (Georgia Tech). [Paper][PyTorch]
    • BCAT: "Domain Adaptation via Bidirectional Cross-Attention Transformer", arXiv, 2022 (Southern University of Science and Technology). [Paper]
    • DoTNet: "Towards Unsupervised Domain Adaptation via Domain-Transformer", arXiv, 2022 (Sun Yat-Sen University). [Paper]
    • TransDA: "Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation", arXiv, 2022 (Tsinghua). [Paper][Code (in construction)]
    • FAMLP: "FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization", arXiv, 2022 (University of Science and Technology of China). [Paper]
    • ERM-ViT: "Self-Distilled Vision Transformer for Domain Generalization", arXiv, 2022 (MBZUAI). [Paper][PyTorch]
    • MPA: "Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation", arXiv, 2022 (Fudan University). [Paper]
    • DePT: "Visual Prompt Tuning for Test-time Domain Adaptation", arXiv, 2022 (Amazon). [Paper]
    • LADS: "Using Language to Extend to Unseen Domains", arXiv, 2022 (Berkeley). [Paper]
    • FedAPT: "Cross-domain Federated Adaptive Prompt Tuning for CLIP", arXiv, 2022 (Fudan University). [Paper]
    • MetaPrompt: "Learning Domain Invariant Prompt for Vision-Language Models", arXiv, 2022 (Tongji University + Microsoft). [Paper]
  • X-Supervised:
    • Semiformer: "Semi-Supervised Vision Transformers", ECCV, 2022 (Fudan University). [Paper][PyTorch]
    • SVL-Adapter: "SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models", BMVC, 2022 (UCL). [Paper][Code (in construction)]
    • Semi-ViT: "Semi-supervised Vision Transformers at Scale", NeurIPS, 2022 (Amazon). [Paper]
  • Zero-Shot:
    • ViT-ZSL: "Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning", IMVIP, 2021 (University of Exeter, UK). [Paper]
    • TransZero: "TransZero: Attribute-guided Transformer for Zero-Shot Learning", AAAI, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
    • ?: "Zero-shot Visual Commonsense Immorality Prediction", BMVC, 2022 (Korea University). [Paper][PyTorch]
    • TPT: "Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models", NeurIPS, 2022 (NVIDIA). [Paper][PyTorch][Website]
    • I2DFormer: "I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification", NeurIPS, 2022 (ETHZ). [Paper]
    • HRT: "Hybrid Routing Transformer for Zero-Shot Learning", arXiv, 2022 (Xidian University). [Paper]
    • MUST: "Masked Unsupervised Self-training for Zero-shot Image Classification", arXiv, 2022 (Salesforce). [Paper]
    • CuPL: "What does a platypus look like? Generating customized prompts for zero-shot image classification", arXiv, 2022 (University of Washington). [Paper][PyTorch]
    • VL-Taboo: "VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models", arXiv, 2022 (Goethe University Frankfurt, Germany). [Paper][Code (in construction)]
    • CALIP: "CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention", arXiv, 2022 (Peking University). [Paper]
    • PromptCompVL: "Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning", arXiv, 2022 (Michigan State). [Paper]
    • SuS-X: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models", arXiv, 2022 (Cambridge). [Paper][PyTorch]
    • I2MVFormer: "I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification", arXiv, 2022 (ETHZ). [Paper]
  • X-Shot:
    • CrossTransformer: "CrossTransformers: spatially-aware few-shot transfer", NeurIPS, 2020 (DeepMind). [Paper][Tensorflow]
    • URT: "A Universal Representation Transformer Layer for Few-Shot Image Classification", ICLR, 2021 (Mila). [Paper][PyTorch]
    • TRX: "Temporal-Relational CrossTransformers for Few-Shot Action Recognition", CVPR, 2021 (University of Bristol). [Paper][PyTorch]
    • Few-shot-Transformer: "Few-Shot Transformation of Common Actions into Time and Space", arXiv, 2021 (University of Amsterdam). [Paper]
    • HCTransformers: "Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning", CVPR, 2022 (Fudan University). [Paper][PyTorch]
    • HyperTransformer: "HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning", CVPR, 2022 (Google). [Paper][PyTorch][Website]
    • STRM: "Spatio-temporal Relation Modeling for Few-shot Action Recognition", CVPR, 2022 (MBZUAI). [Paper][PyTorch][Website]
    • HyperTransformer: "HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning", ICML, 2022 (Google). [Paper]
    • CPM: "Compound Prototype Matching for Few-shot Action Recognition", ECCV, 2022 (The University of Tokyo). [Paper]
    • SUN: "Self-Promoted Supervision for Few-Shot Transformer", ECCV, 2022 (Harbin Institute of Technology + NUS). [Paper][PyTorch]
    • Tip-Adapter: "Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification", ECCV, 2022 (Shanghai AI Lab). [Paper][PyTorch]
    • tSF: "tSF: Transformer-Based Semantic Filter for Few-Shot Learning", ECCV, 2022 (Tencent). [Paper]
    • TransVLAD: "TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning", ECCV, 2022 (Southern University of Science and Technology, China). [Paper]
    • BaseTransformers: "BaseTransformers: Attention over base data-points for One Shot Learning", BMVC, 2022 (Dublin City University, Ireland). [Paper][PyTorch]
    • FPTrans: "Feature-Proxy Transformer for Few-Shot Segmentation", NeurIPS, 2022 (Baidu). [Paper][Code (in construction)]
    • MM-Former: "Mask Matching Transformer for Few-Shot Segmentation", NeurIPS, 2022 (Picsart). [Paper][PyTorch]
    • MG-ViT: "Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning", arXiv, 2022 (University of Electronic Science and Technology of China). [Paper]
    • QSFormer: "Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification", arXiv, 2022 (Anhui University). [Paper]
    • FS-CT: "Enhancing Few-shot Image Classification with Cosine Transformer", arXiv, 2022 (VinUniversity, Vietnam). [Paper][PyTorch]
    • CoCa-CNI: "Exploiting Category Names for Few-Shot Classification with Vision-Language Models", arXiv, 2022 (Google). [Paper]
  • Continual Learning:
    • MEAT: "Meta-attention for ViT-backed Continual Learning", CVPR, 2022 (Zhejiang University). [Paper][Code (in construction)]
    • DyTox: "DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion", CVPR, 2022 (Sorbonne Universite, France). [Paper][PyTorch]
    • LVT: "Continual Learning With Lifelong Vision Transformer", CVPR, 2022 (The University of Sydney). [Paper]
    • L2P: "Learning to Prompt for Continual Learning", CVPR, 2022 (Google). [Paper][Tensorflow]
    • ?: "Simpler is Better: off-the-shelf Continual Learning Through Pretrained Backbones", CVPRW, 2022 (Ca' Foscari University, Italy). [Paper][PyTorch]
    • ADA: "Continual Learning with Transformers for Image Classification", CVPRW, 2022 (Amazon). [Paper]
    • ?: "Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization", CVPRW, 2022 (Ca' Foscari University, Italy). [Paper]
    • DualPrompt: "DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning", ECCV, 2022 (Google). [Paper][Tensorflow]
    • CVT: "Online Continual Learning with Contrastive Vision Transformer", ECCV, 2022 (The University of Sydney). [Paper]
    • IncCLIP: "Generative Negative Text Replay for Continual Vision-Language Pretraining", ECCV, 2022 (ShanghaiTech). [Paper]
    • S-Prompts: "S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning", NeurIPS, 2022 (Singapore Management University). [Paper]
    • ADA: "Memory Efficient Continual Learning with Transformers", NeurIPS, 2022 (Amazon). [Paper]
    • BMU-MoCo: "BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling", NeurIPS, 2022 (Renmin University of China). [Paper]
    • CLiMB: "CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks", NeurIPS (Datasets and Benchmarks), 2022 (USC). [Paper][PyTorch]
    • COLT: "Transformers Are Better Continual Learners", arXiv, 2022 (Hikvision). [Paper]
    • D3Former: "D3Former: Debiased Dual Distilled Transformer for Incremental Learning", arXiv, 2022 (MBZUAI). [Paper][PyTorch]
    • Continual-CLIP: "CLIP model is an Efficient Continual Learner", arXiv, 2022 (MBZUAI). [Paper][Code (in construction)]
    • GCAB-CFDC: "Gated Class-Attention with Cascaded Feature Drift Compensation for Exemplar-free Continual Learning of Vision Transformers", arXiv, 2022 (University of Pavia, Italy). [Paper][Code (in construction)]
    • CODA-Prompt: "CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning", arXiv, 2022 (IBM). [Paper]
    • PIVOT: "PIVOT: Prompting for Video Continual Learning", arXiv, 2022 (KAUST). [Paper]
  • Long-tail/Imbalanced:
    • BatchFormer: "BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning", CVPR, 2022 (The University of Sydney). [Paper][PyTorch]
    • BatchFormerV2: "BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning", arXiv, 2022 (The University of Sydney). [Paper]
    • LPT: "LPT: Long-tailed Prompt Tuning for Image Classification", arXiv, 2022 (Harbin Institute of Technology). [Paper]
    • LiVT: "Learning Imbalanced Data with Vision Transformers", arXiv, 2022 (Tsinghua). [Paper][PyTorch (in construction)]
  • Knowledge Distillation:
    • ?: "Knowledge Distillation via the Target-aware Transformer", CVPR, 2022 (Alibaba). [Paper]
    • DearKD: "DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers", CVPR, 2022 (JD). [Paper]
    • AttnDistill: "Attention Distillation: self-supervised vision transformer students need more guidance", BMVC, 2022 (UAB, Spain). [Paper][PyTorch]
    • ViTKD: "ViTKD: Practical Guidelines for ViT feature knowledge distillation", arXiv, 2022 (IDEA). [Paper][PyTorch (in construction)]
    • ?: "Adaptive Attention Link-based Regularization for Vision Transformers", arXiv, 2022 (Chung-Ang University, Korea). [Paper]
  • Clustering:
    • VTCC: "Vision Transformer for Contrastive Clustering", arXiv, 2022 (Sun Yat-sen University, China). [Paper]
  • Novel Category Discovery:
    • PromptCAL: "PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery", arXiv, 2022 (MBZUAI). [Paper][Code (in construction)]

[Back to Overview]

Low-level Vision Tasks

Image Restoration

  • General:
    • NLRN: "Non-Local Recurrent Network for Image Restoration", NeurIPS, 2018 (UIUC). [Paper][Tensorflow]
    • RNAN: "Residual Non-local Attention Networks for Image Restoration", ICLR, 2019 (Northeastern University). [Paper][PyTorch]
    • PANet: "Pyramid Attention Networks for Image Restoration", arXiv, 2020 (UIUC). [Paper][PyTorch]
    • IPT: "Pre-Trained Image Processing Transformer", CVPR, 2021 (Huawei). [Paper][PyTorch (in construction)]
    • SwinIR: "SwinIR: Image Restoration Using Swin Transformer", ICCVW, 2021 (ETHZ). [Paper][PyTorch]
    • SiamTrans: "SiamTrans: Zero-Shot Multi-Frame Image Restoration with Pre-Trained Siamese Transformers", AAAI, 2022 (Huawei). [Paper]
    • Uformer: "Uformer: A General U-Shaped Transformer for Image Restoration", CVPR, 2022 (University of Science and Technology of China). [Paper][PyTorch]
    • MAXIM: "MAXIM: Multi-Axis MLP for Image Processing", CVPR, 2022 (Google). [Paper][Tensorflow]
    • Restormer: "Restormer: Efficient Transformer for High-Resolution Image Restoration", CVPR, 2022 (IIAI, UAE). [Paper][PyTorch]
    • TransWeather: "TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions", CVPR, 2022 (JHU). [Paper][PyTorch][Website]
    • KiT: "KNN Local Attention for Image Restoration", CVPR, 2022 (Yonsei University). [Paper]
    • ELMformer: "ELMformer: Efficient Raw Image Restoration with a Locally Multiplicative Transformer", ACMMM, 2022 (Horizon Robotics). [Paper][Code (in construction)]
    • EDT: "On Efficient Transformer-Based Image Pre-training for Low-Level Vision", arXiv, 2022 (CUHK). [Paper][PyTorch]
    • ?: "Transform your Smartphone into a DSLR Camera: Learning the ISP in the Wild", arXiv, 2022 (ETHZ). [Paper]
    • TMT: "Imaging through the Atmosphere using Turbulence Mitigation Transformer", arXiv, 2022 (Purdue). [Paper][Code (in construction)][Website]
    • LRT: "LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images", arXiv, 2022 (HKU). [Paper]
    • ART: "Accurate Image Restoration with Attention Retractable Transformer", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][PyTorch]
  • Super-Resolution:
    • SAN: "Second-Order Attention Network for Single Image Super-Resolution", CVPR, 2019 (Tsinghua). [Paper][PyTorch]
    • CS-NL: "Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining", CVPR, 2020 (UIUC). [Paper][PyTorch]
    • TTSR: "Learning Texture Transformer Network for Image Super-Resolution", CVPR, 2020 (Microsoft). [Paper][PyTorch]
    • HAN: "Single Image Super-Resolution via a Holistic Attention Network", ECCV, 2020 (Northeastern University). [Paper][PyTorch]
    • NLSN: "Image Super-Resolution With Non-Local Sparse Attention", CVPR, 2021 (UIUC). [Paper]
    • ITSRN: "Implicit Transformer Network for Screen Content Image Continuous Super-Resolution", NeurIPS, 2021 (Tianjin University). [Paper][PyTorch]
    • FPAN: "Feedback Pyramid Attention Networks for Single Image Super-Resolution", arXiv, 2021 (Nanjing University of Science and Technology). [Paper]
    • ESRT: "Efficient Transformer for Single Image Super-Resolution", arXiv, 2021 (Peking University). [Paper]
    • Fusformer: "Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution", arXiv, 2021 (University of Electronic Science and Technology of China). [Paper]
    • DPT: "Detail-Preserving Transformer for Light Field Image Super-Resolution", AAAI, 2022 (Beijing Institute of Technology). [Paper][PyTorch]
    • BSRT: "BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment", CVPRW, 2022 (Megvii). [Paper][PyTorch]
    • TATT: "A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution", CVPR, 2022 (The Hong Kong Polytechnic University). [Paper][PyTorch]
    • LBNet: "Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer", IJCAI, 2022 (Nanjing University of Posts and Telecommunications). [Paper][PyTorch (in construction)]
    • DATSR: "Reference-based Image Super-Resolution with Deformable Attention Transformer", ECCV, 2022 (ETHZ). [Paper][Code (in construction)]
    • ELAN: "Efficient Long-Range Attention Network for Image Super-resolution", ECCV, 2022 (The Hong Kong Polytechnic University). [Paper][PyTorch]
    • Swin2SR: "Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration", ECCVW, 2022 (University of Wurzburg, Germany). [Paper]
    • CAT: "Cross Aggregation Transformer for Image Restoration", NeurIPS, 2022 (Shanghai Jiao Tong University). [Paper][PyTorch]
    • Stoformer: "Stochastic Window Transformer for Image Restoration", NeurIPS, 2022 (USTC). [Paper][PyTorch]
    • LFT: "Light Field Image Super-Resolution with Transformers", IEEE Signal Processing Letters, 2022 (National University of Defense Technology, China). [Paper][PyTorch]
    • ELAN: "Efficient Long-Range Attention Network for Image Super-resolution", arXiv, 2022 (The Hong Kong Polytechnic University). [Paper][Code (in construction)]
    • ACT: "Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution", arXiv, 2022 (LG). [Paper]
    • HIPA: "HIPA: Hierarchical Patch Transformer for Single Image Super Resolution", arXiv, 2022 (CUHK). [Paper]
    • CTCNet: "CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution", arXiv, 2022 (Nanjing University of Posts and Telecommunications). [Paper]
    • HAT: "Activating More Pixels in Image Super-Resolution Transformer", arXiv, 2022 (University of Macau). [Paper][Code (in construction)]
    • ShuffleMixer: "ShuffleMixer: An Efficient ConvNet for Image Super-Resolution", arXiv, 2022 (Nanjing University of Science and Technology). [Paper][PyTorch]
    • HST: "HST: Hierarchical Swin Transformer for Compressed Image Super-resolution", ECCVW, 2022 (USTC). [Paper]
    • SwinFIR: "SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution", arXiv, 2022 (Samsung). [Paper]
    • ITSRN++: "ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution", arXiv, 2022 (Tianjin University). [Paper]
    • NGswin: "N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution", arXiv, 2022 (Sogang University, Korea). [Paper]
  • Others:
    • SDNet: "SDNet: multi-branch for single image deraining using swin", arXiv, 2021 (Xinjiang University). [Paper][Code (in construction)]
    • ATTSF: "Attention! Stay Focus!", arXiv, 2021 (BridgeAI, Seoul). [Paper][Tensorflow]
    • HyLoG-ViT: "Hybrid Local-Global Transformer for Image Dehazing", arXiv, 2021 (Beihang University). [Paper]
    • HyperTransformer: "HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening", CVPR, 2022 (JHU). [Paper][PyTorch]
    • DeHamer: "Image Dehazing Transformer With Transmission-Aware 3D Position Embedding", CVPR, 2022 (Nankai University). [Paper][Website]
    • PTNet: "Learning Parallax Transformer Network for Stereo Image JPEG Artifacts Removal", ACMMM, 2022 (Fudan University). [Paper]
    • CharFormer: "CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising", ACMMM, 2022 (Jilin University). [Paper][PyTorch (in construction)]
    • TurbNet: "Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model", ECCV, 2022 (Purdue + UT Austin). [Paper][PyTorch]
    • Stripformer: "Stripformer: Strip Transformer for Fast Image Deblurring", ECCV, 2022 (NTHU). [Paper]
    • DehazeFormer: "Vision Transformers for Single Image Dehazing", arXiv, 2022 (Zhejiang University). [Paper][PyTorch]
    • RSTCANet: "Residual Swin Transformer Channel Attention Network for Image Demosaicing", arXiv, 2022 (Tampere University, Finland). [Paper]
    • DRT: "DRT: A Lightweight Single Image Deraining Recursive Transformer", arXiv, 2022 (ANU, Australia). [Paper][PyTorch (in construction)]
    • DenSformer: "Dense residual Transformer for image denoising", arXiv, 2022 (University of Science and Technology Beijing). [Paper]
    • Cubic-Mixer: "UHD Image Deblurring via Multi-scale Cubic-Mixer", arXiv, 2022 (Nanjing University of Science and Technology). [Paper]
    • PoCoformer: "Polarized Color Image Denoising using Pocoformer", arXiv, 2022 (The University of Tokyo). [Paper]
    • MSP-Former: "MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing", arXiv, 2022 (Jimei University). [Paper]
    • ELF: "Magic ELF: Image Deraining Meets Association Learning and Transformer", arXiv, 2022 (Wuhan University). [Paper][PyTorch (in construction)]
    • DnSwin: "DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer", arXiv, 2022 (Guangdong University of Technology). [Paper]
    • SnowFormer: "SnowFormer: Scale-aware Transformer via Context Interaction for Single Image Desnowing", arXiv, 2022 (Jimei University, China). [Paper]
    • DMTNet: "DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer", arXiv, 2022 (Samsung). [Paper]
    • LMQFormer: "LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal", arXiv, 2022 (Fuzhou University). [Paper]
    • Semi-UFormer: "Semi-UFormer: Semi-supervised Uncertainty-aware Transformer for Image Dehazing", arXiv, 2022 (Nanjing University of Aeronautics and Astronautics). [Paper]
    • WITT: "WITT: A Wireless Image Transmission Transformer for Semantic Communications", arXiv, 2022 (Beijing University of Posts and Telecommunications). [Paper][Code (in construction)]
    • BiT: "Blur Interpolation Transformer for Real-World Motion from Blur", arXiv, 2022 (The University of Tokyo). [Paper]
    • FFTformer: "Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring", arXiv, 2022 (Nanjing University of Science and Technology). [Paper][Code (in construction)]
    • SST: "Spatial-Spectral Transformer for Hyperspectral Image Denoising", arXiv, 2022 (Beijing Institute of Technology). [Paper][PyTorch]

[Back to Overview]

Video Restoration

  • VSR-Transformer: "Video Super-Resolution Transformer", arXiv, 2021 (ETHZ). [Paper][PyTorch]
  • MANA: "Memory-Augmented Non-Local Attention for Video Super-Resolution", CVPR, 2022 (JD). [Paper]
  • ?: "Bringing Old Films Back to Life", CVPR, 2022 (Microsoft). [Paper][Code (in construction)]
  • TTVSR: "Learning Trajectory-Aware Transformer for Video Super-Resolution", CVPR, 2022 (Microsoft). [Paper][PyTorch]
  • Trans-SVSR: "A New Dataset and Transformer for Stereoscopic Video Super-Resolution", CVPR, 2022 (Bahcesehir University, Turkey). [Paper][PyTorch]
  • STDAN: "STDAN: Deformable Attention Network for Space-Time Video Super-Resolution", CVPRW, 2022 (Tsinghua). [Paper]
  • VRT: "VRT: A Video Restoration Transformer", arXiv, 2022 (ETHZ). [Paper][PyTorch]
  • FGST: "Flow-Guided Sparse Transformer for Video Deblurring", ICML, 2022 (Tsinghua). [Paper][Code (in construction)]
  • RSTT: "RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution", CVPR, 2022 (Microsoft). [Paper][PyTorch]
  • FTVSR: "Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution", ECCV, 2022 (Microsoft). [Paper][PyTorch]
  • EFNet: "Event-Based Fusion for Motion Deblurring with Cross-modal Attention", ECCV, 2022 (ETHZ). [Paper]
  • TempFormer: "TempFormer: Temporally Consistent Transformer for Video Denoising", ECCV, 2022 (Disney). [Paper]
  • RVRT: "Recurrent Video Restoration Transformer with Guided Deformable Attention", NeurIPS, 2022 (ETHZ). [Paper][PyTorch]
  • ?: "Rethinking Alignment in Video Super-Resolution Transformers", NeurIPS, 2022 (Shanghai AI Lab). [Paper][PyTorch]
  • VDTR: "VDTR: Video Deblurring with Transformer", arXiv, 2022 (Tsinghua). [Paper][Code (in construction)]
  • DSCT: "Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer", arXiv, 2022 (Beijing University of Posts and Telecommunications). [Paper]
  • Group-ShiftNet: "No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers", arXiv, 2022 (CUHK). [Paper][Code (in construction)][Website]

[Back to Overview]

Inpainting / Completion / Outpainting

  • Contextual-Attention: "Generative Image Inpainting with Contextual Attention", CVPR, 2018 (UIUC). [Paper][Tensorflow]
  • PEN-Net: "Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting", CVPR, 2019 (Microsoft). [Paper][PyTorch]
  • Copy-Paste: "Copy-and-Paste Networks for Deep Video Inpainting", ICCV, 2019 (Yonsei University). [Paper][PyTorch]
  • Onion-Peel: "Onion-Peel Networks for Deep Video Completion", ICCV, 2019 (Yonsei University). [Paper][PyTorch]
  • STTN: "Learning Joint Spatial-Temporal Transformations for Video Inpainting", ECCV, 2020 (Microsoft). [Paper][PyTorch]
  • FuseFormer: "FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting", ICCV, 2021 (CUHK + SenseTime). [Paper][PyTorch]
  • ICT: "High-Fidelity Pluralistic Image Completion with Transformers", ICCV, 2021 (CUHK). [Paper][PyTorch][Website]
  • DSTT: "Decoupled Spatial-Temporal Transformer for Video Inpainting", arXiv, 2021 (CUHK + SenseTime). [Paper][Code (in construction)]
  • TFill: "TFill: Image Completion via a Transformer-Based Architecture", arXiv, 2021 (NTU Singapore). [Paper][Code (in construction)]
  • BAT-Fill: "Diverse Image Inpainting with Bidirectional and Autoregressive Transformers", arXiv, 2021 (NTU Singapore). [Paper]
  • ?: "Image-Adaptive Hint Generation via Vision Transformer for Outpainting", WACV, 2022 (Sogang University, Korea). [Paper]
  • ZITS: "Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding", CVPR, 2022 (Fudan). [Paper][PyTorch][Website]
  • MAT: "MAT: Mask-Aware Transformer for Large Hole Image Inpainting", CVPR, 2022 (CUHK). [Paper][PyTorch]
  • PUT: "Reduce Information Loss in Transformers for Pluralistic Image Inpainting", CVPR, 2022 (Microsoft). [Paper][PyTorch]
  • DLFormer: "DLFormer: Discrete Latent Transformer for Video Inpainting", CVPR, 2022 (Tencent). [Paper][Code (in construction)]
  • QueryOTR: "Outpainting by Queries", ECCV, 2022 (University of Liverpool, UK). [Paper][PyTorch (in construction)]
  • FGT: "Flow-Guided Transformer for Video Inpainting", ECCV, 2022 (USTC). [Paper][PyTorch]
  • MAE-FAR: "Learning Prior Feature and Attention Enhanced Image Inpainting", ECCV, 2022 (Fudan University). [Paper][PyTorch (in construction)][Website]
  • ?: "Visual Prompting via Image Inpainting", NeurIPS, 2022 (Berkeley). [Paper][PyTorch][Website]
  • U-Transformer: "Generalised Image Outpainting with U-Transformer", arXiv, 2022 (Xi'an Jiaotong-Liverpool University). [Paper]
  • SpA-Former: "SpA-Former: Transformer image shadow detection and removal via spatial attention", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][PyTorch]
  • CRFormer: "CRFormer: A Cross-Region Transformer for Shadow Removal", arXiv, 2022 (Beijing Jiaotong University). [Paper]
  • DeViT: "DeViT: Deformed Vision Transformers in Video Inpainting", arXiv, 2022 (Kuaishou). [Paper]
  • ZITS++: "ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors", arXiv, 2022 (Fudan). [Paper]
  • TPFNet: "TPFNet: A Novel Text In-painting Transformer for Text Removal", arXiv, 2022 (?). [Paper][Code (in construction)]
  • FlowLens: "FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer", arXiv, 2022 (Zhejiang University). [Paper][Code (in construction)]

[Back to Overview]

Image Generation

  • IT: "Image Transformer", ICML, 2018 (Google). [Paper][Tensorflow]
  • PixelSNAIL: "PixelSNAIL: An Improved Autoregressive Generative Model", ICML, 2018 (Berkeley). [Paper][Tensorflow]
  • BigGAN: "Large Scale GAN Training for High Fidelity Natural Image Synthesis", ICLR, 2019 (DeepMind). [Paper][PyTorch]
  • SAGAN: "Self-Attention Generative Adversarial Networks", ICML, 2019 (Google). [Paper][Tensorflow]
  • VQGAN: "Taming Transformers for High-Resolution Image Synthesis", CVPR, 2021 (Heidelberg University). [Paper][PyTorch][Website]
  • ?: "High-Resolution Complex Scene Synthesis with Transformers", CVPRW, 2021 (Heidelberg University). [Paper]
  • GANsformer: "Generative Adversarial Transformers", ICML, 2021 (Stanford + Facebook). [Paper][Tensorflow]
  • PixelTransformer: "PixelTransformer: Sample Conditioned Signal Generation", ICML, 2021 (Facebook). [Paper][Website]
  • HWT: "Handwriting Transformers", ICCV, 2021 (MBZUAI). [Paper][Code (in construction)]
  • Paint-Transformer: "Paint Transformer: Feed Forward Neural Painting with Stroke Prediction", ICCV, 2021 (Baidu). [Paper][Paddle][PyTorch]
  • Geometry-Free: "Geometry-Free View Synthesis: Transformers and no 3D Priors", ICCV, 2021 (Heidelberg University). [Paper][PyTorch]
  • VTGAN: "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers", ICCVW, 2021 (University of Nevada, Reno). [Paper]
  • ATISS: "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS, 2021 (NVIDIA). [Paper][Website]
  • GANsformer2: "Compositional Transformers for Scene Generation", NeurIPS, 2021 (Stanford + Facebook). [Paper][Tensorflow]
  • TransGAN: "TransGAN: Two Transformers Can Make One Strong GAN", NeurIPS, 2021 (UT Austin). [Paper][PyTorch]
  • HiT: "Improved Transformer for High-Resolution GANs", NeurIPS, 2021 (Google). [Paper][Tensorflow]
  • iLAT: "The Image Local Autoregressive Transformer", NeurIPS, 2021 (Fudan). [Paper]
  • TokenGAN: "Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers", NeurIPS, 2021 (Microsoft). [Paper]
  • SceneFormer: "SceneFormer: Indoor Scene Generation with Transformers", arXiv, 2021 (TUM). [Paper]
  • SNGAN: "Combining Transformer Generators with Convolutional Discriminators", arXiv, 2021 (Fraunhofer ITWM). [Paper]
  • Invertible-Attention: "Invertible Attention", arXiv, 2021 (ANU). [Paper]
  • GPA: "Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation", arXiv, 2021 (Zalando Research, Germany). [Paper][PyTorch (in construction)]
  • ViTGAN: "ViTGAN: Training GANs with Vision Transformers", ICLR, 2022 (Google). [Paper][PyTorch][PyTorch (wilile26811249)]
  • ViT-VQGAN: "Vector-quantized Image Modeling with Improved VQGAN", ICLR, 2022 (Google). [Paper]
  • Style-Transformer: "Style Transformer for Image Inversion and Editing", CVPR, 2022 (East China Normal University). [Paper][PyTorch]
  • StyleSwin: "StyleSwin: Transformer-based GAN for High-resolution Image Generation", CVPR, 2022 (Microsoft). [Paper][PyTorch]
  • Styleformer: "Styleformer: Transformer based Generative Adversarial Networks with Style Vector", CVPR, 2022 (Seoul National University). [Paper][PyTorch]
  • ?: "User-Controllable Latent Transformer for StyleGAN Image Layout Editing", Pacific Graphics, 2022 (University of Tsukuba). [Paper][Website]
  • DynaST: "DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation", ECCV, 2022 (NUS). [Paper][PyTorch]
  • DoodleFormer: "DoodleFormer: Creative Sketch Drawing with Transformers", ECCV, 2022 (MBZUAI). [Paper][PyTorch][Website]
  • U-Attention: "Paying U-Attention to Textures: Multi-Stage Hourglass Vision Transformer for Universal Texture Synthesis", arXiv, 2022 (Adobe). [Paper]
  • MaskGIT: "MaskGIT: Masked Generative Image Transformer", CVPR, 2022 (Google). [Paper][PyTorch (dome272)]
  • AttnFlow: "Generative Flows with Invertible Attentions", CVPR, 2022 (ETHZ). [Paper]
  • NÜWA: "NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion", ECCV, 2022 (Microsoft). [Paper][GitHub]
  • Trans-INR: "Transformers as Meta-Learners for Implicit Neural Representations", ECCV, 2022 (UCSD). [Paper][PyTorch][Website]
  • ViewFormer: "ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers", ECCV, 2022 (Czech Technical University in Prague). [Paper][Tensorflow]
  • Unleashing-Transformer: "Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes", ECCV, 2022 (Durham University, UK). [Paper][PyTorch]
  • CASD: "Cross Attention Based Style Distribution for Controllable Person Image Synthesis", ECCV, 2022 (East China Normal University). [Paper]
  • VQGAN-CLIP: "VQGAN-CLIP: Open Domain Image Generation and Manipulation Using Natural Language", ECCV, 2022 (EleutherAI). [Paper][PyTorch]
  • Token-Critic: "Improved Masked Image Generation with Token-Critic", ECCV, 2022 (Google). [Paper]
  • PromptGen: "Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models", NeurIPS, 2022 (CMU). [Paper][PyTorch]
  • Contextual-RQ-Transformer: "Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer", NeurIPS, 2022 (POSTECH + Kakao). [Paper]
  • ViT-Patch: "A Robust Framework of Chromosome Straightening with ViT-Patch GAN", arXiv, 2022 (Xi'an Jiaotong-Liverpool University). [Paper]
  • ?: "Transforming Image Generation from Scene Graphs", arXiv, 2022 (University of Catania, Italy). [Paper]
  • VisionNeRF: "Vision Transformer for NeRF-Based View Synthesis from a Single Input Image", arXiv, 2022 (Google). [Paper][Website]
  • NUWA-Infinity: "NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis", arXiv, 2022 (Microsoft). [Paper][GitHub][Website]
  • Diffusion-ViT: "Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model", arXiv, 2022 (Etsy, NY). [Paper]
  • ?: "Visual Prompt Tuning for Generative Transfer Learning", arXiv, 2022 (Google). [Paper]
  • SeQ-GAN: "Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis", arXiv, 2022 (Tencent). [Paper][Code (in construction)]
  • ?: "Style-Guided Inference of Transformer for High-resolution Image Synthesis", WACV, 2023 (NCSOFT, Korea). [Paper]
  • Frido: "Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis", AAAI, 2023 (Microsoft). [Paper][PyTorch]

[Back to Overview]

Video Generation

  • Subscale: "Scaling Autoregressive Video Models", ICLR, 2020 (Google). [Paper][Website]
  • ConvTransformer: "ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis", arXiv, 2020 (Southeast University). [Paper]
  • OCVT: "Generative Video Transformer: Can Objects be the Words?", ICML, 2021 (Rutgers University). [Paper]
  • AIST++: "Learn to Dance with AIST++: Music Conditioned 3D Dance Generation", arXiv, 2021 (Google). [Paper][Code][Website]
  • VideoGPT: "VideoGPT: Video Generation using VQ-VAE and Transformers", arXiv, 2021 (Berkeley). [Paper][PyTorch][Website]
  • DanceFormer: "DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer", AAAI, 2022 (Huiye Technology, China). [Paper]
  • VFIformer: "Video Frame Interpolation with Transformer", CVPR, 2022 (CUHK). [Paper][PyTorch]
  • VFIT: "Video Frame Interpolation Transformer", CVPR, 2022 (McMaster University, Canada). [Paper][PyTorch]
  • MoTrans: "Motion Transformer for Unsupervised Image Animation", ECCV, 2022 (Alibaba). [Paper][PyTorch]
  • Transframer: "Transframer: Arbitrary Frame Prediction with Generative Models", arXiv, 2022 (DeepMind). [Paper]
  • TATS: "Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer", ECCV, 2022 (Maryland). [Paper][Website]
  • POVT: "Patch-based Object-centric Transformers for Efficient Video Generation", arXiv, 2022 (Berkeley). [Paper][PyTorch][Website]
  • TAIN: "Cross-Attention Transformer for Video Interpolation", arXiv, 2022 (Duke). [Paper]
  • TTVFI: "TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation", arXiv, 2022 (Microsoft). [Paper]
  • TECO: "Temporally Consistent Video Transformer for Long-Term Video Prediction", arXiv, 2022 (Berkeley). [Paper][Jax][Website]
  • SlotFormer: "SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models", arXiv, 2022 (University of Toronto). [Paper][Website]
  • MAGVIT: "MAGVIT: Masked Generative Video Transformer", arXiv, 2022 (Google). [Paper][Code (in construction)][Website]

[Back to Overview]

Transfer / Translation / Manipulation

  • AdaAttN: "AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer", ICCV, 2021 (Baidu). [Paper][Paddle][PyTorch]
  • StyleCLIP: "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery", ICCV, 2021 (Hebrew University of Jerusalem). [Paper][PyTorch]
  • StyTr2: "StyTr^2: Unbiased Image Style Transfer with Transformers", CVPR, 2022 (CAS). [Paper][PyTorch]
  • InstaFormer: "InstaFormer: Instance-Aware Image-to-Image Translation with Transformer", CVPR, 2022 (Korea University). [Paper]
  • ManiTrans: "ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation", CVPR, 2022 (Huawei). [Paper][Website]
  • QS-Attn: "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation", CVPR, 2022 (Shanghai Key Laboratory). [Paper][PyTorch]
  • ASSET: "ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions", SIGGRAPH, 2022 (Adobe). [Paper][PyTorch][Website]
  • SCAM: "SCAM! Transferring humans between images with Semantic Cross Attention Modulation", ECCV, 2022 (Univ Gustave Eiffel, France). [Paper][PyTorch][Website]
  • TargetCLIP: "Image-Based CLIP-Guided Essence Transfer", ECCV, 2022 (Tel Aviv). [Paper][PyTorch]
  • FFCLIP: "One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations", NeurIPS, 2022 (Tencent). [Paper][Code (in construction)]
  • STTR: "Fine-Grained Image Style Transfer with Visual Transformers", ACCV, 2022 (The University of Tokyo). [Paper][PyTorch (in construction)]
  • Splice: "Splicing ViT Features for Semantic Appearance Transfer", arXiv, 2022 (Weizmann Institute of Science, Israel). [Paper][PyTorch][Website]
  • UVCGAN: "UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation", arXiv, 2022 (Brookhaven National Laboratory, NY). [Paper]
  • ITTR: "ITTR: Unpaired Image-to-Image Translation with Transformers", arXiv, 2022 (Kuaishou). [Paper]
  • CLIPasso: "CLIPasso: Semantically-Aware Object Sketching", arXiv, 2022 (EPFL). [Paper][PyTorch][Website]
  • CTrGAN: "CTrGAN: Cycle Transformers GAN for Gait Transfer", arXiv, 2022 (Ariel University, Israel). [Paper]
  • PI-Trans: "PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation", arXiv, 2022 (University of Trento, Italy). [Paper][PyTorch (in construction)]
  • CSLA: "Bridging CLIP and StyleGAN through Latent Alignment for Image Editing", arXiv, 2022 (Kuaishou). [Paper]
  • CLIP-PAE: "CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Image Manipulation", arXiv, 2022 (University of Cambridge). [Paper]
  • S2WAT: "S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention", arXiv, 2022 (Sichuan Normal University). [Paper]

[Back to Overview]

Other Low-Level Tasks

  • Colorization:
    • ColTran: "Colorization Transformer", ICLR, 2021 (Google). [Paper][Tensorflow]
    • ViT-I-GAN: "ViT-Inception-GAN for Image Colourising", arXiv, 2021 (D.Y Patil College of Engineering, India). [Paper]
    • CT2: "CT2: Colorization Transformer via Color Tokens", ECCV, 2022 (Peking University). [Paper][PyTorch]
    • L-CoDer: "L-CoDer: Language-based Colorization with Color-object Decoupling Transformer", ECCV, 2022 (Beijing University of Posts and Telecommunications). [Paper]
    • ColorFormer: "ColorFormer: Image Colorization via Color Memory assisted Hybrid-attention Transformer", ECCV, 2022 (Tencent). [Paper]
    • UniColor: "UniColor: A Unified Framework for Multi-Modal Colorization with Transformer", SIGGRAPH Asia, 2022 (CUHK). [Paper][Website]
    • iColoriT: "iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer", arXiv, 2022 (KAIST). [Paper]
  • Enhancement:
    • PanFormer: "PanFormer: a Transformer Based Model for Pan-sharpening", ICME, 2022 (Beihang University). [Paper][PyTorch]
    • URSCT-UIE: "Reinforced Swin-Convs Transformer for Underwater Image Enhancement", arXiv, 2022 (Ningbo University). [Paper]
    • IAT: "Illumination Adaptive Transformer", arXiv, 2022 (The University of Tokyo). [Paper][PyTorch]
    • SPGAT: "Structural Prior Guided Generative Adversarial Transformers for Low-Light Image Enhancement", arXiv, 2022 (The Hong Kong Polytechnic University). [Paper]
    • SSTF: "End-to-end Transformer for Compressed Video Quality Enhancement", arXiv, 2022 (Nanjing University of Information Science and Technology). [Paper]
  • HDR:
    • CA-ViT: "Ghost-free High Dynamic Range Imaging with Context-aware Transformer", ECCV, 2022 (Megvii). [Paper][PyTorch]
    • Selective-TransHDR: "Selective TransHDR: Transformer-Based Selective HDR Imaging Using Ghost Region Mask", ECCV, 2022 (Sogang University, Korea). [Paper]
    • Text2Light: "Text2Light: Zero-Shot Text-Driven HDR Panorama Generation", SIGGRAPH Asia, 2022 (NTU, Singapore). [Paper][PyTorch][Website]
  • Harmonization:
    • HT: "Image Harmonization With Transformer", ICCV, 2021 (Ocean University of China). [Paper]
  • Compression:
    • ?: "Towards End-to-End Image Compression and Analysis with Transformers", AAAI, 2022 (Harbin Institute of Technology). [Paper][PyTorch]
    • Entroformer: "Entroformer: A Transformer-based Entropy Model for Learned Image Compression", ICLR, 2022 (Alibaba). [Paper]
    • STF: "The Devil Is in the Details: Window-based Attention for Image Compression", CVPR, 2022 (CAS). [Paper][PyTorch]
    • Contextformer: "Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression", ECCV, 2022 (TUM). [Paper]
    • VCT: "VCT: A Video Compression Transformer", NeurIPS, 2022 (Google). [Paper]
  • Matting:
    • MatteFormer: "MatteFormer: Transformer-Based Image Matting via Prior-Tokens", CVPR, 2022 (SNU + NAVER). [Paper][PyTorch]
    • TransMatting: "TransMatting: Enhancing Transparent Objects Matting with Transformers", ECCV, 2022 (CAS). [Paper][Code (in construction)]
    • VMFormer: "VMFormer: End-to-End Video Matting with Transformer", arXiv, 2022 (PicsArt). [Paper][PyTorch][Website]
  • Reconstruction:
    • ET-Net: "Event-Based Video Reconstruction Using Transformer", ICCV, 2021 (University of Science and Technology of China). [Paper][PyTorch]
    • GradViT: "GradViT: Gradient Inversion of Vision Transformers", CVPR, 2022 (NVIDIA). [Paper][Website]
    • MST: "Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction", CVPR, 2022 (Tsinghua). [Paper][PyTorch]
    • MST++: "MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction", CVPRW, 2022 (Tsinghua). [Paper][PyTorch]
    • CST: "Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction", ECCV, 2022 (Tsinghua). [Paper][PyTorch]
    • DAUHST: "Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging", NeurIPS, 2022 (Tsinghua). [Paper][PyTorch]
    • S2-Transformer: "S2-Transformer for Mask-Aware Hyperspectral Image Reconstruction", arXiv, 2022 (Rochester Institute of Technology). [Paper]
  • Radiance Fields:
    • NeXT: "NeXT: Towards High Quality Neural Radiance Fields via Multi-Skip Transformer", ECCV, 2022 (Tsinghua University). [Paper][JAX]
    • TransNeRF: "Generalizable Neural Radiance Fields for Novel View Synthesis with Transformer", arXiv, 2022 (UBC). [Paper]
  • 3D:
    • MNSRNet: "MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution", CVPR, 2022 (Shenzhen University). [Paper]
  • Others:
    • TransMEF: "TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning", AAAI, 2022 (Fudan). [Paper]
    • MS-Unet: "Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer", CVPR, 2022 (Megvii). [Paper][Code (in construction)]
    • TransCL: "TransCL: Transformer Makes Strong and Flexible Compressive Learning", TPAMI, 2022 (Peking University). [Paper][Code (in construction)]
    • GAP-CSCoT: "Spectral Compressive Imaging Reconstruction Using Convolution and Spectral Contextual Transformer", arXiv, 2022 (CAS). [Paper]
    • MatFormer: "MatFormer: A Generative Model for Procedural Materials", arXiv, 2022 (Adobe). [Paper]
    • FishFormer: "FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification with Efficacy Domain Exploration", arXiv, 2022 (Beijing Jiaotong University). [Paper]
    • STFormer: "Spatial-Temporal Transformer for Video Snapshot Compressive Imaging", arXiv, 2022 (CAS). [Paper][PyTorch]

[Back to Overview]

Reinforcement Learning

Navigation

  • VTNet: "VTNet: Visual Transformer Network for Object Goal Navigation", ICLR, 2021 (ANU). [Paper]
  • MaAST: "MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation", ICRA, 2021 (SRI). [Paper]
  • TransFuser: "Multi-Modal Fusion Transformer for End-to-End Autonomous Driving", CVPR, 2021 (MPI). [Paper][PyTorch]
  • CMTP: "Topological Planning With Transformers for Vision-and-Language Navigation", CVPR, 2021 (Stanford). [Paper]
  • VLN-BERT: "VLN-BERT: A Recurrent Vision-and-Language BERT for Navigation", CVPR, 2021 (ANU). [Paper][PyTorch]
  • E.T.: "Episodic Transformer for Vision-and-Language Navigation", ICCV, 2021 (Google). [Paper][PyTorch]
  • HAMT: "History Aware Multimodal Transformer for Vision-and-Language Navigation", NeurIPS, 2021 (INRIA). [Paper][PyTorch][Website]
  • SOAT: "SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation", NeurIPS, 2021 (Georgia Tech). [Paper]
  • OMT: "Object Memory Transformer for Object Goal Navigation", ICRA, 2022 (AIST, Japan). [Paper]
  • ADAPT: "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts", CVPR, 2022 (Huawei). [Paper]
  • DUET: "Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation", CVPR, 2022 (INRIA). [Paper][Website]
  • LSA: "Local Slot Attention for Vision-and-Language Navigation", ICMR, 2022 (Fudan). [Paper]
  • ?: "Learning from Unlabeled 3D Environments for Vision-and-Language Navigation", ECCV, 2022 (INRIA). [Paper][Website]
  • MTVM: "Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation", ECCV, 2022 (ByteDance). [Paper][PyTorch]
  • DDL: "Learning Disentanglement with Decoupled Labels for Vision-Language Navigation", ECCV, 2022 (Beijing Institute of Technology). [Paper][PyTorch]
  • Sim2Sim: "Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments", ECCV, 2022 (Oregon State University). [Paper][PyTorch][Website]
  • AVLEN: "AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments", NeurIPS, 2022 (UC Riverside). [Paper]
  • ZSON: "ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings", NeurIPS, 2022 (Georgia Tech). [Paper]
  • WS-MGMap: "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation", NeurIPS, 2022 (South China University of Technology). [Paper][PyTorch (in construction)]
  • CLIP-Nav: "CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation", CoRLW, 2022 (Amazon). [Paper]
  • TransFuser: "TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving", arXiv, 2022 (MPI). [Paper]
  • TD-STP: "Target-Driven Structured Transformer Planner for Vision-Language Navigation", arXiv, 2022 (Beihang University). [Paper][Code (in construction)]
  • DAVIS: "Anticipating the Unseen Discrepancy for Vision and Language Navigation", arXiv, 2022 (UCSB). [Paper]
  • LOViS: "LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation", arXiv, 2022 (Michigan State). [Paper]
  • IVLN: "Iterative Vision-and-Language Navigation", arXiv, 2022 (Oregon State University). [Paper]
  • BEVBert: "BEVBert: Topo-Metric Map Pre-training for Language-guided Navigation", arXiv, 2022 (CAS). [Paper]

[Back to Overview]

Other RL Tasks

  • SVEA: "Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation", arXiv, 2021 (UCSD). [Paper][GitHub][Website]
  • LocoTransformer: "Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers", ICLR, 2022 (UCSD). [Paper][Website]
  • STAM: "Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes", CVPR, 2022 (McGill University, Canada). [Paper][PyTorch]
  • CtrlFormer: "CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer", ICML, 2022 (HKU). [Paper][PyTorch][Website]
  • PromptDT: "Prompting Decision Transformer for Few-Shot Policy Generalization", ICML, 2022 (CMU). [Paper][Website]
  • StARformer: "StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning", ECCV, 2022 (Stony Brook). [Paper][PyTorch]
  • RAD: "Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels", arXiv, 2022 (UBC, Canada). [Paper]
  • MWM: "Masked World Models for Visual Control", arXiv, 2022 (Berkeley). [Paper][Tensorflow][Website]
  • IRIS: "Transformers are Sample Efficient World Models", arXiv, 2022 (University of Geneva, Switzerland). [Paper][PyTorch]
  • InstructRL: "Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models", arXiv, 2022 (Google). [Paper]

[Back to Overview]

Medical

Medical Segmentation

  • Cross-Transformer: "The entire network structure of Crossmodal Transformer", ICBSIP, 2021 (Capital Medical University). [Paper]
  • Segtran: "Medical Image Segmentation using Squeeze-and-Expansion Transformers", IJCAI, 2021 (A*STAR). [Paper]
  • i-ViT: "Instance-based Vision Transformer for Subtyping of Papillary Renal Cell Carcinoma in Histopathological Image", MICCAI, 2021 (Xi'an Jiaotong University). [Paper][PyTorch][Website]
  • UTNet: "UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation", MICCAI, 2021 (Rutgers). [Paper]
  • MCTrans: "Multi-Compound Transformer for Accurate Biomedical Image Segmentation", MICCAI, 2021 (HKU + CUHK). [Paper][Code (in construction)]
  • Polyformer: "Few-Shot Domain Adaptation with Polymorphic Transformers", MICCAI, 2021 (A*STAR). [Paper][PyTorch]
  • BA-Transformer: "Boundary-aware Transformers for Skin Lesion Segmentation", MICCAI, 2021 (Xiamen University). [Paper][PyTorch]
  • GT-U-Net: "GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation", MICCAIW, 2021 (Hangzhou Dianzi University). [Paper][PyTorch]
  • STN: "Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation", ISBI, 2021 (Institut Polytechnique de Paris). [Paper]
  • T-AutoML: "T-AutoML: Automated Machine Learning for Lesion Segmentation Using Transformers in 3D Medical Imaging", ICCV, 2021 (NVIDIA). [Paper]
  • MedT: "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation", arXiv, 2021 (Johns Hopkins). [Paper][PyTorch]
  • Convolution-Free: "Convolution-Free Medical Image Segmentation using Transformers", arXiv, 2021 (Harvard). [Paper]
  • CoTr: "CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation", arXiv, 2021 (Northwestern Polytechnical University). [Paper][PyTorch]
  • TransBTS: "TransBTS: Multimodal Brain Tumor Segmentation Using Transformer", arXiv, 2021 (University of Science and Technology Beijing). [Paper][PyTorch]
  • SpecTr: "SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation", arXiv, 2021 (East China Normal University). [Paper][Code (in construction)]
  • U-Transformer: "U-Net Transformer: Self and Cross Attention for Medical Image Segmentation", arXiv, 2021 (CEDRIC). [Paper]
  • TransUNet: "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation", arXiv, 2021 (Johns Hopkins). [Paper][PyTorch]
  • PMTrans: "Pyramid Medical Transformer for Medical Image Segmentation", arXiv, 2021 (Washington University in St. Louis). [Paper]
  • PBT-Net: "Anatomy-Guided Parallel Bottleneck Transformer Network for Automated Evaluation of Root Canal Therapy", arXiv, 2021 (Hangzhou Dianzi University). [Paper]
  • Swin-Unet: "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation", arXiv, 2021 (Huawei). [Paper][Code (in construction)]
  • MBT-Net: "A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation", arXiv, 2021 (Southern University of Science and Technology). [Paper]
  • WAD: "More than Encoder: Introducing Transformer Decoder to Upsample", arXiv, 2021 (South China University of Technology). [Paper]
  • LeViT-UNet: "LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation", arXiv, 2021 (Wuhan Institute of Technology). [Paper]
  • ?: "Evaluating Transformer based Semantic Segmentation Networks for Pathological Image Segmentation", arXiv, 2021 (Vanderbilt University). [Paper]
  • nnFormer: "nnFormer: Interleaved Transformer for Volumetric Segmentation", arXiv, 2021 (HKU + Xiamen University). [Paper][PyTorch]
  • MISSFormer: "MISSFormer: An Effective Medical Image Segmentation Transformer", arXiv, 2021 (Beijing University of Posts and Telecommunications). [Paper]
  • TUnet: "Transformer-Unet: Raw Image Processing with Unet", arXiv, 2021 (Beijing Zoezen Robot + Beihang University). [Paper]
  • BiTr-Unet: "BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation", arXiv, 2021 (New York University). [Paper]
  • ?: "Transformer Assisted Convolutional Network for Cell Instance Segmentation", arXiv, 2021 (IIT Dhanbad). [Paper]
  • ?: "Combining CNNs With Transformer for Multimodal 3D MRI Brain Tumor Segmentation With Self-Supervised Pretraining", arXiv, 2021 (Ukrainian Catholic University). [Paper]
  • UNETR: "UNETR: Transformers for 3D Medical Image Segmentation", WACV, 2022 (NVIDIA). [Paper][PyTorch]
  • AFTer-UNet: "AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation", WACV, 2022 (UC Irvine). [Paper]
  • UCTransNet: "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer", AAAI, 2022 (Northeastern University, China). [Paper][PyTorch]
  • Swin-UNETR: "Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis", CVPR, 2022 (NVIDIA). [Paper][PyTorch]
  • ?: "Transformer-based out-of-distribution detection for clinically safe segmentation", Medical Imaging with Deep Learning (MIDL), 2022 (King’s College London). [Paper]
  • ScaleFormer: "ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation", IJCAI, 2022 (Zhejiang University). [Paper][Code (in construction)]
  • FCBFormer: "FCN-Transformer Feature Fusion for Polyp Segmentation", Annual Conference on Medical Image Understanding and Analysis (MIUA), 2022 (University of Central Lancashire, UK). [Paper][PyTorch]
  • VDFormer: "View-Disentangled Transformer for Brain Lesion Detection", ISBI, 2022 (CUHK). [Paper][PyTorch]
  • TFCNs: "TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation", International Conference on Artificial Neural Networks (ICANN), 2022 (Xiamen University). [Paper][PyTorch (in construction)]
  • MIL: "Transformer based multiple instance learning for weakly supervised histopathology image segmentation", MICCAI, 2022 (Beihang University). [Paper]
  • mmFormer: "mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation", MICCAI, 2022 (CAS). [Paper][PyTorch]
  • Patcher: "Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation", MICCAI, 2022 (Pennsylvania State University). [Paper]
  • NestedFormer: "NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation", MICCAI, 2022 (Tianjin University). [Paper][Code (in construction)]
  • TransDeepLab: "TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation", MICCAIW, 2022 (RWTH Aachen University, Germany). [Paper][PyTorch]
  • Video-TransUNet: "Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation", International Conference on Machine Vision (ICMV), 2022 (University of Bristol, UK). [Paper]
  • CASTformer: "Class-Aware Adversarial Transformers for Medical Image Segmentation", NeurIPS, 2022 (Yale). [Paper]
  • Tempera: "Tempera: Spatial Transformer Feature Pyramid Network for Cardiac MRI Segmentation", arXiv, 2022 (ICL). [Paper]
  • UTNetV2: "A Multi-scale Transformer for Medical Image Segmentation: Architectures, Model Efficiency, and Benchmarks", arXiv, 2022 (Rutgers). [Paper]
  • UNesT: "Characterizing Renal Structures with 3D Block Aggregate Transformers", arXiv, 2022 (Vanderbilt University, Tennessee). [Paper]
  • PHTrans: "PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation", arXiv, 2022 (Beijing University of Posts and Telecommunications). [Paper]
  • UNeXt: "UNeXt: MLP-based Rapid Medical Image Segmentation Network", arXiv, 2022 (JHU). [Paper][PyTorch]
  • TransFusion: "TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers", arXiv, 2022 (Rutgers). [Paper]
  • UNetFormer: "UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation", arXiv, 2022 (NVIDIA). [Paper][GitHub]
  • 3D-Shuffle-Mixer: "3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume", arXiv, 2022 (Xi'an Jiaotong University). [Paper]
  • ?: "Continual Hippocampus Segmentation with Transformers", arXiv, 2022 (Technical University of Darmstadt, Germany). [Paper]
  • TranSiam: "TranSiam: Fusing Multimodal Visual Features Using Transformer for Medical Image Segmentation", arXiv, 2022 (Tianjin University). [Paper]
  • ColonFormer: "ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation", arXiv, 2022 (Hanoi University of Science and Technology). [Paper]
  • ?: "Transformer based Generative Adversarial Network for Liver Segmentation", arXiv, 2022 (Northwestern University). [Paper]
  • FCT: "The Fully Convolutional Transformer for Medical Image Segmentation", arXiv, 2022 (University of Glasgow, UK). [Paper]
  • XBound-Former: "XBound-Former: Toward Cross-scale Boundary Modeling in Transformers", arXiv, 2022 (Xiamen University). [Paper][PyTorch]
  • Polyp-PVT: "Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers", arXiv, 2022 (IIAI). [Paper][PyTorch]
  • SeATrans: "SeATrans: Learning Segmentation-Assisted diagnosis model via Transformer", arXiv, 2022 (Baidu). [Paper]
  • TransResU-Net: "TransResU-Net: Transformer based ResU-Net for Real-Time Colonoscopy Polyp Segmentation", arXiv, 2022 (Indira Gandhi National Open University). [Paper][Code (in construction)]
  • LViT: "LViT: Language meets Vision Transformer in Medical Image Segmentation", arXiv, 2022 (Alibaba). [Paper][Code (in construction)]
  • APFormer: "The Lighter The Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning", arXiv, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
  • ?: "Transformer based Models for Unsupervised Anomaly Segmentation in Brain MR Images", arXiv, 2022 (University of Rennes, France). [Paper][Tensorflow]
  • CKD-TransBTS: "CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation", arXiv, 2022 (South China University of Technology). [Paper]
  • HiFormer: "HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation", arXiv, 2022 (Iran University of Science and Technology). [Paper][PyTorch]
  • ?: "Contextual Attention Network: Transformer Meets U-Net", arXiv, 2022 (RWTH Aachen University). [Paper][PyTorch]
  • HRSTNet: "High-Resolution Swin Transformer for Automatic Medical Image Segmentation", arXiv, 2022 (Xi'an University of Posts and Telecommunications). [Paper][Code (in construction)]
  • TransNorm: "TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model", arXiv, 2022 (Aachen University, Germany). [Paper][PyTorch]
  • ?: "When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation", arXiv, 2022 (Oxford). [Paper][Code (in construction)]
  • CM-MLP: "CM-MLP: Cascade Multi-scale MLP with Axial Context Relation Encoder for Edge Segmentation of Medical Image", arXiv, 2022 (Zhengzhou University). [Paper]
  • CATS: "CATS: Complementary CNN and Transformer Encoders for Segmentation", arXiv, 2022 (Vanderbilt University, Nashville). [Paper]
  • TFusion: "TFusion: Transformer based N-to-One Multimodal Fusion Block", arXiv, 2022 (South China University of Technology). [Paper]
  • AutoPET: "AutoPET Challenge: Combining nn-Unet with Swin UNETR Augmented by Maximum Intensity Projection Classifier", arXiv, 2022 (University Hospital Essen, Germany). [Paper]
  • SPAN: "Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers", arXiv, 2022 (Berkeley). [Paper]
  • TMSS: "TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction", arXiv, 2022 (MBZUAI). [Paper]
  • CR-Swin2-VT: "Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation", arXiv, 2022 (Monash University). [Paper][PyTorch]
  • 3DUX-Net: "3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation", arXiv, 2022 (Vanderbilt University). [Paper][PyTorch]
  • FocalUNETR: "FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images", arXiv, 2022 (Wayne State University, Detroit). [Paper]
  • LAPFormer: "LAPFormer: A Light and Accurate Polyp Segmentation Transformer", arXiv, 2022 (Sun*, Hanoi). [Paper]
  • FINE: "Memory transformers for full context and high-resolution 3D Medical Segmentation", arXiv, 2022 (National Conservatory of Arts and Crafts, France). [Paper]
  • ConvTransSeg: "ConvTransSeg: A Multi-resolution Convolution-Transformer Network for Medical Image Segmentation", arXiv, 2022 (University of Nottingham, UK). [Paper]
  • CS-Unet: "Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation", arXiv, 2022 (University of Glasgow, UK). [Paper]
  • UNETR++: "UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation", arXiv, 2022 (MBZUAI). [Paper][PyTorch]

[Back to Overview]

Medical Classification

  • COVID19T: "A Transformer-Based Framework for Automatic COVID19 Diagnosis in Chest CTs", ICCVW, 2021 (?). [Paper][PyTorch]
  • TransMIL: "TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification", NeurIPS, 2021 (Tsinghua University). [Paper][PyTorch]
  • TransMed: "TransMed: Transformers Advance Multi-modal Medical Image Classification", arXiv, 2021 (Northeastern University). [Paper]
  • CXR-ViT: "Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification", arXiv, 2021 (KAIST). [Paper]
  • ViT-TSA: "Shoulder Implant X-Ray Manufacturer Classification: Exploring with Vision Transformer", arXiv, 2021 (Queen’s University). [Paper]
  • GasHis-Transformer: "GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification", arXiv, 2021 (Northeastern University). [Paper]
  • POCFormer: "POCFormer: A Lightweight Transformer Architecture for Detection of COVID-19 Using Point of Care Ultrasound", arXiv, 2021 (The Ohio State University). [Paper]
  • COVID-ViT: "COVID-VIT: Classification of COVID-19 from CT chest images based on vision transformer models", arXiv, 2021 (Middlesex University, UK). [Paper][PyTorch]
  • EEG-ConvTransformer: "EEG-ConvTransformer for Single-Trial EEG based Visual Stimuli Classification", arXiv, 2021 (IIT Ropar). [Paper]
  • CCAT: "Visual Transformer with Statistical Test for COVID-19 Classification", arXiv, 2021 (NCKU). [Paper]
  • M3T: "M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer", CVPR, 2022 (Yonsei University). [Paper]
  • ?: "A comparative study between vision transformers and CNNs in digital pathology", CVPRW, 2022 (Roche, Switzerland). [Paper]
  • SCT: "Context-Aware Transformers For Spinal Cancer Detection and Radiological Grading", MICCAI, 2022 (Oxford). [Paper]
  • KAT: "Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification", MICCAI, 2022 (Beihang University). [Paper][PyTorch]
  • SEViT: "Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification", MICCAI, 2022 (MBZUAI). [Paper][PyTorch]
  • MF-ViT: "Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis", MICCAIW, 2022 (Rutgers University). [Paper][PyTorch]
  • SB-SSL: "SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI", MICCAIW, 2022 (University of Surrey, UK). [Paper]
  • RadioTransformer: "RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-guided Disease Classification", ECCV, 2022 (Stony Brook). [Paper][Tensorflow (in construction)]
  • ScoreNet: "ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Classification", arXiv, 2022 (EPFL). [Paper]
  • LA-MIL: "Local Attention Graph-based Transformer for Multi-target Genetic Alteration Prediction", arXiv, 2022 (TUM). [Paper]
  • HoVer-Trans: "HoVer-Trans: Anatomy-aware HoVer-Transformer for ROI-free Breast Cancer Diagnosis in Ultrasound Images", arXiv, 2022 (South China University of Technology). [Paper]
  • GTP: "A graph-transformer for whole slide image classification", arXiv, 2022 (Boston University). [Paper]
  • ?: "Zero-Shot and Few-Shot Learning for Lung Cancer Multi-Label Classification using Vision Transformer", arXiv, 2022 (Harvard). [Paper]
  • SwinCheX: "SwinCheX: Multi-label classification on chest X-ray images with transformers", arXiv, 2022 (Sharif University of Technology, Iran). [Paper]
  • SGT: "Rectify ViT Shortcut Learning by Visual Saliency", arXiv, 2022 (Northwestern Polytechnical University, China). [Paper]
  • IPMN-ViT: "Neural Transformers for Intraductal Papillary Mucosal Neoplasms (IPMN) Classification in MRI images", arXiv, 2022 (University of Catania, Italy). [Paper]
  • ?: "Multi-Label Retinal Disease Classification using Transformers", arXiv, 2022 (Khalifa University, UAE). [Paper][PyTorch]
  • TractoFormer: "TractoFormer: A Novel Fiber-level Whole Brain Tractography Analysis Framework Using Spectral Embedding and Vision Transformers", arXiv, 2022 (Harvard). [Paper]
  • BrainFormer: "BrainFormer: A Hybrid CNN-Transformer Model for Brain fMRI Data Classification", arXiv, 2022 (Chinese PLA General Hospital). [Paper]
  • SI-ViT: "Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification", arXiv, 2022 (Beihang University). [Paper][PyTorch]

[Back to Overview]

Medical Detection

  • COTR: "COTR: Convolution in Transformer Network for End to End Polyp Detection", arXiv, 2021 (Fuzhou University). [Paper]
  • TR-Net: "Transformer Network for Significant Stenosis Detection in CCTA of Coronary Arteries", arXiv, 2021 (Harbin Institute of Technology). [Paper]
  • CAE-Transformer: "CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans", arXiv, 2021 (Concordia University, Canada). [Paper]
  • DATR: "DATR: Domain-adaptive transformer for multi-domain landmark detection", arXiv, 2022 (CAS). [Paper]
  • SATr: "SATr: Slice Attention with Transformer for Universal Lesion Detection", arXiv, 2022 (CAS). [Paper]
  • Focused-Decoder: "Focused Decoding Enables 3D Anatomical Detection by Transformers", arXiv, 2022 (TUM). [Paper][PyTorch]

[Back to Overview]

Medical Reconstruction

  • T2Net: "Task Transformer Network for Joint MRI Reconstruction and Super-Resolution", MICCAI, 2021 (Harbin Institute of Technology). [Paper][PyTorch]
  • FIT: "Fourier Image Transformer", arXiv, 2021 (MPI). [Paper][PyTorch]
  • SLATER: "Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers", arXiv, 2021 (Bilkent University). [Paper]
  • MTrans: "MTrans: Multi-Modal Transformer for Accelerated MR Imaging", arXiv, 2021 (Harbin Institute of Technology). [Paper][PyTorch]
  • SDAUT: "Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI", MICCAI, 2022 (ICL). [Paper]
  • ?: "Adaptively Re-weighting Multi-Loss Untrained Transformer for Sparse-View Cone-Beam CT Reconstruction", arXiv, 2022 (Zhejiang Lab). [Paper]
  • K-Space-Transformer: "K-Space Transformer for Fast MRI Reconstruction with Implicit Representation", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][Code (in construction)][Website]
  • McSTRA: "Multi-head Cascaded Swin Transformers with Attention to k-space Sampling Pattern for Accelerated MRI Reconstruction", arXiv, 2022 (Monash University, Australia). [Paper]
  • ?: "Colonoscopy Landmark Detection using Vision Transformers", arXiv, 2022 (Intuitive Surgical, CA). [Paper]

[Back to Overview]

Medical Low-Level Vision

  • Eformer: "Eformer: Edge Enhancement based Transformer for Medical Image Denoising", ICCV, 2021 (BITS Pilani, India). [Paper]
  • PTNet: "PTNet: A High-Resolution Infant MRI Synthesizer Based on Transformer", arXiv, 2021 (Columbia). [Paper]
  • ResViT: "ResViT: Residual vision transformers for multi-modal medical image synthesis", arXiv, 2021 (Bilkent University, Turkey). [Paper]
  • CyTran: "CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation", arXiv, 2021 (University Politehnica of Bucharest, Romania). [Paper][PyTorch]
  • McMRSR: "Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution", CVPR, 2022 (Yantai University, China). [Paper][PyTorch]
  • RPLHR-CT: "RPLHR-CT Dataset and Transformer Baseline for Volumetric Super-Resolution from CT Scans", MICCAI, 2022 (Infervision Medical Technology, China). [Paper][Code (in construction)]
  • W-G2L-ART: "Wide Range MRI Artifact Removal with Transformers", BMVC, 2022 (KTH). [Paper]
  • RFormer: "RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark", arXiv, 2022 (Tsinghua). [Paper]
  • CTformer: "CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising", arXiv, 2022 (UMass Lowell). [Paper][PyTorch]
  • Cohf-T: "Cross-Modality High-Frequency Transformer for MR Image Super-Resolution", arXiv, 2022 (Xidian University). [Paper]
  • SIST: "Low-Dose CT Denoising via Sinogram Inner-Structure Transformer", arXiv, 2022 (?). [Paper]
  • Spach-Transformer: "Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising", arXiv, 2022 (Harvard). [Paper]
  • ConvFormer: "ConvFormer: Combining CNN and Transformer for Medical Image Segmentation", arXiv, 2022 (University of Notre Dame). [Paper]

[Back to Overview]

Medical Vision-Language

  • CGT: "Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation", CVPR, 2022 (University of Technology Sydney). [Paper]
  • MCGN: "A Medical Semantic-Assisted Transformer for Radiographic Report Generation", MICCAI, 2022 (University of Sydney). [Paper]
  • M3AE: "Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training", MICCAI, 2022 (CUHK). [Paper][PyTorch]
  • BioViL: "Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing", ECCV, 2022 (Microsoft). [Paper][Code]
  • MGCA: "Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning", NeurIPS, 2022 (HKU). [Paper]
  • MedCLIP: "MedCLIP: Contrastive Learning from Unpaired Medical Images and Text", EMNLP, 2022 (UIUC). [Paper][PyTorch]
  • MDBERT: "Hierarchical BERT for Medical Document Understanding", arXiv, 2022 (IQVIA, NC). [Paper]
  • Surgical-VQA: "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer", arXiv, 2022 (NUS). [Paper][PyTorch (in construction)]
  • SwinMLP-TranCAP: "Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches", arXiv, 2022 (CUHK). [Paper][PyTorch]
  • SAT: "Medical Image Captioning via Generative Pretrained Transformers", arXiv, 2022 (Philips Innovation Labs Rus, Russia). [Paper]
  • RepsNet: "RepsNet: Combining Vision with Language for Automated Medical Reports", arXiv, 2022 (Google). [Paper][Website]
  • MF2-MVQA: "MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering", arXiv, 2022 (University of Science and Technology Beijing). [Paper]
  • RoentGen: "RoentGen: Vision-Language Foundation Model for Chest X-ray Generation", arXiv, 2022 (Stanford). [Paper]

[Back to Overview]

Medical Others

  • LAT: "Lesion-Aware Transformers for Diabetic Retinopathy Grading", CVPR, 2021 (USTC). [Paper]
  • UVT: "Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation", MICCAI, 2021 (ICL). [Paper][PyTorch]
  • ?: "Surgical Instruction Generation with Transformers", MICCAI, 2021 (Bournemouth University, UK). [Paper]
  • AlignTransformer: "AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation", MICCAI, 2021 (Peking University). [Paper]
  • MCAT: "Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images", ICCV, 2021 (Harvard). [Paper][PyTorch]
  • ?: "Is it Time to Replace CNNs with Transformers for Medical Images?", ICCVW, 2021 (KTH, Sweden). [Paper]
  • HAT-Net: "HAT-Net: A Hierarchical Transformer Graph Neural Network for Grading of Colorectal Cancer Histology Images", BMVC, 2021 (Beijing University of Posts and Telecommunications). [Paper]
  • ?: "Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training", NeurIPS, 2021 (KAIST). [Paper]
  • ViT-Path: "Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology", NeurIPSW, 2021 (Microsoft). [Paper]
  • Global-Local-Transformer: "Global-Local Transformer for Brain Age Estimation", IEEE Transactions on Medical Imaging, 2021 (Harvard). [Paper][PyTorch]
  • CE-TFE: "Deep Transformers for Fast Small Intestine Grounding in Capsule Endoscope Video", arXiv, 2021 (Sun Yat-Sen University). [Paper]
  • DeepProg: "DeepProg: A Transformer-based Framework for Predicting Disease Prognosis", arXiv, 2021 (University of Oulu). [Paper]
  • Medical-Transformer: "Medical Transformer: Universal Brain Encoder for 3D MRI Analysis", arXiv, 2021 (Korea University). [Paper]
  • RATCHET: "RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting", arXiv, 2021 (ICL). [Paper]
  • C2FViT: "Affine Medical Image Registration with Coarse-to-Fine Vision Transformer", CVPR, 2022 (HKUST). [Paper][Code (in construction)]
  • HIPT: "Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning", CVPR, 2022 (Harvard). [Paper]
  • SiT: "Surface Analysis with Vision Transformers", CVPRW, 2022 (King’s College London, UK). [Paper][PyTorch]
  • SiT: "Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis", Medical Imaging with Deep Learning (MIDL), 2022 (King’s College London, UK). [Paper]
  • ViT-V-Net: "ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration", ICML, 2022 (JHU). [Paper][PyTorch]
  • HybridStereoNet: "Deep Laparoscopic Stereo Matching with Transformers", MICCAI, 2022 (Monash University, Australia). [Paper][PyTorch]
  • BabyNet: "BabyNet: Residual Transformer Module for Birth Weight Prediction on Fetal Ultrasound Video", MICCAI, 2022 (Sano Centre for Computational Medicine, Poland). [Paper][PyTorch]
  • TLT: "Transformer Lesion Tracker", MICCAI, 2022 (InferVision Medical Technology, China). [Paper]
  • XMorpher: "XMorpher: Full Transformer for Deformable Medical Image Registration via Cross Attention", MICCAI, 2022 (Southeast University, China). [Paper][PyTorch]
  • SVoRT: "SVoRT: Iterative Transformer for Slice-to-Volume Registration in Fetal Brain MRI", MICCAI, 2022 (MIT). [Paper]
  • GaitForeMer: "GaitForeMer: Self-Supervised Pre-Training of Transformers via Human Motion Forecasting for Few-Shot Gait Impairment Severity Estimation", MICCAI, 2022 (Stanford). [Paper][PyTorch]
  • LKU-Net: "U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?", MICCAIW, 2022 (University of Birmingham, UK). [Paper]
  • LVOT: "Shifted Windows Transformers for Medical Image Quality Assessment", MICCAIW, 2022 (Istanbul Technical University, Turkey). [Paper]
  • MINiT: "Multiple Instance Neuroimage Transformer", MICCAIW, 2022 (Stanford). [Paper][Code (in construction)]
  • BrainNetTF: "Brain Network Transformer", NeurIPS, 2022 (Emory University). [Paper][PyTorch]
  • SiT: "Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces", arXiv, 2022 (King’s College London, UK). [Paper][PyTorch]
  • TransMorph: "TransMorph: Transformer for unsupervised medical image registration", arXiv, 2022 (JHU). [Paper]
  • SymTrans: "Symmetric Transformer-based Network for Unsupervised Image Registration", arXiv, 2022 (Jilin University). [Paper]
  • MMT: "One Model to Synthesize Them All: Multi-contrast Multi-scale Transformer for Missing Data Imputation", arXiv, 2022 (JHU). [Paper]
  • EG-ViT: "Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning", arXiv, 2022 (Northwestern Polytechnical University). [Paper]
  • CSM: "Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection", arXiv, 2022 (University of Adelaide, Australia). [Paper]
  • CASHformer: "CASHformer: Cognition Aware SHape Transformer for Longitudinal Analysis", arXiv, 2022 (TUM). [Paper]
  • ARST: "ARST: Auto-Regressive Surgical Transformer for Phase Recognition from Laparoscopic Videos", arXiv, 2022 (Shanghai Jiao Tong University). [Paper]
  • SSiT: "SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading", arXiv, 2022 (Southern University of Science and Technology, China). [Paper][Code (in construction)]

[Back to Overview]

Other Tasks

  • Active Learning:
    • TJLS: "Visual Transformer for Task-aware Active Learning", arXiv, 2021 (ICL). [Paper][PyTorch]
  • Agriculture:
    • PlantXViT: "Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT", arXiv, 2022 (Indian Institute of Information Technology). [Paper]
  • Animation-related:
    • AnT: "The Animation Transformer: Visual Correspondence via Segment Matching", ICCV, 2021 (Cadmium). [Paper]
    • AniFormer: "AniFormer: Data-driven 3D Animation with Transformer", BMVC, 2021 (University of Oulu, Finland). [Paper][PyTorch]
  • Biology:
    • ?: "A State-of-the-art Survey of Object Detection Techniques in Microorganism Image Analysis: from Traditional Image Processing and Classical Machine Learning to Current Deep Convolutional Neural Networks and Potential Visual Transformers", arXiv, 2021 (Northeastern University). [Paper]
  • Brain Score:
    • CrossViT: "Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4", CVPRW, 2022 (MIT). [Paper][PyTorch]
  • Camera-related:
    • CTRL-C: "CTRL-C: Camera calibration TRansformer with Line-Classification", ICCV, 2021 (Kakao + Kookmin University). [Paper][PyTorch]
    • MS-Transformer: "Learning Multi-Scene Absolute Pose Regression with Transformers", ICCV, 2021 (Bar-Ilan University, Israel). [Paper][PyTorch]
    • GTCaR: "GTCaR: Graph Transformer for Camera Re-localization", ECCV, 2022 (Magic Leap). [Paper]
  • Character/Text Recognition:
    • BTTR: "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer", arXiv, 2021 (Peking). [Paper]
    • TrOCR: "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models", arXiv, 2021 (Microsoft). [Paper][PyTorch]
    • ?: "Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks", arXiv, 2021 (Salesforce). [Paper]
    • T3: "TrueType Transformer: Character and Font Style Recognition in Outline Format", Document Analysis Systems (DAS), 2022 (Kyushu University). [Paper]
    • ?: "Transformer-based HTR for Historical Documents", ComHum, 2022 (University of Zurich, Switzerland). [Paper]
    • ?: "SVG Vector Font Generation for Chinese Characters with Transformer", ICIP, 2022 (The University of Tokyo). [Paper]
    • LP-Transformer: "Forensic License Plate Recognition with Compression-Informed Transformers", ICIP, 2022 (University of Erlangen-Nurnberg, Germany). [Paper]
    • CoMER: "CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition", ECCV, 2022 (Peking University). [Paper][PyTorch]
    • MATRN: "Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features", ECCV, 2022 (KAIST). [Paper][PyTorch]
    • CONSENT: "CONSENT: Context Sensitive Transformer for Bold Words Classification", arXiv, 2022 (Amazon). [Paper]
  • Curriculum Learning:
    • SSTN: "Spatial Transformer Networks for Curriculum Learning", arXiv, 2021 (TU Kaiserslautern, Germany). [Paper]
  • Defect Classification:
    • MSHViT: "Multi-Scale Hybrid Vision Transformer and Sinkhorn Tokenizer for Sewer Defect Classification", CVPRW, 2022 (Aalborg University, Denmark). [Paper]
    • DefT: "Defect Transformer: An Efficient Hybrid Transformer Architecture for Surface Defect Detection", arXiv, 2022 (Nanjing University of Aeronautics and Astronautics). [Paper]
  • Digital Holography:
    • ?: "Convolutional Neural Network (CNN) vs Visual Transformer (ViT) for Digital Holography", ICCCR, 2022 (UBFC, France). [Paper]
  • Disentangled representation:
    • VCT: "Visual Concepts Tokenization", NeurIPS, 2022 (Microsoft). [Paper][PyTorch]
  • E-Commerce:
    • WebShop: "WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents", NeurIPS, 2022 (Princeton). [Paper][PyTorch][Website]
  • Event data:
    • EvT: "Event Transformer: A sparse-aware solution for efficient event data processing", arXiv, 2022 (Universidad de Zaragoza, Spain). [Paper][PyTorch]
    • ETB: "Event Transformer", arXiv, 2022 (Nanjing University). [Paper]
    • RVT: "Recurrent Vision Transformers for Object Detection with Event Cameras", arXiv, 2022 (University of Zurich). [Paper]
  • Fashion:
    • Kaleido-BERT: "Kaleido-BERT: Vision-Language Pre-training on Fashion Domain", CVPR, 2021 (Alibaba). [Paper][Tensorflow]
    • CIT: "Cloth Interactive Transformer for Virtual Try-On", arXiv, 2021 (University of Trento). [Paper][Code (in construction)]
    • ClothFormer: "ClothFormer: Taming Video Virtual Try-on in All Module", CVPR, 2022 (iQIYI). [Paper][Website]
    • FashionVLP: "FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback", CVPR, 2022 (Amazon). [Paper]
    • FashionViL: "FashionViL: Fashion-Focused Vision-and-Language Representation Learning", ECCV, 2022 (University of Surrey, UK). [Paper][PyTorch]
    • OutfitTransformer: "OutfitTransformer: Learning Outfit Representations for Fashion Recommendation", arXiv, 2022 (Amazon). [Paper]
    • Fashionformer: "Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition", ECCV, 2022 (Peking). [Paper][PyTorch]
    • MVLT: "Masked Vision-Language Transformer in Fashion", Machine Intelligence Research, 2023 (Alibaba). [Paper][PyTorch]
  • Feature Matching:
    • SuperGlue: "SuperGlue: Learning Feature Matching with Graph Neural Networks", CVPR, 2020 (Magic Leap). [Paper][PyTorch]
    • LoFTR: "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR, 2021 (Zhejiang University). [Paper][PyTorch][Website]
    • COTR: "COTR: Correspondence Transformer for Matching Across Images", ICCV, 2021 (UBC). [Paper]
    • CATs: "CATs: Cost Aggregation Transformers for Visual Correspondence", NeurIPS, 2021 (Yonsei University + Korea University). [Paper][PyTorch][Website]
    • TransforMatcher: "TransforMatcher: Match-to-Match Attention for Semantic Correspondence", CVPR, 2022 (POSTECH). [Paper]
    • ASpanFormer: "ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer", ECCV, 2022 (HKUST). [Paper][Website]
    • CATs++: "CATs++: Boosting Cost Aggregation with Convolutions and Transformers", arXiv, 2022 (Korea University). [Paper]
    • LoFTR-TensorRT: "Local Feature Matching with Transformers for low-end devices", arXiv, 2022 (?). [Paper][PyTorch]
    • MatchFormer: "MatchFormer: Interleaving Attention in Transformers for Feature Matching", arXiv, 2022 (Karlsruhe Institute of Technology, Germany). [Paper]
    • OpenGlue: "OpenGlue: Open Source Graph Neural Net Based Pipeline for Image Matching", arXiv, 2022 (Ukrainian Catholic University). [Paper][PyTorch]
  • Fine-grained:
    • ViT-FGVC: "Exploring Vision Transformers for Fine-grained Classification", CVPRW, 2021 (Universidad de Valladolid). [Paper]
    • FFVT: "Feature Fusion Vision Transformer for Fine-Grained Visual Categorization", BMVC, 2021 (Griffith University, Australia). [Paper][PyTorch]
    • TPSKG: "Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition", arXiv, 2021 (Beihang University). [Paper]
    • AFTrans: "A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition", arXiv, 2021 (Peking University). [Paper]
    • TransFG: "TransFG: A Transformer Architecture for Fine-grained Recognition", AAAI, 2022 (Johns Hopkins). [Paper][PyTorch]
    • DynamicMLP: "Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information", CVPR, 2022 (Megvii). [Paper][PyTorch]
    • SIM-Trans: "SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization", ACMMM, 2022 (Peking University). [Paper][PyTorch]
    • MetaFormer: "MetaFormer: A Unified Meta Framework for Fine-Grained Recognition", arXiv, 2022 (ByteDance). [Paper][PyTorch]
    • ViT-FOD: "ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator", arXiv, 2022 (Shandong University). [Paper]
  • Gait:
    • Gait-TR: "Spatial Transformer Network on Skeleton-based Gait Recognition", arXiv, 2022 (South China University of Technology). [Paper]
  • Gaze:
    • GazeTR: "Gaze Estimation using Transformer", arXiv, 2021 (Beihang University). [Paper][PyTorch]
    • HGTTR: "End-to-End Human-Gaze-Target Detection with Transformers", CVPR, 2022 (Shanghai Jiao Tong). [Paper]
    • MGTR: "MGTR: End-to-End Mutual Gaze Detection with Transformer", ACCV, 2022 (Nankai University). [Paper][PyTorch]
    • GLC: "In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation", arXiv, 2022 (Georgia Tech). [Paper][Website]
  • Geo-Localization:
    • EgoTR: "Cross-view Geo-localization with Evolving Transformer", arXiv, 2021 (Shenzhen University). [Paper]
    • TransGeo: "TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization", CVPR, 2022 (UCF). [Paper][PyTorch]
    • GAMa: "GAMa: Cross-view Video Geo-localization", ECCV, 2022 (UCF). [Paper][Code (in construction)]
    • TransLocator: "Where in the World is this Image? Transformer-based Geo-localization in the Wild", ECCV, 2022 (JHU). [Paper]
    • TransGCNN: "Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization", arXiv, 2022 (Southeast University, China). [Paper]
    • MGTL: "Mutual Generative Transformer Learning for Cross-view Geo-localization", arXiv, 2022 (University of Electronic Science and Technology of China). [Paper]
  • Homography Estimation:
    • LocalTrans: "LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation", ICCV, 2021 (Tsinghua). [Paper]
  • Image Registration:
    • AiR: "Attention for Image Registration (AiR): an unsupervised Transformer approach", arXiv, 2021 (INRIA). [Paper]
  • Image Retrieval:
    • RRT: "Instance-level Image Retrieval using Reranking Transformers", ICCV, 2021 (University of Virginia). [Paper][PyTorch]
    • SwinFGHash: "SwinFGHash: Fine-grained Image Retrieval via Transformer-based Hashing Network", BMVC, 2021 (Tsinghua). [Paper]
    • ViT-Retrieval: "Investigating the Vision Transformer Model for Image Retrieval Tasks", arXiv, 2021 (Democritus University of Thrace). [Paper]
    • IRT: "Training Vision Transformers for Image Retrieval", arXiv, 2021 (Facebook + INRIA). [Paper]
    • TransHash: "TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval", arXiv, 2021 (Shanghai Jiao Tong University). [Paper]
    • VTS: "Vision Transformer Hashing for Image Retrieval", arXiv, 2021 (IIIT-Allahabad). [Paper]
    • GTZSR: "Zero-Shot Sketch Based Image Retrieval using Graph Transformer", arXiv, 2022 (IIT Bombay). [Paper]
    • EViT: "EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing", arXiv, 2022 (Jinan University). [Paper][PyTorch (in construction)]
    • ?: "Transformers and CNNs both Beat Humans on SBIR", arXiv, 2022 (University of Mons, Belgium). [Paper]
    • ?: "A Light Touch Approach to Teaching Transformers Multi-view Geometry", arXiv, 2022 (Oxford). [Paper]
    • DToP: "Boosting vision transformers for image retrieval", WACV, 2023 (Dealicious, Korea). [Paper][Code (in construction)]
  • Layout Generation:
    • VTN: "Variational Transformer Networks for Layout Generation", CVPR, 2021 (Google). [Paper]
    • LayoutTransformer: "LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity", CVPR, 2021 (NTU). [Paper][PyTorch]
    • LayoutTransformer: "LayoutTransformer: Layout Generation and Completion with Self-attention", ICCV, 2021 (Amazon). [Paper][Website]
    • LGT-Net: "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network", CVPR, 2022 (East China Normal University). [Paper][PyTorch]
    • CADTransformer: "CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings", CVPR, 2022 (UT Austin). [Paper]
    • GAT-CADNet: "GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings", CVPR, 2022 (TUM + Alibaba). [Paper]
    • LayoutBERT: "LayoutBERT: Masked Language Layout Model for Object Insertion", CVPRW, 2022 (Adobe). [Paper]
    • ICVT: "Geometry Aligned Variational Transformer for Image-conditioned Layout Generation", ACMMM, 2022 (Alibaba). [Paper]
    • BLT: "BLT: Bidirectional Layout Transformer for Controllable Layout Generation", ECCV, 2022 (Google). [Paper][Tensorflow][Website]
    • ATEK: "ATEK: Augmenting Transformers with Expert Knowledge for Indoor Layout Synthesis", arXiv, 2022 (New Jersey Institute of Technology). [Paper]
    • ?: "Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades", arXiv, 2022 (Simon Fraser). [Paper]
    • UniLayout: "UniLayout: Taming Unified Sequence-to-Sequence Transformers for Graphic Layout Generation", arXiv, 2022 (Microsoft). [Paper]
  • Livestock Monitoring:
    • STARFormer: "Livestock Monitoring with Transformer", BMVC, 2021 (IIT Dhanbad). [Paper]
  • Metric Learning:
    • Hyp-ViT: "Hyperbolic Vision Transformers: Combining Improvements in Metric Learning", CVPR, 2022 (University of Trento, Italy). [Paper][PyTorch]
    • BGFormer: "Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach", arXiv, 2022 (Anhui University). [Paper]
  • Multi-Input:
    • MixViT: "Adapting Multi-Input Multi-Output schemes to Vision Transformers", CVPRW, 2022 (Sorbonne Universite, France). [Paper]
  • Multi-label:
    • C-Tran: "General Multi-label Image Classification with Transformers", CVPR, 2021 (University of Virginia). [Paper]
    • TDRG: "Transformer-Based Dual Relation Graph for Multi-Label Image Recognition", ICCV, 2021 (Tencent). [Paper]
    • MlTr: "MlTr: Multi-label Classification with Transformer", arXiv, 2021 (KuaiShou). [Paper]
    • GATN: "Graph Attention Transformer Network for Multi-Label Image Classification", arXiv, 2022 (Southeast University, China). [Paper]
  • Multi-task:
    • MulT: "MulT: An End-to-End Multitask Learning Transformer", CVPR, 2022 (EPFL). [Paper]
  • Open Set:
    • OSR-ViT: "Open Set Recognition using Vision Transformer with an Additional Detection Head", arXiv, 2022 (Vanderbilt University, Tennessee). [Paper]
  • Out-Of-Distribution:
    • OODformer: "OODformer: Out-Of-Distribution Detection Transformer", BMVC, 2021 (LMU Munich). [Paper][PyTorch]
    • MCM: "Delving into Out-of-Distribution Detection with Vision-Language Representations", NeurIPS, 2022 (UW-Madison). [Paper]
  • Pedestrian Intention:
    • IntFormer: "IntFormer: Predicting pedestrian intention with the aid of the Transformer architecture", arXiv, 2021 (Universidad de Alcala). [Paper]
  • Physics Simulation:
    • TIE: "Transformer with Implicit Edges for Particle-based Physics Simulation", ECCV, 2022 (NTU, Singapore). [Paper][PyTorch][Website]
  • Place Recognition:
    • SVT-Net: "SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers", AAAI, 2022 (Renmin University of China). [Paper]
    • TransVPR: "TransVPR: Transformer-based place recognition with multi-level attention aggregation", CVPR, 2022 (Xi'an Jiaotong). [Paper]
    • OverlapTransformer: "OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition", IROS, 2022 (HAOMO.AI, China). [Paper][PyTorch]
    • SeqOT: "SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data", arXiv, 2022 (National University of Defense Technology, China). [Paper][PyTorch]
  • Remote Sensing/Hyperspectral/Satellite:
    • DCFAM: "Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images", arXiv, 2021 (Wuhan University). [Paper]
    • WiCNet: "Looking Outside the Window: Wider-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images", arXiv, 2021 (University of Trento). [Paper]
    • ?: "Vision Transformers For Weeds and Crops Classification Of High Resolution UAV Images", arXiv, 2021 (University of Orleans, France). [Paper]
    • Satellite-ViT: "Manipulation Detection in Satellite Images Using Vision Transformer", arXiv, 2021 (Purdue). [Paper]
    • ?: "Self-supervised Vision Transformers for Joint SAR-optical Representation Learning", IGARSS, 2022 (German Aerospace Center). [Paper]
    • VBFusion: "Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing", SPIE Remote Sensing, 2022 (Technische Universitat Berlin, Germany). [Paper][PyTorch]
    • SatMAE: "SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery", NeurIPS, 2022 (Stanford). [Paper]
    • ANDT: "Anomaly Detection in Aerial Videos with Transformers", IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2022 (TUM). [Paper]
    • RNGDet: "RNGDet: Road Network Graph Detection by Transformer in Aerial Images", arXiv, 2022 (HKUST). [Paper]
    • FSRA: "A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization", arXiv, 2022 (China Jiliang University). [Paper][PyTorch]
    • ?: "Multiscale Convolutional Transformer with Center Mask Pretraining for Hyperspectral Image Classification", arXiv, 2022 (Shenzhen University). [Paper]
    • ?: "Deep Hyperspectral Unmixing using Transformer Network", arXiv, 2022 (Jalpaiguri Engineering College, India). [Paper]
    • SiamixFormer: "SiamixFormer: A Siamese Transformer Network For Building Detection And Change Detection From Bi-Temporal Remote Sensing Images", arXiv, 2022 (Tarbiat Modares University, Iran). [Paper]
    • DAHiTrA: "DAHiTrA: Damage Assessment Using a Novel Hierarchical Transformer Architecture", arXiv, 2022 (Simon Fraser University, Canada). [Paper]
    • RVSA: "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model", arXiv, 2022 (Wuhan University + The University of Sydney). [Paper]
    • SatViT: "Transfer Learning with Pretrained Remote Sensing Transformers", arXiv, 2022 (?). [Paper][PyTorch]
    • FTN: "Fully Transformer Network for Change Detection of Remote Sensing Images", arXiv, 2022 (Dalian University of Technology). [Paper]
    • MCTNet: "MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images", arXiv, 2022 (Tsinghua University). [Paper]
    • ?: "Transformers For Recognition In Overhead Imagery: A Reality Check", arXiv, 2022 (Duke University). [Paper]
  • Robotics:
    • TF-Grasp: "When Transformer Meets Robotic Grasping: Exploits Context for Efficient Grasp Detection", arXiv, 2022 (University of Science and Technology of China). [Paper][Code (in construction)]
    • BeT: "Behavior Transformers: Cloning k modes with one stone", arXiv, 2022 (NYU). [Paper][PyTorch]
    • Perceiver-Actor: "Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation", Conference on Robot Learning (CoRL), 2022 (NVIDIA). [Paper][Website]
    • PACT: "PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training", arXiv, 2022 (Microsoft). [Paper]
    • ?: "A Strong Transfer Baseline for RGB-D Fusion in Vision Transformers", arXiv, 2022 (University of Groningen, The Netherlands). [Paper]
    • ?: "Grounding Language with Visual Affordances over Unstructured Data", arXiv, 2022 (University of Freiburg, Germany). [Paper][Website]
    • VIMA: "VIMA: General Robot Manipulation with Multimodal Prompts", arXiv, 2022 (NVIDIA). [Paper][PyTorch][Website]
  • Scene Decomposition:
    • SRT: "Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations", CVPR, 2022 (Google). [Paper][PyTorch (stelzner)][Website]
    • OSRT: "Object Scene Representation Transformer", NeurIPS, 2022 (Google). [Paper][Website]
    • Prompter: "Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following", arXiv, 2022 (Hitachi). [Paper]
  • Scene Text Recognition:
    • ViTSTR: "Vision Transformer for Fast and Efficient Scene Text Recognition", ICDAR, 2021 (University of the Philippines). [Paper]
    • STKM: "Self-attention based Text Knowledge Mining for Text Detection", CVPR, 2021 (?). [Paper][Code (in construction)]
    • I2C2W: "I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition", arXiv, 2021 (NTU Singapore). [Paper]
    • CornerTransformer: "Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition", ECCV, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch]
    • CUTE: "Contextual Text Block Detection towards Scene Text Understanding", ECCV, 2022 (NTU Singapore). [Paper][Website]
    • PARSeq: "Scene Text Recognition with Permuted Autoregressive Sequence Models", ECCV, 2022 (University of the Philippines). [Paper][PyTorch]
    • PTIE: "Pure Transformer with Integrated Experts for Scene Text Recognition", ECCV, 2022 (NTU Singapore). [Paper]
    • MGP-STR: "Multi-Granularity Prediction for Scene Text Recognition", ECCV, 2022 (Alibaba). [Paper]
    • VLAMD: "Vision-Language Adaptive Mutual Decoder for OOV-STR", ECCVW, 2022 (iFLYTEK, China). [Paper]
    • MVLT: "Masked Vision-Language Transformers for Scene Text Recognition", BMVC, 2022 (Westone Information Industry Inc., China). [Paper][PyTorch]
  • Spike:
    • Spikformer: "Spikformer: When Spiking Neural Network Meets Transformer", arXiv, 2022 (Peking). [Paper]
  • Stereo:
    • STTR: "Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers", ICCV, 2021 (Johns Hopkins). [Paper][PyTorch]
    • PS-Transformer: "PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism", BMVC, 2021 (National Institute of Informatics, Japan). [Paper][PyTorch]
    • ChiTransformer: "ChiTransformer: Towards Reliable Stereo from Cues", CVPR, 2022 (GSU). [Paper]
    • TransMVSNet: "TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers", CVPR, 2022 (Megvii). [Paper][Code (in construction)]
    • MVSTER: "MVSTER: Epipolar Transformer for Efficient Multi-View Stereo", ECCV, 2022 (CAS). [Paper][PyTorch]
    • CEST: "Context-Enhanced Stereo Transformer", ECCV, 2022 (CAS). [Paper][PyTorch]
    • WT-MVSNet: "WT-MVSNet: Window-based Transformers for Multi-view Stereo", NeurIPS, 2022 (Tsinghua University). [Paper]
    • MVSFormer: "MVSFormer: Learning Robust Image Representations via Transformers and Temperature-based Depth for Multi-View Stereo", arXiv, 2022 (Fudan University). [Paper]
  • Time Series:
    • MissFormer: "MissFormer: (In-)attention-based handling of missing observations for trajectory filtering and prediction", arXiv, 2021 (Fraunhofer IOSB, Germany). [Paper]
  • Traffic:
    • NEAT: "NEAT: Neural Attention Fields for End-to-End Autonomous Driving", ICCV, 2021 (MPI). [Paper][PyTorch]
    • ViTAL: "Novelty Detection and Analysis of Traffic Scenario Infrastructures in the Latent Space of a Vision Transformer-Based Triplet Autoencoder", IV, 2021 (Technische Hochschule Ingolstadt). [Paper]
    • ?: "Predicting Vehicles Trajectories in Urban Scenarios with Transformer Networks and Augmented Information", IVS, 2021 (Universidad de Alcala). [Paper]
    • ?: "Translating Images into Maps", ICRA, 2022 (University of Surrey, UK). [Paper][PyTorch (in construction)]
    • Crossview-Transformer: "Cross-view Transformers for real-time Map-view Semantic Segmentation", CVPR, 2022 (UT Austin). [Paper][PyTorch]
    • ViT-BEVSeg: "ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation", IJCNN, 2022 (Maynooth University, Ireland). [Paper][Code (in construction)]
    • MSF3DDETR: "MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving", ICPRW, 2022 (University of Coimbra, Portugal). [Paper]
    • TransLPC: "Transformers for Object Detection in Large Point Clouds", ITSC, 2022 (Bosch). [Paper]
    • PicT: "PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification", ACMMM, 2022 (Chongqing University). [Paper][PyTorch (in construction)]
    • BEVFormer: "BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers", ECCV, 2022 (Shanghai AI Lab). [Paper][PyTorch]
    • JPerceiver: "JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes", ECCV, 2022 (The University of Sydney). [Paper][PyTorch]
    • V2X-ViT: "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer", ECCV, 2022 (UCLA). [Paper]
    • ?: "Can Transformer Attention Spread Give Insights Into Uncertainty of Detected and Tracked Objects?", IROSW, 2022 (Bosch). [Paper]
    • MTR: "Motion Transformer with Global Intention Localization and Local Movement Refinement", NeurIPS, 2022 (MPI). [Paper][Code (in construction)]
    • PlanT: "PlanT: Explainable Planning Transformers via Object-Level Representations", Conference on Robot Learning (CoRL), 2022 (TUM). [Paper][PyTorch][Website]
    • BEVSegFormer: "BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs", arXiv, 2022 (Nullmax, China). [Paper]
    • ParkPredict+: "ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer", arXiv, 2022 (Berkeley). [Paper]
    • GKT: "Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer", arXiv, 2022 (Huazhong University of Science and Technology). [Paper][Code (in construction)]
    • CoBEVT: "CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers", arXiv, 2022 (UCLA). [Paper]
    • ?: "Pyramid Transformer for Traffic Sign Detection", arXiv, 2022 (Iran University of Science and Technology). [Paper]
    • UniFormer: "UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View", arXiv, 2022 (Zhejiang University). [Paper]
    • STrajNet: "STrajNet: Occupancy Flow Prediction via Multi-modal Swin Transformer", arXiv, 2022 (NTU, Singapore). [Paper]
    • MTPP: "Multi-modal Transformer Path Prediction for Autonomous Vehicle", arXiv, 2022 (National Central University). [Paper]
    • MapTR: "MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction", arXiv, 2022 (Horizon Robotics). [Paper][Code (in construction)]
    • DCT: "A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View", arXiv, 2022 (Gwangju Institute of Science and Technology). [Paper]
    • C-ViT: "Traffic Accident Risk Forecasting using Contextual Vision Transformers", arXiv, 2022 (University of Technology Sydney). [Paper]
    • BEVFormer-v2: "BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision", arXiv, 2022 (Tsinghua University). [Paper]
  • Trajectory Prediction:
    • mmTransformer: "Multimodal Motion Prediction with Stacked Transformers", CVPR, 2021 (CUHK + SenseTime). [Paper][Code (in construction)][Website]
    • AgentFormer: "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting", ICCV, 2021 (CMU). [Paper][PyTorch][Website]
    • S2TNet: "S2TNet: Spatio-Temporal Transformer Networks for Trajectory Prediction in Autonomous Driving", ACML, 2021 (Xi'an Jiaotong University). [Paper][PyTorch]
    • MRT: "Multi-Person 3D Motion Prediction with Multi-Range Transformers", NeurIPS, 2021 (UCSD + Berkeley). [Paper][PyTorch][Website]
    • ?: "Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction", ICLR, 2022 (MILA). [Paper]
    • Scene-Transformer: "Scene Transformer: A unified architecture for predicting multiple agent trajectories", ICLR, 2022 (Google). [Paper]
    • ST-MR: "Graph-based Spatial Transformer with Memory Replay for Multi-Future Pedestrian Trajectory Prediction", CVPR, 2022 (University of New South Wales, Australia). [Paper][Tensorflow]
    • HiVT: "HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction", CVPR, 2022 (CUHK). [Paper]
    • EF-Transformer: "Entry-Flipped Transformer for Inference and Prediction of Participant Behavior", ECCV, 2022 (NTU, Singapore). [Paper]
    • Social-SSL: "Social-SSL: Self-Supervised Cross-Sequence Representation Learning Based on Transformers for Multi-Agent Trajectory Prediction", ECCV, 2022 (NYCU). [Paper][PyTorch]
    • LatentFormer: "LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction", arXiv, 2022 (Huawei). [Paper]
    • PreTR: "PreTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer", arXiv, 2022 (Stellantis, France). [Paper]
    • Wayformer: "Wayformer: Motion Forecasting via Simple & Efficient Attention Networks", arXiv, 2022 (Waymo). [Paper]
    • LaTTe: "LaTTe: Language Trajectory TransformEr", arXiv, 2022 (TUM). [Paper][Tensorflow]
    • SoMoFormer: "SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction", arXiv, 2022 (Hangzhou Dianzi University). [Paper]
    • ViewBirdiformer: "ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view", arXiv, 2022 (Kyoto University). [Paper]
    • PedFormer: "PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning", arXiv, 2022 (Huawei). [Paper]
    • TAMFormer: "TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction", arXiv, 2022 (University of Padova, Italy). [Paper]
  • Visual Counting:
    • CC-AV: "Audio-Visual Transformer Based Crowd Counting", ICCVW, 2021 (University of Kansas). [Paper]
    • TransCrowd: "TransCrowd: Weakly-Supervised Crowd Counting with Transformer", arXiv, 2021 (Huazhong University of Science and Technology). [Paper][PyTorch]
    • TAM-RTM: "Boosting Crowd Counting with Transformers", arXiv, 2021 (ETHZ). [Paper]
    • CCTrans: "CCTrans: Simplifying and Improving Crowd Counting with Transformer", arXiv, 2021 (Meituan). [Paper]
    • MAN: "Boosting Crowd Counting via Multifaceted Attention", CVPR, 2022 (Xi'an Jiaotong). [Paper][PyTorch]
    • CLTR: "An End-to-End Transformer Model for Crowd Localization", ECCV, 2022 (Huazhong University of Science and Technology). [Paper][PyTorch][Website]
    • SAANet: "Scene-Adaptive Attention Network for Crowd Counting", arXiv, 2022 (Xi'an Jiaotong). [Paper]
    • JCTNet: "Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting", arXiv, 2022 (Chongqing University). [Paper]
    • CrowdMLP: "CrowdMLP: Weakly-Supervised Crowd Counting via Multi-Granularity MLP", arXiv, 2022 (University of Guelph, Canada). [Paper]
    • CounTR: "CounTR: Transformer-based Generalised Visual Counting", arXiv, 2022 (Shanghai Jiao Tong University). [Paper][Website]
  • Visual Quality Assessment:
    • TRIQ: "Transformer for Image Quality Assessment", arXiv, 2020 (NORCE). [Paper][Tensorflow-Keras]
    • IQT: "Perceptual Image Quality Assessment with Transformers", CVPRW, 2021 (LG). [Paper][Code (in construction)]
    • MUSIQ: "MUSIQ: Multi-scale Image Quality Transformer", ICCV, 2021 (Google). [Paper]
    • TranSLA: "Saliency-Guided Transformer Network Combined With Local Embedding for No-Reference Image Quality Assessment", ICCVW, 2021 (Hikvision). [Paper]
    • TReS: "No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency", WACV, 2022 (CMU). [Paper]
    • IQA-Conformer: "Conformer and Blind Noisy Students for Improved Image Quality Assessment", CVPRW, 2022 (University of Wurzburg, Germany). [Paper][PyTorch]
    • SwinIQA: "SwinIQA: Learned Swin Distance for Compressed Image Quality Assessment", CVPRW, 2022 (USTC, China). [Paper]
    • DCVQE: "DCVQE: A Hierarchical Transformer for Video Quality Assessment", ACCV, 2022 (Weibo). [Paper]
    • MCAS-IQA: "Visual Mechanisms Inspired Efficient Transformers for Image and Video Quality Assessment", arXiv, 2022 (Norwegian Research Centre, Norway). [Paper]
    • MSTRIQ: "MSTRIQ: No Reference Image Quality Assessment Based on Swin Transformer with Multi-Stage Fusion", arXiv, 2022 (ByteDance). [Paper]
    • DisCoVQA: "DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment", arXiv, 2022 (NTU, Singapore). [Paper]
  • Visual Reasoning:
    • SAViR-T: "SAViR-T: Spatially Attentive Visual Reasoning with Transformers", arXiv, 2022 (Rutgers University). [Paper]
  • 3D Human Texture Estimation:
    • Texformer: "3D Human Texture Estimation from a Single Image with Transformers", ICCV, 2021 (NTU, Singapore). [Paper][PyTorch][Website]
  • 3D Motion Synthesis:
    • ACTOR: "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV, 2021 (Univ Gustave Eiffel). [Paper][PyTorch][Website]
    • RTVAE: "Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis", CVPRW, 2022 (Amazon). [Paper]
    • MotionCLIP: "MotionCLIP: Exposing Human Motion Generation to CLIP Space", ECCV, 2022 (Tel Aviv). [Paper]
    • CLIP-Actor: "CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes", ECCV, 2022 (POSTECH). [Paper][PyTorch][Website]
    • PoseGPT: "PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting", ECCV, 2022 (NAVER). [Paper]
    • TEMOS: "TEMOS: Generating diverse human motions from textual descriptions", ECCV, 2022 (MPI). [Paper][PyTorch][Website]
    • TM2T: "TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts", ECCV, 2022 (University of Alberta, Canada). [Paper][PyTorch][Website]
    • HUMANISE: "HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes", NeurIPS, 2022 (Beijing Institute of Technology). [Paper][GitHub][Website]
    • ActFormer: "ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation", arXiv, 2022 (SenseTime). [Paper]
    • ?: "Diverse Dance Synthesis via Keyframes with Transformer Controllers", arXiv, 2022 (Beihang University). [Paper]
    • MARIONET: "NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System", arXiv, 2022 (Wuhan University). [Paper]
    • OhMG: "OhMG: Zero-shot Open-vocabulary Human Motion Generation", arXiv, 2022 (Sun Yat-Sen University). [Paper]
    • Action-GPT: "Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation", arXiv, 2022 (IIIT Hyderabad). [Paper][Website]
    • Optimus: "Transformer-Based Learned Optimization", arXiv, 2022 (Google). [Paper]
  • 3D Object Recognition:
    • MVT: "MVT: Multi-view Vision Transformer for 3D Object Recognition", BMVC, 2021 (Baidu). [Paper]
  • 3D Reconstruction:
    • PlaneTR: "PlaneTR: Structure-Guided Transformers for 3D Plane Recovery", ICCV, 2021 (Wuhan University). [Paper][PyTorch]
    • CO3D: "CommonObjects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction", ICCV, 2021 (Facebook). [Paper][PyTorch]
    • VolT: "Multi-view 3D Reconstruction with Transformer", ICCV, 2021 (University of British Columbia). [Paper]
    • 3D-RETR: "3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers", BMVC, 2021 (ETHZ). [Paper][PyTorch]
    • TransformerFusion: "TransformerFusion: Monocular RGB Scene Reconstruction using Transformers", NeurIPS, 2021 (TUM). [Paper][Website]
    • LegoFormer: "LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction", arXiv, 2021 (TUM + Google). [Paper]
    • PlaneFormers: "PlaneFormers: From Sparse View Planes to 3D Reconstruction", ECCV, 2022 (UMich). [Paper][PyTorch][Website]
    • 3D-C2FT: "3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction", arXiv, 2022 (Korea Institute of Science and Technology). [Paper]
  • 3D Scene:
    • OpenScene: "OpenScene: 3D Scene Understanding with Open Vocabularies", arXiv, 2022 (Google). [Paper][Website]
    • ?: "Language-driven Open-Vocabulary 3D Scene Understanding", arXiv, 2022 (ByteDance). [Paper]
  • 360 Scene:
    • ?: "Improving 360 Monocular Depth Estimation via Non-local Dense Prediction Transformer and Joint Supervised and Self-supervised Learning", AAAI, 2022 (Seoul National University). [Paper][PyTorch]
    • PAVER: "Panoramic Vision Transformer for Saliency Detection in 360° Videos", ECCV, 2022 (Seoul National University). [Paper]
    • PanoFormer: "PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation", ECCV, 2022 (Beijing Jiaotong University). [Paper]
    • CoVisPose: "CoVisPose: Co-Visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360° Indoor Panoramas", ECCV, 2022 (Zillow). [Paper]
    • SPH: "Spherical Transformer", arXiv, 2022 (Chung-Ang University, Korea). [Paper]
  • Others:
    • ?: "Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search", ECCV, 2022 (Intellifusion, China). [Paper]
    • ?: "Strong Gravitational Lensing Parameter Estimation with Vision Transformer", ECCVW, 2022 (CMU). [Paper][PyTorch]
    • Transformer-DR: "Transformer-based dimensionality reduction", arXiv, 2022 (Chongqing Normal University, China). [Paper]
    • ?: "mm-Wave Radar Hand Shape Classification Using Deformable Transformers", arXiv, 2022 (Intel). [Paper]
    • ?: "Fully-attentive and interpretable: vision and video vision transformers for pain detection", NeurIPSW, 2022 (Utrecht University, Netherlands). [Paper][Code (in construction)]

[Back to Overview]


Attention Mechanisms in Vision/NLP

Attention for Vision

  • AA: "Attention Augmented Convolutional Networks", ICCV, 2019 (Google). [Paper][PyTorch (Unofficial)][Tensorflow (Unofficial)]
  • LR-Net: "Local Relation Networks for Image Recognition", ICCV, 2019 (Microsoft). [Paper][PyTorch (Unofficial)]
  • CCNet: "CCNet: Criss-Cross Attention for Semantic Segmentation", ICCV, 2019 (& TPAMI 2020) (Horizon). [Paper][PyTorch]
  • GCNet: "Global Context Networks", ICCVW, 2019 (& TPAMI 2020) (Microsoft). [Paper][PyTorch]
  • SASA: "Stand-Alone Self-Attention in Vision Models", NeurIPS, 2019 (Google). [Paper][PyTorch-1 (Unofficial)][PyTorch-2 (Unofficial)]
    • key message: attention modules are more efficient than convolutions and provide comparable accuracy
  • Axial-Transformer: "Axial Attention in Multidimensional Transformers", arXiv, 2019 (Google). [Paper][PyTorch (Unofficial)]
  • Attention-CNN: "On the Relationship between Self-Attention and Convolutional Layers", ICLR, 2020 (EPFL). [Paper][PyTorch][Website]
  • SAN: "Exploring Self-attention for Image Recognition", CVPR, 2020 (CUHK + Intel). [Paper][PyTorch]
  • BA-Transform: "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms", CVPR, 2020 (ByteDance). [Paper]
  • Axial-DeepLab: "Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation", ECCV, 2020 (Google). [Paper][PyTorch]
  • GSA: "Global Self-Attention Networks for Image Recognition", arXiv, 2020 (Google). [Paper][PyTorch (Unofficial)]
  • EA: "Efficient Attention: Attention with Linear Complexities", WACV, 2021 (SenseTime). [Paper][PyTorch]
  • LambdaNetworks: "LambdaNetworks: Modeling Long-Range Interactions Without Attention", ICLR, 2021 (Google). [Paper][PyTorch-1 (Unofficial)][PyTorch-2 (Unofficial)]
  • GSA-Nets: "Group Equivariant Stand-Alone Self-Attention For Vision", ICLR, 2021 (EPFL). [Paper]
  • Hamburger: "Is Attention Better Than Matrix Decomposition?", ICLR, 2021 (Peking). [Paper][PyTorch (Unofficial)]
  • HaloNet: "Scaling Local Self-Attention For Parameter Efficient Visual Backbones", CVPR, 2021 (Google). [Paper]
  • BoTNet: "Bottleneck Transformers for Visual Recognition", CVPR, 2021 (Google). [Paper]
  • SSAN: "SSAN: Separable Self-Attention Network for Video Representation Learning", CVPR, 2021 (Microsoft). [Paper]
  • CoTNet: "Contextual Transformer Networks for Visual Recognition", CVPRW, 2021 (JD). [Paper][PyTorch]
  • Involution: "Involution: Inverting the Inherence of Convolution for Visual Recognition", CVPR, 2021 (HKUST). [Paper][PyTorch]
  • Perceiver: "Perceiver: General Perception with Iterative Attention", ICML, 2021 (DeepMind). [Paper][PyTorch (lucidrains)]
  • SNL: "Unifying Nonlocal Blocks for Neural Networks", ICCV, 2021 (Peking + ByteDance). [Paper]
  • External-Attention: "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks", arXiv, 2021 (Tsinghua). [Paper]
  • Container: "Container: Context Aggregation Network", arXiv, 2021 (AI2). [Paper]
  • X-volution: "X-volution: On the unification of convolution and self-attention", arXiv, 2021 (Huawei Hisilicon). [Paper]
  • Invertible-Attention: "Invertible Attention", arXiv, 2021 (ANU). [Paper]
  • VOLO: "VOLO: Vision Outlooker for Visual Recognition", arXiv, 2021 (Sea AI Lab + NUS, Singapore). [Paper][PyTorch]
  • LESA: "Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms", arXiv, 2021 (Johns Hopkins). [Paper]
  • PS-Attention: "Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention", AAAI, 2022 (Baidu). [Paper][Paddle]
  • QuadTree: "QuadTree Attention for Vision Transformers", ICLR, 2022 (Simon Fraser + Alibaba). [Paper][PyTorch]
  • QnA: "Learned Queries for Efficient Local Attention", CVPR, 2022 (Tel-Aviv). [Paper][Jax]
  • ?: "Fair Comparison between Efficient Attentions", CVPRW, 2022 (Kyungpook National University, Korea). [Paper][PyTorch]
  • KVT: "KVT: k-NN Attention for Boosting Vision Transformers", ECCV, 2022 (Alibaba). [Paper][PyTorch]
  • Hydra: "Hydra Attention: Efficient Attention with Many Heads", ECCVW, 2022 (Meta). [Paper]
  • HiP: "Hierarchical Perceiver", arXiv, 2022 (DeepMind). [Paper]
  • AttendNeXt: "Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers", arXiv, 2022 (University of Waterloo, Canada). [Paper]

[Back to Overview]

Attention for NLP

  • T-DMCA: "Generating Wikipedia by Summarizing Long Sequences", ICLR, 2018 (Google). [Paper]
  • LSRA: "Lite Transformer with Long-Short Range Attention", ICLR, 2020 (MIT). [Paper][PyTorch]
  • ETC: "ETC: Encoding Long and Structured Inputs in Transformers", EMNLP, 2020 (Google). [Paper][Tensorflow]
  • BlockBERT: "Blockwise Self-Attention for Long Document Understanding", EMNLP Findings, 2020 (Facebook). [Paper][GitHub]
  • Clustered-Attention: "Fast Transformers with Clustered Attention", NeurIPS, 2020 (Idiap). [Paper][PyTorch][Website]
  • BigBird: "Big Bird: Transformers for Longer Sequences", NeurIPS, 2020 (Google). [Paper][Tensorflow]
  • Longformer: "Longformer: The Long-Document Transformer", arXiv, 2020 (AI2). [Paper][PyTorch]
  • Linformer: "Linformer: Self-Attention with Linear Complexity", arXiv, 2020 (Facebook). [Paper][PyTorch (Unofficial)]
  • Nystromformer: "Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention", AAAI, 2021 (UW-Madison). [Paper][PyTorch]
  • RFA: "Random Feature Attention", ICLR, 2021 (DeepMind). [Paper]
  • Performer: "Rethinking Attention with Performers", ICLR, 2021 (Google). [Paper][Code][Blog]
  • DeLighT: "DeLighT: Deep and Light-weight Transformer", ICLR, 2021 (UW). [Paper]
  • Synthesizer: "Synthesizer: Rethinking Self-Attention for Transformer Models", ICML, 2021 (Google). [Paper][Tensorflow][PyTorch (leaderj1001)]
  • Poolingformer: "Poolingformer: Long Document Modeling with Pooling Attention", ICML, 2021 (Microsoft). [Paper]
  • Hi-Transformer: "Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling", ACL, 2021 (Tsinghua). [Paper]
  • Smart-Bird: "Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer", arXiv, 2021 (Tsinghua). [Paper]
  • Fastformer: "Fastformer: Additive Attention is All You Need", arXiv, 2021 (Tsinghua). [Paper]
  • ∞-former: "∞-former: Infinite Memory Transformer", arXiv, 2021 (Instituto de Telecomunicações, Portugal). [Paper]
  • cosFormer: "cosFormer: Rethinking Softmax In Attention", ICLR, 2022 (SenseTime). [Paper][PyTorch (davidsvy)]
  • MGK: "Improving Transformers with Probabilistic Attention Keys", ICML, 2022 (UCLA). [Paper]

[Back to Overview]

Attention for Both

  • Sparse-Transformer: "Generating Long Sequences with Sparse Transformers", arXiv, 2019 (OpenAI). [Paper][Tensorflow][Blog]
  • Reformer: "Reformer: The Efficient Transformer", ICLR, 2020 (Google). [Paper][Tensorflow][Blog]
  • Sinkhorn-Transformer: "Sparse Sinkhorn Attention", ICML, 2020 (Google). [Paper][PyTorch (Unofficial)]
  • Linear-Transformer: "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", ICML, 2020 (Idiap). [Paper][PyTorch][Website]
  • SMYRF: "SMYRF: Efficient Attention using Asymmetric Clustering", NeurIPS, 2020 (UT Austin + Google). [Paper][PyTorch]
  • Routing-Transformer: "Efficient Content-Based Sparse Attention with Routing Transformers", TACL, 2021 (Google). [Paper][Tensorflow][PyTorch (Unofficial)][Slides]
  • LRA: "Long Range Arena: A Benchmark for Efficient Transformers", ICLR, 2021 (Google). [Paper][Tensorflow]
  • OmniNet: "OmniNet: Omnidirectional Representations from Transformers", ICML, 2021 (Google). [Paper]
  • Evolving-Attention: "Evolving Attention with Residual Convolutions", ICML, 2021 (Peking + Microsoft). [Paper]
  • H-Transformer-1D: "H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences", ACL, 2021 (Google). [Paper]
  • Combiner: "Combiner: Full Attention Transformer with Sparse Computation Cost", NeurIPS, 2021 (Google). [Paper]
  • Centroid-Transformer: "Centroid Transformers: Learning to Abstract with Attention", arXiv, 2021 (UT Austin). [Paper]
  • AFT: "An Attention Free Transformer", arXiv, 2021 (Apple). [Paper]
  • Luna: "Luna: Linear Unified Nested Attention", arXiv, 2021 (USC + CMU + Facebook). [Paper]
  • Transformer-LS: "Long-Short Transformer: Efficient Transformers for Language and Vision", arXiv, 2021 (NVIDIA). [Paper]
  • PoNet: "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences", ICLR, 2022 (Alibaba). [Paper]
  • Paramixer: "Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention", CVPR, 2022 (Norwegian University of Science and Technology, Norway). [Paper]
  • ContextPool: "Efficient Representation Learning via Adaptive Context Pooling", ICML, 2022 (Apple). [Paper]
  • LARA: "Linear Complexity Randomized Self-attention Mechanism", ICML, 2022 (ByteDance). [Paper]
  • Flowformer: "Flowformer: Linearizing Transformers with Conservation Flows", ICML, 2022 (Tsinghua University). [Paper][PyTorch]
  • MRA: "Multi Resolution Analysis (MRA) for Approximate Self-Attention", ICML, 2022 (University of Wisconsin, Madison). [Paper][PyTorch]
  • EcoFormer: "EcoFormer: Energy-Saving Attention with Linear Complexity", NeurIPS, 2022 (Monash University). [Paper][PyTorch]
  • SBM-Transformer: "Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost", NeurIPS, 2022 (LG). [Paper][PyTorch]
  • ?: "Horizontal and Vertical Attention in Transformers", arXiv, 2022 (University of Technology Sydney). [Paper]
  • MRL: "MRL: Learning to Mix with Attention and Convolutions", arXiv, 2022 (Sony). [Paper]

[Back to Overview]

Attention for Others

  • Informer: "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting", AAAI, 2021 (Beihang University). [Paper][PyTorch]
  • Attention-Rank-Collapse: "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth", ICML, 2021 (Google + EPFL). [Paper][PyTorch]
  • ?: "Choose a Transformer: Fourier or Galerkin", NeurIPS, 2021 (Washington University, St. Louis). [Paper]
  • NPT: "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning", arXiv, 2021 (Oxford). [Paper]
  • FEDformer: "FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting", ICML, 2022 (Alibaba). [Paper][PyTorch]
  • ?: "Generalizable Memory-driven Transformer for Multivariate Long Sequence Time-series Forecasting", arXiv, 2022 (University of Technology Sydney). [Paper]

[Back to Overview]