
chenchen333-dev/CVPR2021-Papers-with-Code

 
 


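CVPR 2021 Papers and Open-Source Projects (Papers with Code)

A collection of CVPR 2021 papers and open-source projects (papers with code)!

CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt

Note 1: Everyone is welcome to open an issue to share CVPR 2021 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

The CVPR 2021 accepted-authors group has been set up! If your paper has been accepted, you can add WeChat: CVer9999, with the note: CVPR2021 accepted + name + school/company name. Requests must follow this format; you will then be added to the group to discuss conference attendance and related matters.

[CVPR 2021 Open-Source Paper Index]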

Backbone

BCNet: Searching for Network Width with Bilaterally Coupled Network

Decoupled Dynamic Filter Networks

Lite-HRNet: A Lightweight High-Resolution Network

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Scaling Local Self-Attention For Parameter Efficient Visual Backbones

ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network

Involution: Inverting the Inherence of Convolution for Visual Recognition

Coordinate Attention for Efficient Mobile Network Design

Inception Convolution with Efficient Dilation Search

RepVGG: Making VGG-style ConvNets Great Again

NAS

BCNet: Searching for Network Width with Bilaterally Coupled Network

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

  • Paper: https://arxiv.org/abs/2105.10154
  • Code: None

Combined Depth Space based Architecture Search For Person Re-identification

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

Neural Architecture Search with Random Labels

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

Prioritized Architecture Sampling with Monto-Carlo Tree Search

Contrastive Neural Architecture Search with Neural Architecture Comparators

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

ReNAS: Relativistic Evaluation of Neural Architecture Search

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

Inception Convolution with Efficient Dilation Search

GAN

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Regularizing Generative Adversarial Networks under Limited Data

Towards Real-World Blind Face Restoration with Generative Facial Prior

TediGAN: Text-Guided Diverse Image Generation and Manipulation

Generative Hierarchical Features from Synthesizing Images

Teachers Do More Than Teach: Compressing Image-to-Image Models

HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

Diverse Semantic Image Synthesis via Probability Distribution Modeling

LOHO: Latent Optimization of Hairstyles via Orthogonalization

PISE: Person Image Synthesis and Editing with Decoupled GAN

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Efficient Conditional GAN Transfer with Knowledge Propagation across Classes

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: None
  • Code: None

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

A 3D GAN for Improved Large-pose Facial Recognition

HumanGAN: A Generative Model of Humans Images

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis

CoMoGAN: continuous model-guided image-to-image translation

Training Generative Adversarial Networks in One Stage

Closed-Form Factorization of Latent Semantics in GANs

Anycost GANs for Interactive Image Synthesis and Editing

Image-to-image Translation via Hierarchical Style Disentanglement

VAE

Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders

Visual Transformer

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

6. Pose Recognition with Cascade Transformers

7. Variational Transformer Networks for Layout Generation

8. LoFTR: Detector-Free Local Feature Matching with Transformers

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

11. Transformer Tracking

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

13. MIST: Multiple Instance Spatial Transformer

14. Multimodal Motion Prediction with Stacked Transformers

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

17. Pre-Trained Image Processing Transformer

18. End-to-End Video Instance Segmentation with Transformers

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

20. End-to-End Human Object Interaction Detection with HOI Transformer

21. Transformer Interpretability Beyond Attention Visualization

22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

  • Paper: None
  • Code: None

23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None
  • Code: None

24. Line Segment Detection Using Transformers without Edges

25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  • Code: None

26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

27. Facial Action Unit Detection With Transformers

  • Paper: None
  • Code: None

28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

  • Paper: None
  • Code: None

29. Lesion-Aware Transformers for Diabetic Retinopathy Grading

  • Paper: None
  • Code: None

30. Topological Planning With Transformers for Vision-and-Language Navigation

31. Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None
  • Code: None

32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

  • Paper: None
  • Code: None

33. Taming Transformers for High-Resolution Image Synthesis

34. Self-Supervised Video Hashing via Bidirectional Transformers

  • Paper: None
  • Code: None

35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

36. Gaussian Context Transformer

  • Paper: None
  • Code: None

37. General Multi-Label Image Classification With Transformers

38. Bottleneck Transformers for Visual Recognition

39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

41. Self-attention based Text Knowledge Mining for Text Detection

42. SSAN: Separable Self-Attention Network for Video Representation Learning

  • Paper: None
  • Code: None

43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Regularization

Regularizing Neural Networks via Adversarial Model Perturbation

SLAM

Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation

Generalizing to the Open World: Deep Visual Odometry with Online Adaptation

Long-Tailed Distribution

Adversarial Robustness under Long-Tailed Distribution

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

Adaptive Class Suppression Loss for Long-Tail Object Detection

Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification

Data Augmentation

Scale-aware Automatic Augmentation for Object Detection

Unsupervised/Self-Supervised Learning

Domain-Specific Suppression for Adaptive Object Detection

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Self-supervised Video Representation Learning by Context and Motion Decoupling

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Spatially Consistent Representation Learning

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

Exploring Simple Siamese Representation Learning

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Semi-Supervised Learning

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Capsule Network

Capsule Network is Not More Robust than Convolutional Network

Image Classification

Correlated Input-Dependent Label Noise in Large-Scale Image Classification

2D Object Detection

2D Object Detection

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

Domain-Specific Suppression for Adaptive Object Detection

IQDet: Instance-wise Quality Distribution Sampling for Object Detection

Multi-Scale Aligned Distillation for Low-Resolution Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection

VarifocalNet: An IoU-aware Dense Object Detector

Scale-aware Automatic Augmentation for Object Detection

OTA: Optimal Transport Assignment for Object Detection

Distilling Object Detectors via Decoupled Features

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Positive-Unlabeled Data Purification in the Wild for Object Detection

  • Paper: None
  • Code: None

Instance Localization for Self-supervised Detection Pretraining

MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

End-to-End Object Detection with Fully Convolutional Network

Robust and Accurate Object Detection via Adversarial Learning

I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

YOLOF: You Only Look One-level Feature

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

General Instance Distillation for Object Detection

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Multiple Instance Active Learning for Object Detection

Towards Open World Object Detection

Few-Shot Object Detection

Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None
  • Code: None

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

Few-Shot Object Detection via Contrastive Proposal Encoding

Rotated Object Detection

ReDet: A Rotation-equivariant Detector for Aerial Object Detection

Single/Multi-Object Tracking

Single Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Graph Attention Tracking

Rotation Equivariant Siamese Networks for Tracking

Track to Detect and Segment: An Online Multi-Object Tracker

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Transformer Tracking

Multi-Object Tracking

Multiple Object Tracking with Correlation Learning

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Learning a Proposal Classifier for Multiple Object Tracking

Track to Detect and Segment: An Online Multi-Object Tracker

Semantic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Rethinking BiSeNet For Real-time Semantic Segmentation

Progressive Semantic Segmentation

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Bidirectional Projection Network for Cross Dimension Scene Understanding

Cross-Dataset Collaborative Learning for Semantic Segmentation

Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

Capturing Omni-Range Context for Omnidirectional Segmentation

Learning Statistical Texture for Semantic Segmentation

PLOP: Learning without Forgetting for Continual Semantic Segmentation

Weakly Supervised Semantic Segmentation

Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

Semi-Supervised Semantic Segmentation

Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

Domain Adaptive Semantic Segmentation

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

Video Semantic Segmentation

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Incremental Few-Shot Instance Segmentation

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

Multi-Scale Aligned Distillation for Low-Resolution Detection

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Zero-shot instance segmentation (Not Sure)

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

End-to-End Video Instance Segmentation with Transformers

Panoptic Segmentation

Exemplar-Based Open-Set Panoptic Segmentation Network

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  • Code: None

Panoptic Segmentation Forecasting

Fully Convolutional Networks for Panoptic Segmentation

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

Medical Image Segmentation

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

3D Medical Image Segmentation

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

Video Object Segmentation

Learning Position and Target Consistency for Memory-based Video Object Segmentation

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Interactive Video Object Segmentation

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Saliency Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion

Camouflaged Object Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Co-Salient Object Detection

Group Collaborative Learning for Co-Salient Object Detection

Image Matting

Semantic Image Matting

Person Re-identification

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Combined Depth Space based Architecture Search For Person Re-identification

Person Search

Anchor-Free Person Search

Video Understanding / Action Recognition

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

FrameExit: Conditional Early Exiting for Efficient Video Recognition

No frame left behind: Full Video Action Recognition

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

ACTION-Net: Multipath Excitation for Action Recognition

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

TDN: Temporal Difference Networks for Efficient Action Recognition

Face Recognition

A 3D GAN for Improved Large-pose Facial Recognition

MagFace: A Universal Representation for Face Recognition and Quality Assessment

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Face Detection

HLA-Face: Joint High-Low Adaptation for Low Light Face Detection

CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement

Face Anti-Spoofing

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

Deepfake Detection

Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Multi-attentional Deepfake Detection

Face Age Estimation

Continuous Face Aging via Self-estimated Residual Age Embedding

PML: Progressive Margin Loss for Long-tailed Age Classification

Facial Expression Recognition

Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition

Deepfakes

MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes

Human Parsing

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

2D/3D Human Pose Estimation

2D Human Pose Estimation

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

  • Paper: https://arxiv.org/abs/2105.10154
  • Code: None

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

Pose Recognition with Cascade Transformers

DCPose: Deep Dual Consecutive Network for Human Pose Estimation

3D Human Pose Estimation

End-to-End Human Pose and Mesh Reconstruction with Transformers

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

Animal Pose Estimation

From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation

Human Volumetric Capture

POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture

Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Image Compression

Checkerboard Context Model for Efficient Learned Image Compression

Slimmable Compressive Autoencoders for Practical Neural Image Compression

Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton

Model Compression/Pruning/Quantization

Teachers Do More Than Teach: Compressing Image-to-Image Models

Model Pruning

Dynamic Slimmable Network

Model Quantization

Network Quantization with Element-wise Gradient Scaling

Zero-shot Adversarial Quantization

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Knowledge Distillation

Distilling Knowledge via Knowledge Review

Distilling Object Detectors via Decoupled Features

Super-Resolution

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

AdderSR: Towards Energy Efficient Image Super-Resolution

Dehazing

Contrastive Learning for Compact Single Image Dehazing

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

Image Restoration

Multi-Stage Progressive Image Restoration

Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

High-Fidelity and Arbitrary Face Editing

Anycost GANs for Interactive Image Synthesis and Editing

PISE: Person Image Synthesis and Editing with Decoupled GAN

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: None
  • Code: None

Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

Convolutional Hough Matching Networks

Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

Reflection Removal

Robust Reflection Removal with Reflection-free Flash-only Cues

3D Point Cloud Classification

Equivariant Point Network for 3D Point Cloud Analysis

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

3D Object Detection

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

M3DSSD: Monocular 3D Single Stage Object Detector

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud

Center-based 3D Object Detection and Tracking

Categorical Depth Distribution Network for Monocular 3D Object Detection

3D Semantic Segmentation

Bidirectional Projection Network for Cross Dimension Scene Understanding

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

3D Panoptic Segmentation

Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation

3D Object Tracking

Center-based 3D Object Detection and Tracking

3D Point Cloud Registration

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

PREDATOR: Registration of 3D Point Clouds with Low Overlap

3D Point Cloud Completion

Unsupervised 3D Shape Completion through GAN Inversion

Variational Relational Point Completion Network

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion

3D Reconstruction

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video

6D Pose Estimation

FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Camera Pose Estimation

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

Depth Estimation

S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Beyond Image to Depth: Improving Depth Prediction using Echoes

S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation

Depth from Camera Motion and Object Detection

Stereo Matching

A Decomposition Model for Stereo Matching

Optical Flow Estimation

Self-Supervised Multi-Frame Monocular Scene Flow

RAFT-3D: Scene Flow using Rigid-Motion Embeddings

Learning Optical Flow From Still Images

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Lane Detection

Focus on Local: Detecting Lane Marker from Bottom Up via Key Point

Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

Trajectory Prediction

Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction

Crowd Counting

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Adversarial Examples

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

LiBRe: A Practical Bayesian Approach to Adversarial Detection

Natural Adversarial Examples

Image Retrieval

StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

Video Retrieval

On Semantic Similarity in Video Retrieval

Cross-modal Retrieval

Cross-Modal Center Loss for 3D Cross-Modal Retrieval

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

Zero-Shot Learning

Counterfactual Zero-Shot and Open-Set Visual Recognition

Federated Learning

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

Video Frame Interpolation

CDFI: Compression-Driven Network Design for Frame Interpolation

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

Visual Reasoning

Transformation Driven Visual Reasoning

Image Synthesis

Taming Transformers for High-Resolution Image Synthesis

View Synthesis

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

Self-Supervised Visibility Learning for Novel View Synthesis

NeX: Real-time View Synthesis with Neural Basis Expansion

Style Transfer

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Layout Generation

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None
  • Code: None

Variational Transformer Networks for Layout Generation

Domain Generalization

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Adaptive Methods for Real-World Domain Generalization

FSDR: Frequency Space Domain Randomization for Domain Generalization

Domain Adaptation

Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation

Domain Consensus Clustering for Universal Domain Adaptation

Open-Set

Towards Open World Object Detection

Exemplar-Based Open-Set Panoptic Segmentation Network

Learning Placeholders for Open-Set Recognition

Adversarial Attack

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Human-Object Interaction (HOI) Detection

HOTR: End-to-End Human-Object Interaction Detection with Transformers

Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

Reformulating HOI Detection as Adaptive Set Prediction

Detecting Human-Object Interaction via Fabricated Compositional Learning

End-to-End Human Object Interaction Detection with HOI Transformer

Shadow Removal

Auto-Exposure Fusion for Single-Image Shadow Removal

Virtual Try-On

Parser-Free Virtual Try-on via Distilling Appearance Flows

Label Noise

A Second-Order Approach to Learning with Instance-Dependent Label Noise

Datasets

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Paper download link:

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Learning To Count Everything

Semantic Image Matting

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Visual Semantic Role Labeling for Video Understanding

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Depth from Camera Motion and Object Detection

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Others

Omnimatte: Associating Objects and Their Effects in Video

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Motion Representations for Articulated Animation

Deep Lucas-Kanade Homography for Multimodal Image Alignment

Skip-Convolutions for Efficient Video Processing

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

Learning To Count Everything

SOLD2: Self-supervised Occlusion-aware Line Description and Detection

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

LEAP: Learning Articulated Occupancy of People

Visual Semantic Role Labeling for Video Understanding

UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

Towards High Fidelity Face Relighting with Realistic Shadows

BRepNet: A topological message passing system for solid models

Visually Informed Binaural Audio Generation without Binaural Audios

Exploring intermediate representation for monocular vehicle pose estimation

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB

Invertible Image Signal Processing

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

Embedding Transfer with Label Relaxation for Improved Metric Learning

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

Meta-Mining Discriminative Samples for Kinship Verification

Cloud2Curve: Generation and Vectorization of Parametric Sketches

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

ACRE: Abstract Causal REasoning Beyond Covariation

Confluent Vessel Trees with Accurate Bifurcations

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

Knowledge Evolution in Neural Networks

Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning

SGP: Self-supervised Geometric Perception

Diffusion Probabilistic Models for 3D Point Cloud Generation

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

To Be Added (TODO)

Acceptance Unconfirmed (Not Sure)

CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models

Toward Explainable Reflection Removal with Distilling and Model Uncertainty

DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation

Exploring Adversarial Fake Images on Face Manifold

Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task

Temporal Contrastive Graph for Self-supervised Video Representation Learning

Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching

Fast and Memory-Efficient Compact Bilinear Pooling

Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine

Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation

https://github.com/ShaoQiangShen/CVPR2021

https://github.com/gillesflash/CVPR2021

https://github.com/anonymous-submission1991/BaLeNAS

https://github.com/cvpr2021dcb/cvpr2021dcb

https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578

https://github.com/AldrichZeng/FreqPrune

https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM

https://github.com/ddfss/datadrive-fss
