Yunpeng-Qi/Papers-of-Visual-Signal-Coding-for-Machine

Papers of Visual Signal Coding for Machine

Purpose: This repository summarizes papers on visual signal coding for machines. More papers will be added over time.

University of Science and Technology of China (USTC), Intelligent Media Computing Lab

📌 About new works. If you would like your work on visual signal coding for machines (e.g., a link to a paper or project) included in this repository, please raise an issue or email us. We will add it as soon as possible.

Papers for Visual Coding for Machine

Survey-&-Theory

| Models | Paper | First Author | Note | Venue | Data | Project |
| --- | --- | --- | --- | --- | --- | --- |
| | A Rate-Distortion-Classification Approach for Lossy Image Compression | Yuefeng Zhang | | PrePrint'24 | | |
| | Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics | Wenhan Yang, Haofeng Huang and Yueyu Hu | survey | TPAMI2024 | Video | |
| | Rate-Distortion Theory in Coding for Machines and its Applications | Alon Harell | | PrePrint'23 | | |
| | Rate-Distortion in Image Coding for Machines | Alon Harell | | PCS2022 | | |
| | Lossy Compression for Lossless Prediction | Yann Dubois | | NeurIPS 2021 Spotlight | | Code |
| | Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics | Lingyu Duan | survey | TIP2020 | Video | |
| | On The Classification-Distortion-Perception Tradeoff | Dong Liu | | NeurIPS2019 | | |

Compress-Then-Analysis

| Models | Paper | First Author | Note | Venue | Data | Project |
| --- | --- | --- | --- | --- | --- | --- |
| DT-JRD | DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines | Junqi Liu | | TMM | Video | |
| FSIC | FSIC: Frequency-separated image compression for small object detection | Chengjie Dai | Frequency decomposition | Digital Signal Processing2025 | Image | |
| SA-ICM | Image Coding for Machines with Edge Information Learning Using Segment Anything | Takahiro Shindo | | ICIP2024 | Image | |
| | Remote Sensing Image Coding for Machines on Semantic Segmentation via Contrastive Learning | Junxi Zhang | Remote Sensing | TGRS2024 | Image | |
| Delta-ICM | Delta-ICM: Entropy Modeling with Delta Function for Learned Image Compression | Takahiro Shindo | | PrePrint'24 | Image | |
| Free-VSC | Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression | Yuan Tian | unsupervised | PrePrint'24 | Video | |
| | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | Jinming Liu | using MLLM | PrePrint'24 | Image | |
| | High Efficiency Image Compression for Large Visual-Language Models | Binzhe Li | for MLLM | PrePrint'24 | Image | |
| | Feature-Preserving Rate-Distortion Optimization in Image Coding for Machines | Samuel Fernández Menduiña | Based on AVC | PrePrint'24 | Image | |
| | Saliency Map-Guided End-to-End Image Coding for Machines | Bo Peng | | IEEE Signal Processing Letters 2024 | Image | |
| | Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines | Honglei Zhang | | PrePrint'24 | Video | |
| | A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Yuan Tian | | TPAMI2024 | Video | |
| | Task-Aware Encoder Control for Deep Video Compression | Xingtong Ge | Encoder Control | CVPR2024 | Video | |
| | Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization | Jixiang Luo | benefit for ICM | PrePrint'24 | Image | |
| SegPIC | Region-Adaptive Transform with Segmentation Prior for Image Compression | Yuxi Liu | benefit for ICM | PrePrint'24 | Image | Code |
| VNVC | VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision | Xihua Sheng | | TPAMI2024 | Video | |
| | Task-Switchable Pre-Processor for Image Compression for Multiple Machine Vision Tasks | Mingyi Yang | multi-task | TCSVT2024 | Image | |
| | Unified Architecture Adaptation for Compressed Domain Semantic Inference | Zhihao Duan | | TCSVT2023 | Image | |
| SMC | Nonsemantics suppressed mask learning for unsupervised video semantic compression | Yuan Tian | | ICCV2023 | Video | |
| TransTIC | TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception | Yi-Hsin Chen | prompt | ICCV2023 | Image | Code |
| MCM | You Can Mask More For Extremely Low-Bitrate Image Compression | Anqi Li | benefit for ICM | PrePrint'23 | Image | Code |
| | Machine Perception-Driven Image Compression: A Layered Generative Approach | Yuefeng Zhang | | PrePrint'23 | Image | |
| DMIC | Diagnosis-oriented Medical Image Compression with Efficient Transfer Learning | Guangqi Xie, Xin Li | RL | VCIP2023 best paper | Medical Data | |
| | Towards Efficient Learned Image Coding for Machines via Saliency-Driven Rate Allocation | Zixiang Zhang | | VCIP2023 | Image | |
| | Composable Image Coding for Machine via Task-oriented Internal Adaptor and External Prior | Jinming Liu | adapter | VCIP2023 | Image | |
| | Image Coding for Machines based on Non-Uniform Importance Allocation | Yunpeng Qi | | VCIP2023 | Image | |
| | Saliency-Driven Hierarchical Learned Image Coding for Machines | Kristian Fischer | | ICASSP2023 | Image | |
| Neural-Syntax | Neural Data-Dependent Transform for Learned Image Compression | Dezhao Wang | benefit for ICM | CVPR2022 | Image | Code, Project |
| HRLVSC | Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation | Guangqi Xie | RL | VCIP2022 | Video | |
| | Boosting Neural Image Compression for Machines Using Latent Space Masking | Kristian Fischer | | TCSVT2022 | Image | Code |
| | Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling | Qi Zhang | | PrePrint'22 | Video | Code |
| | Preprocessing Enhanced Image Compression for Machine Vision | Guo Lu | | PrePrint'22 | Image | |
| QmapCompression | Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform | Myungseo Song | benefit for ICM | ICCV2021 | Image | Code |
| | A Novel Video Coding Strategy in HEVC for Object Detection | Qi Cai | | TCSVT2021 | Video | |
| | Task-Driven Semantic Coding via Reinforcement Learning | Xin Li | RL | TIP2021 | Image | |
| | Learned Image Coding for Machines: A Content-Adaptive Approach | Nam Le | | ICME2021 | Image | |
| | Visual Analysis Motivated Rate-Distortion Model for Image Coding | Zhimeng Huang | | ICME2021 | Image | |
| | Image coding for machines: an end-to-end learned approach | Nam Le | | ICASSP2021 | Image | |
| | High Efficiency Compression for Object Detection | Hyomin Choi | | ICASSP2018 | Image | |
| | Faster Neural Networks Straight from JPEG | Lionel Gueguen | | NeurIPS2018 | Image | Code |
| | Towards Image Understanding from Deep Compression Without Decoding | Robert Torfason | | ICLR2018 | Image | |

Feature-Compression

| Model | Paper | First Author | Note | Venue | Data | Project |
| --- | --- | --- | --- | --- | --- | --- |
| | Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding | Danish Nazir | | ECCV2024 | Image | |
| | ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck | Chia-Hao Kao | for MLLM | PrePrint'24 | Image | |
| | Reconstruction-free Image Compression for Machine Vision via Knowledge Transfer | Hanyue Tu | | | Image | |
| | Masked Feature Compression for Object Detection | Chengjie Dai | | mathematics2024 | Image | |
| | Texture-guided Coding for Deep Features | Lei Xiong | | PrePrint'24 | Image | |
| | Split Computing With Scalable Feature Compression for Visual Analytics on the Edge | Zhongzheng Yuan | | TMM 2024 | Image | |
| | Hierarchical Image Feature Compression for Machines via Feature Sparsity Learning | Ding Ding | | IEEE Signal Processing 2024 | Image | |
| var-feat-comp | Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems | Md Adnan Faisal Hossain | variable-rate | ICME workshop2023 | Image | Code |
| NEC | Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling | Carlos Gomes | multi-task | PrePrint'24 | Earth Observation Data | |
| | Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Image Analytics in Split-DNN Models | Nilesh Ahuja | | CVPR2023 | Image | Code |
| | Scalable Feature Compression for Edge-Assisted Object Detection Over Time-Varying Networks | Zhongzheng Yuan | | MLSys workshop2023 | Image | |
| | End-to-End Learnable Multi-Scale Feature Compression for VCM | Yeongwoong Kim | | TCSVT2023 | Video | |
| Prompt-ICM | Prompt-ICM: A Unified Framework towards Image Coding for Machines with Task-driven Prompts | Ruoyu Feng & Jinming Liu | | PrePrint'23 | Image | |
| | Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach | Sha Guo | | ACMMM2023 | Image | |
| | Semantic Segmentation In Learned Compressed Domain | Jinming Liu | | PCS2022 (Best Paper Award Finalists) | Image | |
| edge-cloud-rac | Efficient Feature Compression for Edge-Cloud Systems | Zhihao Duan | | PCS2022 (Best Paper Award Finalists) | Image | Code |
| Omni-ICM | Image Coding for Machines with Omnipotent Feature Learning | Ruoyu Feng | | ECCV2022 | Image | |
| | A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing | Parual Datta | variable-rate | ICPR2022 | Image | |
| | Improving Multiple Machine Vision Tasks in the Compressed Domain | Jinming Liu | | ICPR2022 | Image | |
| | Learning from the CNN-based Compressed Domain | Zhenzhen Wang | | WACV2022 | Image | |
| | Supervised Compression for Resource-Constrained Edge Computing Systems | Yoshitomo Matsubara | | WACV2022 | Image | Code |
| | Bridging the Gap Between Image Coding for Machines and Humans | Nam Le | | ICIP2022 | Image | |
| | Enhancing Image Coding for Machines with Compressed Feature Residuals | Joni Seppälä | | ISM2021 | Image | |
| | Learning in Compressed Domain for Faster Machine Vision Tasks | Jinming Liu | | VCIP2021 | Image | |
| | Learning in the Frequency Domain | Kai Xu | | CVPR2020 | Image | Code |
| | Lossy Intermediate Deep Learning Feature Compression and Evaluation | Zhuo Chen | | ACMMM2019 | Image | |
| | Toward Intelligent Sensing: Intermediate Deep Feature Compression | Zhuo Chen | | TIP2019 | Image | |
| | End-to-End Optimized ROI Image Compression | Chunlei Cai | | TIP2019 | Image | |
| | Deep Feature Compression for Collaborative Object Detection | Hyomin Choi | | ICIP2018 | Image | |
| | Near-Lossless Deep Feature Compression for Collaborative Intelligence | Hyomin Choi | | MMSP2018 | Image | |

Joint-Human-and-Machine-Vision

| Model | Paper | First Author | Note | Venue | Data | Project |
| --- | --- | --- | --- | --- | --- | --- |
| | A Unified Image Compression Method for Human Perception and Multiple Vision Tasks | Sha Guo and Lin Sui | multi-task | ECCV2024 | Image | |
| | Rate-distortion cognitive controllable versatile neural image compression | Jinming Liu | | ECCV2024 | Image | |
| Adapt-ICMH | Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation | Han Li | Adapter | ECCV2024 | Image | Code |
| | Machine Perception-Driven Facial Image Compression: A Layered Generative Approach | Yuefeng Zhang | | TCSVT2024 | Image | |
| | Towards Task-Compatible Compressible Representations | Anderson de Andrade | | ICME Workshop2024 | Image | |
| | Scalable Image Coding for Humans and Machines Using Feature Fusion Network | Takahiro Shindo | | PrePrint'24 | Image | |
| | Scalable Human-Machine Point Cloud Compression | Mateen Ulhaq | | PrePrint'24 | Pointcloud | |
| GIT-SSIC | Semantically Structured Image Compression via Irregular Group-Based Decoupling | Ruoyu Feng and Yixin Gao | | ICCV2023 | Image | Code |
| | Adaptive Human-Centric Video Compression for Humans and Machines | Wei Jiang | VQ | CVPR workshop 2023 | Video | |
| | Learned point cloud compression for classification | Mateen Ulhaq | | MMSP2023 | Pointcloud | |
| | Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines | Ezgi Ozyılkan | | DCC2023 | Image | |
| | Learned Scalable Video Coding For Humans and Machines | Hadi Hadizadeh | | PrePrint'23 | Video | |
| | Scalable Face Image Coding via StyleGAN Prior: Toward Compression for Human-Machine Collaborative Vision | Qi Mao | | TIP2023 | Image | |
| ICMH-Net | ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision | Lei Liu | | ACM MM2023 | Image | |
| DeepSVC | DeepSVC: Deep Scalable Video Coding for Both Machine and Human Vision | Hongbin Lin | | ACM MM2023 | Video | |
| | Peering into The Sketch: Ultra-Low Bitrate Face Compression for Joint Human and Machine Perception | Yudong Mao | | ACM MM2023 | Image | |
| | Sketch Assisted Face Image Coding for Human and Machine Vision: A Joint Training Approach | Xin Fang | | TCSVT2023 | Image | |
| | Base Layer Efficiency in Scalable Human-Machine Coding | Yalda Foroutan | | ICIP2023 | Image | |
| | Scalable Image Coding for Humans and Machines | Hyomin Choi | | TIP2022 | Image | |
| | HMFVC: A Human-Machine Friendly Video Compression Scheme | Zhimeng Huang | | TCSVT2022 | Video | |
| | Bridging the gap between image coding for machines and humans | Nam Le | | ICIP2022 | Image | |
| | Towards End-to-End Image Compression and Analysis with Transformers | Yuanchao Bai & Xu Yang | | AAAI2022 | Image | Code |
| | Learned Image Compression for Machine Perception | Felipe Codevilla | | PrePrint'21 | Image | |
| | Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations | Kang Liu | | IJCV2021 | Image | |
| | Towards Analysis-Friendly Face Representation With Scalable Feature and Texture Compression | Shurun Wang | | TMM2021 | Image | |
| SSSIC | SSSIC: Semantics-to-Signal Scalable Image Coding With Learned Structural Representations | Ning Yan | | TIP2021 | Image | |
| | Towards coding for human and machine vision: Scalable face image coding | Shuai Yang | | TMM2021 | Image | |
| | Semantic Scalable Image Compression with Cross-Layer Priors | Hanyue Tu | | ACM MM2021 | Image | |
| | Latent-space scalability for multi-task collaborative intelligence | Hyomin Choi | | ICIP2021 | Image | |
| SSIC | Semantic Structured Image Coding Framework for Multiple Intelligent Applications | Simeng Sun | | TCSVT2020 | Image | |
| | Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach | Yueyu Hu | | ICME2020 | Image | Project |

Vision Model Token Compression

| Models | Paper | First Author | Note | Venue | Data | Project |
| --- | --- | --- | --- | --- | --- | --- |
| | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Shengqiong Wu | Tokenizer | PrePrint'24 | Image | |
| PruMerge | LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Yuzhang Shang | VL-connector | PrePrint'24 | Image & Video | Code |
| AVG-LLaVA | AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity | Zhibin Lan | VL-connector | PrePrint'24 | Image | Code |
| | Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information | Yi Chen | VL-connector | PrePrint'24 | Image | |
| HiRED | HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments | Kazi Hasan Ibn Arif | VL-connector | PrePrint'24 | High Resolution Image | |
| DeCo | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao | VL-connector | PrePrint'24 | Image | Code |
| CrossGET | CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Dachuan Shi | Tokenizer | ICML2024 | Image | Code |
| DocPedia | DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding | Hao Feng | Tokenizer | PrePrint'23 | Document | |
| Honeybee | Honeybee: Locality-enhanced Projector for Multimodal LLM | Junbum Cha, Wooyoung Kang and Jonghwan Mun | VL-connector | CVPR2024 Highlight | | Code |
| InternVL2 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Zhe Chen | VL-connector | PrePrint'24 | | Code |
| | Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Dingjie Song | VL-connector | PrePrint'24 | | |
| | Matryoshka Multimodal Models | Mu Cai | VL-connector | PrePrint'24 | | Code |
| MM1 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Brandon McKinzie | Ablations on each element | ECCV2024 | Image | |
| LDPv2 | MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | Xiangxiang Chu | VL-connector | PrePrint'24 | Image | Code |
| mPLUG-DocOwl2 | High-resolution Compressing for OCR-free Multi-page Document Understanding | Anwen Hu | VL-connector | PrePrint'24 | Document | Code |
| TextHawk | TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models | Ya-Qi Yu | VL-connector | PrePrint'24 | Image | Code |
| TextMonkey | TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Yuliang Liu | VL-connector | PrePrint'24 | Document | Code |
| TokenCorrCompressor | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang | VL-connector | PrePrint'24 | Image & Video | Code |
| ToMe | Token Merging: Your ViT But Faster | Daniel Bolya | Tokenizer | ICLR2023 notable top 5% | Image & Video | Code |
| TokenPacker | TokenPacker: Efficient Visual Projector for Multimodal LLM | Wentong Li | VL-connector | PrePrint'24 | Image | Code |
| VidToMe | VidToMe: Video Token Merging for Zero-Shot Video Editing | Xirui Li | Tokenizer | CVPR2024 | Video | Code |
| VoCo-LLaMA | VoCo-LLaMA: Towards Vision Compression with Large Language Models | Xubing Ye | VL-connector | PrePrint'24 | Image & Video | Code |
| FastV | An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | Liang Chen | VL-connector | ECCV2024 oral | Image | Code |
| TempMe | TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval | Leqi Shen | Tokenizer | PrePrint'24 | Image | |