Learning without Forgetting for Vision-Language Models

¹School of Artificial Intelligence, State Key Laboratory for Novel Software Technology, Nanjing University
²S-Lab, Nanyang Technological University

The code repository for "Learning without Forgetting for Vision-Language Models" in PyTorch. If you use any content of this repo for your work, please cite the following bib entry:

@article{zhou2023learning,
  title={Learning without Forgetting for Vision-Language Models},
  author={Da-Wei Zhou and Yuanhan Zhang and Jingyi Ning and Han-Jia Ye and De-Chuan Zhan and Ziwei Liu},
  journal={arXiv preprint arXiv:2305.19270},
  year={2023}
}

📢 Updates

[10/2024] Code has been released.

[05/2023] arXiv paper has been released.

📝 Introduction

Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones. Traditional CIL models are trained from scratch to continually acquire knowledge as data evolves. While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLM) have shown promising capabilities in learning generalizable representations with the aid of textual information. However, when continually trained with new classes, VLMs often suffer from catastrophic forgetting of former knowledge. Applying VLMs to CIL poses two major challenges: 1) how to adapt the model without forgetting; and 2) how to make full use of the multi-modal information.

To this end, we propose PROjectiOn Fusion (PROOF), which enables VLMs to learn without forgetting. To handle the first challenge, we train task-specific projections on top of the frozen image/text encoders. When facing new tasks, new projections are expanded while former projections are fixed, alleviating the forgetting of old concepts. For the second challenge, we propose a fusion module to better utilize the cross-modal information. By jointly adjusting visual and textual features, the model can capture task-specific semantic information that facilitates recognition. Extensive experiments on nine benchmark datasets, covering various continual learning scenarios and various VLMs, validate that PROOF achieves state-of-the-art performance.
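For intuition, here is a minimal PyTorch sketch of the projection-expansion idea described above: each new task adds a small trainable projection on top of the frozen encoders, previously learned projections are frozen, and the projected visual and textual features are matched for classification. This is an illustrative sketch based on the paper's description, not the implementation in this repository; all class and variable names below are hypothetical.

import torch
import torch.nn as nn

class ExpandableProjector(nn.Module):
    # Toy PROOF-style projection expansion (illustrative only, not the repo's code).
    def __init__(self, dim=512):
        super().__init__()
        self.dim = dim
        self.projections = nn.ModuleList()

    def add_task(self):
        # Freeze all previous task-specific projections ...
        for proj in self.projections:
            proj.requires_grad_(False)
        # ... and append a new trainable projection for the incoming task.
        self.projections.append(nn.Linear(self.dim, self.dim))

    def forward(self, x):
        # Aggregate the outputs of all task-specific projections.
        return torch.stack([proj(x) for proj in self.projections]).sum(dim=0)

# Frozen image/text encoders (e.g. from open-clip) would produce these features;
# only the projections (and, in PROOF, the fusion module) are trained per task.
image_proj = ExpandableProjector(dim=512)
text_proj = ExpandableProjector(dim=512)
image_proj.add_task()
text_proj.add_task()

img_feat = torch.randn(8, 512)    # stand-in for frozen image-encoder output (batch of 8)
txt_feat = torch.randn(100, 512)  # stand-in for frozen text-encoder output (100 class prompts)

# Match projected visual features against projected textual features.
logits = image_proj(img_feat) @ text_proj(txt_feat).t()   # shape: (8, 100)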

🔧 Requirements

Environment

• torch 1.11.0

• torchvision 0.12.0

• open-clip 2.17.1
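If you are building the environment from scratch, an installation along these lines should work (the pip package for open-clip is typically open_clip_torch; pick the torch build that matches your CUDA version):

pip install torch==1.11.0 torchvision==0.12.0
pip install open_clip_torch==2.17.1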

Dataset

We provide the processed datasets as follows:

  • CIFAR100: will be automatically downloaded by the code.
  • CUB200: Google Drive: link or OneDrive: link
  • ImageNet-R: Google Drive: link or OneDrive: link
  • ObjectNet: OneDrive: link. You can also refer to the filelist and processing code if the file is too large to download.
  • Cars: Google Drive: link or OneDrive: link
  • UCF: Google Drive: link or OneDrive: link
  • Aircraft: Google Drive: link or OneDrive: link
  • Food: Google Drive: link or OneDrive: link
  • SUN: OneDrive: link

These subsets are sampled from the original datasets. Please note that I do not have the right to distribute these datasets; if the distribution violates any license, I will provide the file lists instead.

You need to modify the dataset paths in ./utils/data.py according to your own environment.
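The edit is usually just pointing the dataset directories to wherever you extracted the downloads. The snippet below is a hypothetical example of what such an entry might look like; the actual class and attribute names are defined in ./utils/data.py and may differ:

# Hypothetical example -- check ./utils/data.py for the real attribute names.
train_dir = "/your/path/to/cub200/train/"
test_dir = "/your/path/to/cub200/test/"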

💡 Running scripts

To prepare your JSON files, refer to the settings in the exps folder and run the following command. All main experiments from the paper are already provided in the exps folder; you can simply execute them to reproduce the results reported in the logs folder.

python main.py --config ./exps/[configname].json
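The exact keys are defined by the config files shipped in the exps folder; the snippet below only sketches the general shape of such a JSON file with hypothetical keys and values, so copy an existing config rather than writing one from scratch:

{
  "_comment": "hypothetical example -- use a real config from exps/ as the template",
  "dataset": "cifar100",
  "model_name": "proof",
  "init_cls": 10,
  "increment": 10,
  "device": ["0"]
}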

🎈 Acknowledgement

This repo is based on CIL_Survey and PyCIL.

💭 Correspondence

If you have any questions, please contact me via email or open an issue.