S-Lab, Nanyang Technological University
The code repository for "Learning without Forgetting for Vision-Language Models" in PyTorch. If you use any content of this repo for your work, please cite the following bib entry:
@article{zhou2023learning,
    title={Learning without Forgetting for Vision-Language Models},
    author={Da-Wei Zhou and Yuanhan Zhang and Jingyi Ning and Han-Jia Ye and De-Chuan Zhan and Ziwei Liu},
    journal={arXiv preprint arXiv:2305.19270},
    year={2023}
}
[10/2024] Code has been released.
[05/2023] arXiv paper has been released.
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones. Traditional CIL models are trained from scratch to continually acquire knowledge as data evolves. While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLMs) have shown promising capabilities in learning generalizable representations with the aid of textual information. However, when continually trained on new classes, VLMs often suffer from catastrophic forgetting of former knowledge. Applying VLMs to CIL poses two major challenges: 1) how to adapt the model without forgetting; and 2) how to make full use of the multi-modal information. To this end, we propose PROjectiOn Fusion (PROOF), which enables VLMs to learn without forgetting. To handle the first challenge, we train task-specific projections on top of the frozen image/text encoders. When facing new tasks, new projections are expanded while former projections are kept fixed, alleviating the forgetting of old concepts. For the second challenge, we propose a fusion module to better utilize cross-modal information. By jointly adjusting visual and textual features, the model captures task-specific semantic information that facilitates recognition. Extensive experiments on nine benchmark datasets with various continual learning scenarios and various VLMs validate that PROOF achieves state-of-the-art performance.
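To make the projection-expansion idea above concrete, here is a minimal PyTorch sketch. It only illustrates the described mechanism: the class name, the plain linear projections, and the simple summation used to aggregate them are assumptions for readability, not the repository's actual implementation (which additionally fuses the projected visual and textual features through the fusion module).

```python
import torch
import torch.nn as nn


class ExpandableProjections(nn.Module):
    """Sketch of task-specific projections on top of a frozen encoder.

    When a new task arrives, previously learned projections are frozen and a
    fresh trainable projection is appended, so old knowledge stays intact
    while new capacity is added for the new classes.
    """

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.feat_dim = feat_dim
        self.projections = nn.ModuleList()

    def add_task(self) -> None:
        # Freeze every projection learned for earlier tasks ...
        for proj in self.projections:
            for param in proj.parameters():
                param.requires_grad = False
        # ... then append a new trainable projection for the incoming task.
        self.projections.append(nn.Linear(self.feat_dim, self.feat_dim))

    def forward(self, frozen_features: torch.Tensor) -> torch.Tensor:
        # Aggregate all task-specific projections of the frozen features.
        # (A plain sum is used here purely for illustration.)
        assert len(self.projections) > 0, "call add_task() before forward()"
        return sum(proj(frozen_features) for proj in self.projections)
```

In the method described above, such an expansion would apply to both the visual and the textual branch, with the fusion module combining the two streams before classification.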
Environment
Dataset
We provide the processed datasets as follows:
- CIFAR100: will be automatically downloaded by the code.
- CUB200: Google Drive: link or OneDrive: link
- ImageNet-R: Google Drive: link or OneDrive: link
- ObjectNet: OneDrive: link. You can also refer to the filelist and processing code if the file is too large to download.
- Cars: Google Drive: link or OneDrive: link
- UCF: Google Drive: link or OneDrive: link
- Aircraft: Google Drive: link or OneDrive: link
- Food: Google Drive: link or OneDrive: link
- SUN: OneDrive: link
These subsets are sampled from the original datasets. Please note that I do not have the right to distribute these datasets. If this distribution violates the license, I will provide the filenames instead.
You need to modify the dataset paths in ./utils/data.py
so that they point at your local copies, as shown in the sketch below.
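As a hypothetical illustration (the real class and attribute names are defined in ./utils/data.py; keep whatever that file actually uses), the edit amounts to replacing the placeholder directories with your own:

```python
# Hypothetical excerpt -- open ./utils/data.py and edit the path attributes
# of each dataset entry so they point at where you extracted the data.
train_dir = "/your/path/to/cub200/train/"
test_dir = "/your/path/to/cub200/test/"
```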
To prepare your JSON configuration files, refer to the settings in the exps
folder and run the command below. All main experiments from the paper are already provided in the exps
folder; you can simply execute them to reproduce the results found in the logs
folder.
python main.py --config ./exps/[configname].json
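For orientation, a config file roughly follows the sketch below. The keys and values shown are illustrative assumptions (modeled on typical PyCIL-style configs, which this repo builds on); always copy the actual settings from the provided files in the exps folder.

```json
{
  "dataset": "cifar100",
  "model_name": "proof",
  "init_cls": 10,
  "increment": 10,
  "device": ["0"],
  "seed": [1993]
}
```

In this sketch, dataset selects a dataset registered in ./utils/data.py, init_cls and increment control how classes are split across incremental tasks, and device/seed set the GPU id and random seed.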
This repo is based on CIL_Survey and PyCIL.
If you have any questions, please contact me via email or open an issue.