update institution
mzhaoshuai committed Jan 17, 2024
1 parent 0f3e217 commit 38929e4
Showing 1 changed file with 7 additions and 5 deletions.
12 changes: 7 additions & 5 deletions index.md
@@ -8,12 +8,13 @@ layout: home
<div align="center">
# Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

### [Shuai Zhao](https://github.com/mzhaoshuai)<sup>1,2</sup>, [Xiaohan Wang](https://scholar.google.com/citations?user=iGA10XoAAAAJ&hl=en-US)<sup>1</sup>, [Linchao Zhu](http://ffmpbgrnn.github.io/)<sup>1</sup>, [Yi Yang](https://scholar.google.com/citations?user=RMSuNFwAAAAJ&hl=en)<sup>1</sup>
### <sup>1</sup> ReLER Lab, CCAI, Zhejiang University, <sup>2</sup> Baidu Inc.
### [<ins>paper</ins>](https://arxiv.org/abs/2305.18010)
<!-- ### [<ins>paper</ins>]() &nbsp; [<ins>code</ins>]() -->
### [Shuai Zhao](https://github.com/mzhaoshuai)<sup>1,3</sup>, [Xiaohan Wang](https://scholar.google.com/citations?user=iGA10XoAAAAJ&hl=en-US)<sup>2</sup>, [Linchao Zhu](http://ffmpbgrnn.github.io/)<sup>2</sup>, [Yi Yang](https://scholar.google.com/citations?user=RMSuNFwAAAAJ&hl=en)<sup>2</sup>
### <sup>1</sup> ReLER Lab, AAII, University of Technology Sydney, <sup>2</sup> ReLER Lab, CCAI, Zhejiang University, <sup>3</sup> Baidu Inc.
<!-- ### [<ins>paper</ins>](https://openreview.net/forum?id=kIP0duasBb) -->
### [<ins>paper</ins>](https://openreview.net/forum?id=kIP0duasBb) &nbsp; [<ins>code</ins>](https://github.com/mzhaoshuai/RLCF)
</div>

<div align="justify">
<p><strong><em>Abstract</em></strong>:
One fascinating aspect of pre-trained vision-language models (VLMs) learning under language supervision is their impressive zero-shot generalization capability.
However, this ability is hindered by distribution shifts between the training and testing data.
@@ -22,12 +23,13 @@ In this work, we propose TTA with feedback to rectify the model output and preve
Specifically, a CLIP model is adopted as the reward model during TTA and provides feedback for the VLM.
Given a single test sample,
the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.
The proposed \textit{reinforcement learning with CLIP feedback~(RLCF)} framework is highly flexible and universal.
The proposed <strong>reinforcement learning with CLIP feedback (RLCF)</strong> framework is highly flexible and universal.
Beyond the classification task, with task-specific sampling strategies and a proper choice of reward baseline, RLCF can be easily extended not only to discrimination tasks like retrieval but also to generation tasks like image captioning,
improving the zero-shot generalization capacity of VLMs.
According to the characteristics of these VL tasks, we build different fully TTA pipelines with RLCF to improve the zero-shot generalization ability of various VLMs.
Extensive experiments along with promising
empirical results demonstrate the effectiveness of RLCF.
<br /><br /></p>
</div>
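The abstract above describes the core RLCF loop: sample candidate outputs from the adapted VLM on a single test sample, score them with a frozen CLIP reward model, subtract a reward baseline, and take a gradient step. Below is a minimal, hypothetical PyTorch sketch of that loop for classification; it is not the released implementation. The names `policy`, `reward_model`, `policy_text_feats`, `reward_text_feats`, the top-k candidate selection, and the logit scale of 100 are illustrative assumptions (the sketch also assumes precomputed, L2-normalized class text features and that only parameters in the policy's image encoder are being optimized); only the frozen-CLIP reward with a mean baseline and the REINFORCE-style objective mirror the description above.

```python
# Hypothetical sketch of an RLCF-style single-sample TTA step (not the authors' code).
# `policy` and `reward_model` are CLIP-like models exposing encode_image(),
# as in open_clip; the reward model is frozen and only used to score candidates.
import torch
import torch.nn.functional as F


def clip_score(model, image, text_feats):
    """Cosine similarity between one image and precomputed, normalized text features."""
    with torch.no_grad():
        img = F.normalize(model.encode_image(image), dim=-1)   # [1, d]
    return (img @ text_feats.t()).squeeze(0)                   # [num_classes]


def rlcf_step(policy, reward_model, optimizer, image,
              policy_text_feats, reward_text_feats, k=3):
    """One policy-gradient update of the adapted VLM on a single test image."""
    img = F.normalize(policy.encode_image(image), dim=-1)      # gradients flow through the policy
    logits = 100.0 * img @ policy_text_feats.t()               # [1, num_classes]
    log_probs = logits.log_softmax(dim=-1).squeeze(0)

    # "Sampled results from the VLM output distribution": here, simply the
    # top-k most likely classes under the adapted model.
    topk = logits.topk(k, dim=-1).indices.squeeze(0)           # [k]

    # CLIP reward from the frozen reward model, with the mean reward over the
    # candidates as the baseline, so the advantage is zero-centred.
    rewards = clip_score(reward_model, image, reward_text_feats)[topk]
    advantage = rewards - rewards.mean()

    # REINFORCE-style objective: raise the log-probability of candidates the
    # reward model prefers, lower it for the rest.
    loss = -(advantage * log_probs[topk]).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Predict with the adapted policy after the update.
    with torch.no_grad():
        img = F.normalize(policy.encode_image(image), dim=-1)
        final_logits = img @ policy_text_feats.t()
    return final_logits.argmax(dim=-1)
```

In an episodic TTA setting of this kind, the tuned parameters would typically be reset after each test sample so that every prediction starts from the same pre-trained weights.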

