I am Zhiqiu Lin, a final-year PhD student at Carnegie Mellon University working with Prof. Deva Ramanan. We found your NeurIPS'24 work fascinating!
I wanted to share NaturalBench (NeurIPS'24 D&B), a collaborative project between CMU and the University of Washington, which might interest you:
NaturalBench (https://linzhiqiu.github.io/papers/naturalbench/) is a vision-centric benchmark that challenges vision-language models with pairs of simple questions about natural imagery. Unlike prior VQA benchmarks (like MME and ScienceQA), which "blind" language models (e.g., GPT-3.5, answering without seeing the image) can solve, NaturalBench ensures such language-only shortcuts won't work. We evaluated 53 state-of-the-art models, and even top models like GPT-4o and Qwen2-VL fall 50%-70% short of human accuracy (90%+), revealing significant room for improvement.
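To make the paired design concrete, here is a minimal sketch of the kind of paired scoring this enables: each sample groups two images with two questions, and a model only gets credit for the sample when all four (image, question) combinations are answered correctly. The field and function names below are illustrative, not the exact schema or API of our released evaluation code.

```python
# Illustrative paired ("group") scoring: credit a sample only if the model
# answers both questions correctly on both images. Field names are illustrative.

def group_correct(sample, predict):
    """predict(image, question) -> answer string (e.g., "Yes"/"No" or a choice)."""
    return all(
        predict(sample[img], sample[q]) == sample[f"answer_{img}_{q}"]
        for img in ("image_1", "image_2")
        for q in ("question_1", "question_2")
    )

def group_accuracy(samples, predict):
    """Fraction of samples where all four (image, question) pairs are answered correctly."""
    return sum(group_correct(s, predict) for s in samples) / len(samples)
```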
We also found that current models show strong answer biases, such as favoring “Yes” over “No” regardless of the input. Correcting these biases can boost performance by 2-3x, even for GPT-4o, making NaturalBench a valuable testbed for future debiasing techniques.
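As a simplified illustration of what we mean by correcting answer bias (not the exact procedure from the paper): for yes/no questions, rather than taking the model's raw answer, one can compare the scores it assigns to "Yes" and "No" and subtract its average preference for "Yes" before deciding. How you obtain per-question answer scores is model-specific; the sketch below assumes you already have them.

```python
import numpy as np

def debiased_yes_no(yes_scores, no_scores):
    """Remove a global "Yes" preference before converting scores to answers.

    yes_scores / no_scores: per-question scores (e.g., log-probabilities) the
    model assigns to "Yes" and "No"; extracting them depends on your model.
    """
    margin = np.asarray(yes_scores, dtype=float) - np.asarray(no_scores, dtype=float)
    corrected = margin - margin.mean()  # subtract the average lean toward "Yes"
    return ["Yes" if m > 0 else "No" for m in corrected]
```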
Thanks for your interest in our work and for sharing NaturalBench. ❤
We also observe such deficiencies in existing video-audio-language models (https://github.com/DAMO-NLP-SG/CMM), which aligns with the findings of NaturalBench. We believe that ensuring MLLMs do not rely on shortcuts is critical for developing vision-centric models.
We will leave this issue open for now so that others can join the discussion. 😊
Check out my Twitter post about it here: https://x.com/ZhiqiuLin/status/1848454555341885808.
🚀 Start using NaturalBench: https://github.com/Baiqi-Li/NaturalBench
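If it helps, here is a rough sketch of poking at the benchmark with the Hugging Face `datasets` library; the dataset identifier and split handling below are my guesses, so please follow the repo's README for the exact loading and evaluation instructions.

```python
from datasets import load_dataset

# Assumed Hugging Face dataset id; see the NaturalBench repo README for the official one.
dataset = load_dataset("BaiqiL/NaturalBench")

print(dataset)                    # inspect the available splits
split = list(dataset.keys())[0]
sample = dataset[split][0]
print(sample.keys())              # inspect fields (images, questions, answers, ...)
```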
Best,
Zhiqiu