[WIP] Evals for HallusionBench #31

Open
wants to merge 2 commits into
base: maya_eval

chiral-carbon

Adds:

  • download instructions for the dataset in docs/Evaluation.md
  • script to convert HallusionBench.json to LLaVA format for inference
  • custom model_vqa_hallusionbench.py script for running inference and generating results
  • script to convert the result file back to .json format so it can be evaluated with the official evaluation scripts from the HallusionBench GitHub repo
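
The two conversion steps listed above could look roughly like the sketch below. It is not the code in this PR. The field names are assumptions: `question` and `filename` on the HallusionBench side, `question_id` / `image` / `text` on the LLaVA side, and `model_prediction` as the key read by the official evaluation scripts. The helper names `hallusion_to_llava`, `to_jsonl`, and `merge_answers` are hypothetical.

```python
import json


def hallusion_to_llava(samples):
    """Map HallusionBench-style records to LLaVA-style question records.

    Assumed input keys: "question", "filename" (may be missing or None).
    Assumed output keys: "question_id", "text", optional "image".
    """
    records = []
    for idx, sample in enumerate(samples):
        # A running index is used as question_id so that ids are unique
        # within the file regardless of how the source numbers its questions.
        record = {"question_id": idx, "text": sample["question"]}
        if sample.get("filename"):
            record["image"] = sample["filename"]
        records.append(record)
    return records


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def merge_answers(samples, answers):
    """Attach model answers to copies of the original samples.

    Assumed answer keys: "question_id", "text". The response is stored
    under "model_prediction" (assumed key for the evaluation scripts).
    """
    by_id = {a["question_id"]: a["text"] for a in answers}
    merged = []
    for idx, sample in enumerate(samples):
        out = dict(sample)  # shallow copy, the input list is left untouched
        out["model_prediction"] = by_id.get(idx, "")
        merged.append(out)
    return merged
```

Using the position in the list as `question_id` keeps the round trip simple: the answer for record `i` is merged back into sample `i`, and a missing answer becomes an empty string instead of raising an error.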

The pipeline runs end to end, but the custom VQA inference script does not yet generate usable responses (see TODO).


TODO:

  • Investigate and fix model_vqa_hallusionbench.py: it fails to generate responses correctly, so every input defaults to model prediction category "2" and accuracy is 0
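
One possible first step for this investigation is to check whether the inference output contains any response text at all before it reaches the evaluation scripts. The sketch below assumes the answers file is JSON Lines with the response stored under a `text` key. Both the layout and the helper name `summarize_responses` are assumptions, not part of this PR.

```python
import json
from collections import Counter


def summarize_responses(answer_lines, text_key="text"):
    """Count total, empty, and most frequent responses in JSONL answer lines."""
    total = 0
    empty = 0
    counts = Counter()
    for line in answer_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = str(record.get(text_key, "")).strip()
        total += 1
        if not text:
            empty += 1
        # Bucket by a lowercase prefix so near-identical answers group together.
        counts[text.lower()[:20]] += 1
    return {"total": total, "empty": empty, "most_common": counts.most_common(3)}
```

If nearly every response is empty or identical, the problem is likely in generation or decoding, not in the conversion or evaluation steps.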
