Hi @Violettttee,
You can try the AI2D_TEST_NO_MASK dataset we provide, which generally yields better performance than AI2D_TEST because of its different setting. However, we still cannot reproduce the numbers reported by OpenAI or Anthropic.
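Hello~
Do you have any suggestions or thoughts about the exceptionally high AI2D scores reported for OpenAI's GPT and Claude 3.5? I have evaluated GPT several times with different setups and prompts (including adding CoT), and I could not reproduce the very high score of 0.942 (even with CoT, my best result was only 0.83). What do you think explains this gap? (I also notice that none of your own AI2D results exceed 0.9, so I'm curious how Claude and GPT achieved near-perfect scores.)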