Is my reproduction result correct? #157

Open
HeegyuKim opened this issue Aug 2, 2024 · 0 comments

Hello, Bird Team. Thank you for sharing some nice work!

I downloaded your repository and database files and evaluated the two GPT-4 prediction files.
I would appreciate it if you could tell me whether my reproduction results are correct.

Here are my results.

bird/llm/exp_result/turbo_output/predict_dev.json

                     simple               moderate             challenging          total               
count                925                  464                  145                  1534                
======================================    ACCURACY    =====================================
accuracy             31.57                10.13                6.90                 22.75               
===========================================================================================

Three instances timed out.

bird/llm/exp_result/turbo_output_kg/predict_dev.json

                     simple               moderate             challenging          total               
count                925                  464                  145                  1534                
======================================    ACCURACY    =====================================
accuracy             46.70                20.47                15.86                35.85               
===========================================================================================

Three instances timed out in this run as well.

meta_time_out is set to 600 in my environment.
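
As a quick sanity check on the two tables above, the total column should be the count-weighted mean of the per-difficulty accuracies. The snippet below is only a sketch of that arithmetic; the dictionary and function names are made up for illustration and are not part of the BIRD evaluation script.

```python
# Counts and accuracies copied from the two tables above.
# COUNTS and weighted_total are hypothetical names, not from the BIRD repository.
COUNTS = {"simple": 925, "moderate": 464, "challenging": 145}


def weighted_total(acc_by_level, counts=COUNTS):
    """Count-weighted mean of per-difficulty accuracies (in percent)."""
    n_total = sum(counts.values())
    return sum(acc_by_level[level] * counts[level] for level in counts) / n_total


without_kg = {"simple": 31.57, "moderate": 10.13, "challenging": 6.90}
with_kg = {"simple": 46.70, "moderate": 20.47, "challenging": 15.86}

print(round(weighted_total(without_kg), 2))  # 22.75
print(round(weighted_total(with_kg), 2))     # 35.85
```

Both values match the reported totals, so the per-difficulty rows and the total column are at least internally consistent.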

I think these results differ from the GPT-4 (GPT-4-32k) results in Table 2 of the BIRD paper because the predictions come from a different model, as the directory name turbo_output suggests.
Paper result (GPT-4 ICL):

  • without KG: 30.90
  • with KG: 46.35
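
To put the difference in numbers, here is a rough sketch that compares my totals with the paper's and bounds how much the three timeouts could matter. It assumes that each timed-out instance is scored as incorrect and would otherwise have been correct, which is the most the timeouts could possibly cost; the variable names are only illustrative.

```python
TOTAL = 1534   # dev examples, from the tables above
TIMEOUTS = 3   # timed-out instances reported for each run

reproduced = {"without KG": 22.75, "with KG": 35.85}
paper = {"without KG": 30.90, "with KG": 46.35}

# Upper bound in percentage points, under the assumption stated above.
max_timeout_effect = 100 * TIMEOUTS / TOTAL

# Gap between the paper's numbers and the reproduced totals.
gaps = {setting: paper[setting] - reproduced[setting] for setting in paper}

for setting, gap in gaps.items():
    print(f"{setting}: gap {gap:.2f} points, "
          f"timeouts explain at most {max_timeout_effect:.2f}")
```

Under that assumption the timeouts account for at most about 0.2 points, far less than either gap, so they do not seem to explain the difference on their own.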