Evaluation Method for BIRD Dataset [Enhancement] #159

lucaordronneau · 2024-08-08T14:39:10Z

Hello,

I found a possible improvement to the evaluation process for the BIRD dataset. The prediction below is marked as incorrect by the evaluation method, even though it differs from the expected result only in the order of its elements.

The evaluation method uses strict equality. This occurs in the file bird/llm/src/evaluation.py on line 26.
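To illustrate the kind of relaxed comparison being proposed, here is a minimal sketch. It is not the code in evaluation.py; the function names `strict_match`, `relaxed_match`, and `_canonical_row` and the sample rows are hypothetical. It assumes results arrive as lists of tuples, as `sqlite3`'s `fetchall()` returns them, and that "order of the elements" may refer to either row order or column order.

```python
from collections import Counter


def _canonical_row(row):
    # Sort the values inside one row so that column order stops mattering.
    # Sorting by (type name, repr) avoids comparing mixed types such as
    # None, int and str directly, which would raise TypeError in Python 3.
    return tuple(sorted(row, key=lambda v: (type(v).__name__, repr(v))))


def strict_match(predicted, gold):
    # Strict equality: same rows, same row order, same column order.
    return predicted == gold


def relaxed_match(predicted, gold):
    # Order-insensitive comparison (an assumption about what "fair" means here):
    # rows are canonicalised, then compared as multisets so that duplicate
    # rows still count.
    return Counter(map(_canonical_row, predicted)) == Counter(map(_canonical_row, gold))


# Hypothetical result sets that differ only in ordering.
gold = [("Alice", 3), ("Bob", 5)]
predicted = [(5, "Bob"), (3, "Alice")]

print(strict_match(predicted, gold))   # False
print(relaxed_match(predicted, gold))  # True
```

One trade-off of this sketch: sorting values within a row also treats rows whose columns were genuinely swapped (for example, two numeric columns exchanged) as equal, so it is more lenient than a comparison that ignores row order only.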


bird-bench · 2024-08-27T17:00:13Z

@lucaordronneau Thanks for your interest in our work. Yes, EX is strict: we consider the order of the returned results to be part of the user's requirements. Imagine an agent returning a long list in a jumbled order, which would be very annoying for users. However, if you would rather disregard order, you can try our new metric, Soft-F1, which is currently in beta testing and is described in detail here:
soft-f1. Thanks.

2514387775 · 2024-11-12T06:19:19Z

Project Name: China Urban Bird Dataset

Project Description: We are compiling a dataset of birdwatching records from citizens across various cities in China for scientific research related to bird conservation. We welcome data sources from any channel, including but not limited to structured species distribution databases, citizen science projects, social media data, and historical literature data.

Data Requirements Details: The data sources we need should at least include species name, geographic information (precise coordinates or specific locations), and observation dates. It would be best if the data also included specific information such as the species' Latin name, Chinese name, and English name. We require data sources that are within the scope of China and at the urban scale.

Contribution Guidelines: We hope for submissions in Excel or CSV format, and other table formats compatible with the Windows system are also acceptable. We prefer data to be shared under a free license agreement, but we also support acquiring data through compensated purchase arrangements.

Contact Information: Please submit to the email address [email protected].

lucaordronneau pushed a commit to lucaordronneau/DAMO-ConvAI that referenced this issue Aug 14, 2024

[feat] Sort SQL result tuples for fair comparison during evaluation A…

6d9dc3a

…libabaResearch#159

lucaordronneau mentioned this issue Aug 14, 2024

[feat] Sort SQL result tuples for fair comparison during evaluation #159 #162

Closed

lucaordronneau added a commit to lucaordronneau/DAMO-ConvAI that referenced this issue Aug 19, 2024

[feat] Sort SQL result tuples for fair comparison during evaluation A…

bcbfb0c

…libabaResearch#159
