Evalution for StereoSet (Bias) #34

roskoN · 2022-08-30T07:57:45Z

At Alexander Thamm, we worked on an evaluation procedure for bias of language models.

It is based on StereoSet. Originally, the benchmark is for English only. However, we translated the dataset into German using automatic translation (Amazon Translate). We did a comparative study with multilingual models on both the English and German versions of StereoSet and confirmed there are no big differences. Hence, we can use it for also evaluating German LMs.

Would it make sense for you that we integrate it into the LM evaluation harness? I will be happy to open a PR, but before doing that, I wanted to align with you.

What do you think? Do you have any questions or comments?

I also had a discussion with @mali-git, he is on board with the idea.

Thanks,
Rosko

malteos · 2022-08-30T08:13:56Z

Hey @roskoN, great initiative! A bias task would be a valuable addition to the framework. Let me know if you need any help. I've written this little guide on how to add new tasks: #2

malteos · 2022-11-21T14:53:56Z

Added via #35

roskoN added the task label Aug 30, 2022

roskoN self-assigned this Aug 30, 2022

malteos closed this as completed Nov 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evalution for StereoSet (Bias) #34

Evalution for StereoSet (Bias) #34

roskoN commented Aug 30, 2022

malteos commented Aug 30, 2022

malteos commented Nov 21, 2022

Evalution for StereoSet (Bias) #34

Evalution for StereoSet (Bias) #34

Comments

roskoN commented Aug 30, 2022

malteos commented Aug 30, 2022

malteos commented Nov 21, 2022