You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At Alexander Thamm, we worked on an evaluation procedure for bias of language models.
It is based on StereoSet. Originally, the benchmark is for English only. However, we translated the dataset into German using automatic translation (Amazon Translate). We did a comparative study with multilingual models on both the English and German versions of StereoSet and confirmed there are no big differences. Hence, we can use it for also evaluating German LMs.
Would it make sense for you that we integrate it into the LM evaluation harness? I will be happy to open a PR, but before doing that, I wanted to align with you.
What do you think? Do you have any questions or comments?
I also had a discussion with @mali-git, he is on board with the idea.
Thanks,
Rosko
The text was updated successfully, but these errors were encountered:
Hey @roskoN, great initiative! A bias task would be a valuable addition to the framework. Let me know if you need any help. I've written this little guide on how to add new tasks: #2
Hey @malteos, Hey @sasaadi
At Alexander Thamm, we worked on an evaluation procedure for bias of language models.
It is based on StereoSet. Originally, the benchmark is for English only. However, we translated the dataset into German using automatic translation (Amazon Translate). We did a comparative study with multilingual models on both the English and German versions of StereoSet and confirmed there are no big differences. Hence, we can use it for also evaluating German LMs.
Would it make sense for you that we integrate it into the LM evaluation harness? I will be happy to open a PR, but before doing that, I wanted to align with you.
What do you think? Do you have any questions or comments?
I also had a discussion with @mali-git, he is on board with the idea.
Thanks,
Rosko
The text was updated successfully, but these errors were encountered: