Create dataset loader for Okapi m-MMLU #476

Closed
SamuelCahyawijaya opened this issue Mar 3, 2024 · 4 comments · Fixed by #653
Labels: bonus +1 · pr-ready (A PR that closes this issue is ready to be reviewed) · top-priority (Needs to get done ASAP for the experiments)

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: okapi_m_mmlu/okapi_m_mmlu.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?okapi_m_mmlu

| Dataset | okapi_m_mmlu |
|---|---|
| Description | m-MMLU is a multilingual version of MMLU, a benchmark that measures a text model's multitask accuracy. The test covers 57 tasks, including elementary mathematics, history, computer science, law, and more. |
| Subsets | - |
| Languages | ind, vie |
| Tasks | Commonsense Reasoning |
| License | Creative Commons Attribution-NonCommercial 4.0 (cc-by-nc-4.0) |
| Homepage | http://nlp.uoregon.edu/download/okapi-eval/datasets/ |
| HF URL | https://huggingface.co/datasets/jon-tow/okapi_mmlu |
| Paper URL | https://arxiv.org/abs/2307.16039 |
SamuelCahyawijaya converted this from a draft issue on Mar 3, 2024
@tellarin
Collaborator

tellarin commented Mar 3, 2024

#self-assign

holylovenia added the bonus +1 and top-priority labels on Mar 12, 2024
@tellarin
Collaborator

Back working on this. Sorry for the delay.

@holylovenia
Contributor

> Back working on this. Sorry for the delay.

Sure, please let us know if you need any help, @tellarin!

@holylovenia
Contributor

Hi @tellarin, thanks for taking this on. Just a heads-up: due to the delay, I would like to let @SamuelCahyawijaya take over this issue if there's no update by Tuesday, 16 April 2024 EoD AoE (23:59 UTC-12).

holylovenia added the pr-ready label on Apr 22, 2024
sabilmakbar pushed a commit that referenced this issue May 1, 2024
* add m-mmlu dataloader

* Update okapi_m_mmlu.py

applying some formatter suggestions to this codebase

---------

Co-authored-by: Samuel Cahyawijaya <[email protected]>
Project status: Done

3 participants