Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create dataset loader for Onto4All #536

Open
SamuelCahyawijaya opened this issue Mar 18, 2024 · 2 comments · May be fixed by #635
Open

Create dataset loader for Onto4All #536

SamuelCahyawijaya opened this issue Mar 18, 2024 · 2 comments · May be fixed by #635
Assignees
Labels
pr-ready A PR that closes this issue is Ready to be reviewed

Comments

@SamuelCahyawijaya
Copy link
Collaborator

Dataloader name: onto4all/onto4all.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?onto4all

Dataset onto4all
Description Onto4All is a subsample of other open source performant conversational datasets. We start with a carefully curated subset of the OpenHermes-2.5-Viet dataset, co-created by @qnguyen3 and @Teknium. This dataset is specifically designed to support the training and evaluation of Multilingual language models, such as Vistral-7B-chat and VinaLlama-7B-chat, and is derived from our Supervised Fine-Tuning (SFT) data. We have included Vietnamese here, but will add more languages.
Subsets -
Languages vie
Tasks Question Answering
License Creative Commons Zero v1.0 Universal (cc0-1.0)
Homepage https://huggingface.co/datasets/ontocord/onto4all
HF URL https://huggingface.co/datasets/ontocord/onto4all
Paper URL https://huggingface.co/datasets/ontocord/onto4all
@SamuelCahyawijaya SamuelCahyawijaya converted this from a draft issue Mar 18, 2024
@bp-high
Copy link
Contributor

bp-high commented Mar 19, 2024

#self-assign

@bp-high bp-high removed their assignment Mar 26, 2024
@patrickamadeus
Copy link
Collaborator

#self-assign

@patrickamadeus patrickamadeus linked a pull request Apr 9, 2024 that will close this issue
7 tasks
@sabilmakbar sabilmakbar added pr-ready A PR that closes this issue is Ready to be reviewed and removed staled-issue labels May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-ready A PR that closes this issue is Ready to be reviewed
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

4 participants