Create dataset loader for OpenSpeech Dataset V1 by Wang #714

Open
SamuelCahyawijaya opened this issue Jul 30, 2024 · 0 comments

Dataloader name: openspeech_v1/openspeech_v1.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?openspeech_v1

Dataset: openspeech_v1
Description: The OpenSpeech Dataset V1 by Wang: Data Market is a speech corpus intended for research and development in speech processing. It contains 10 hours of audio recordings (8450 sentences) contributed by 1077 users. Contributors were encouraged to provide diverse speech samples, so the sentences cover a range of linguistic variation and acoustic environments, which makes the dataset suitable for tasks such as speech recognition, language modeling, and speaker identification. The dataset is free, but downloading it requires registration (email and password).
Subsets: -
Languages: tha
Tasks: Spoken Language Identification, Language Modeling
License: Creative Commons Attribution Share Alike 4.0 (cc-by-sa-4.0)
Homepage: https://www.wang.in.th/dataset/654dfdbb6147c33fbf172957
HF URL: -
Paper URL: -
SamuelCahyawijaya converted this from a draft issue on Jul 30, 2024