Create dataset loader for Sarawak Malay #444

Closed
SamuelCahyawijaya opened this issue Feb 18, 2024 · 1 comment · Fixed by #458

Comments

@SamuelCahyawijaya
Collaborator

Dataloader name: sarawak_malay/sarawak_malay.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?sarawak_malay

| Dataset | sarawak_malay |
| --- | --- |
| Description | This is Sarawak Malay conversation data for speech technology research. At the moment, the data is experimental and is currently used for investigating speaker diarization. The data was collected by the Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak. It consists of 38 conversations that have been transcribed using Transcriber (see the TextGrid folder), where each file contains two speakers. Each conversation was recorded by different individuals using microphones from mobile devices or laptops; thus, different file formats were collected from the data collectors. All data was then standardized to mono, 16,000 Hz (16 kHz) WAV format. |
| Subsets | - |
| Languages | zlm |
| Tasks | Text-To-Speech Synthesis, Automatic Speech Recognition |
| License | Creative Commons Zero v1.0 Universal (cc0-1.0) |
| Homepage | https://github.com/sarahjuan/sarawakmalay |
| HF URL | - |
| Paper URL | - |
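
Since the description says all audio was standardized to mono WAV at what is presumably 16,000 Hz (reading "16000Khz" as a typo), a loader could sanity-check the files before yielding examples. Below is a minimal sketch using only the Python standard library; the helper names `is_standardized_wav` and `find_nonconforming` are hypothetical and not part of any existing dataloader, and the directory layout of the repository is not assumed.

```python
import wave
from pathlib import Path

# Assumed target format, taken from the dataset description above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def is_standardized_wav(path):
    """Return True if the WAV file at `path` is mono and sampled at 16 kHz."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getframerate() == EXPECTED_RATE_HZ
        )


def find_nonconforming(root):
    """List .wav files under `root` (searched recursively) that are not mono 16 kHz."""
    return sorted(
        p for p in Path(root).rglob("*.wav") if not is_standardized_wav(p)
    )
```

Running `find_nonconforming` on a local clone of the homepage repository would list any recordings that deviate from the stated format; an empty list would be consistent with the description.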
@SamuelCahyawijaya SamuelCahyawijaya converted this from a draft issue Feb 18, 2024
@djanibekov
Contributor

#self-assign

@sabilmakbar sabilmakbar added the pr-ready A PR that closes this issue is Ready to be reviewed label Mar 3, 2024
yongzx added a commit that referenced this issue Mar 6, 2024
Closes #444 | Create dataset loader for Sarawak Malay
Projects
Status: Done
3 participants