To be considered for co-authorship, a contributor needs 20 contribution points.
Note: The purpose of the point system is not to create a barrier to collaboration, but to reward rare and high-quality dataset entries. We may lower the point requirement to accommodate more co-authors if needed.
Contribution type | Demand | Points | Max points | Job description |
---|---|---|---|---|
Public Datasheet Submission | As many as possible | 2+bonus | 6 | Submit a public datasheet via jotform |
Private Datasheet Submission | As many as possible | 1 | | Submit a private datasheet via jotform |
Open Access to Private Dataset | As many as possible | 4+bonus | 10 for high-quality datasets | Only private dataset owners can do this. Upload the data to a public repository and submit the datasheet via jotform. |
Dataloader Implementation | As many as possible | 3 | 6 for hard ones | Implement a dataloader based on the respective dataset's schema and task. |
Submitting a public dataset via jotform is worth a default score of +2. A bonus is applied based on the following conditions:
- modality: +1 pt for speech/vision, +2 pts for others
- language rarity, based on https://microsoft.github.io/linguisticdiversity/assets/lang2tax.txt:
  - +0 pts for languages at level 3 or above
  - +1 pt for languages at level 1 or 2
  - +2 pts for languages at level 0 or not mentioned in the list
Submitting a private dataset via jotform is worth a default score of +1.
Providing open access to a private dataset is worth +4 points, with a bonus applied based on the following conditions:
- modality: +1 pt for speech/vision, +2 pts for others
- language rarity, based on https://microsoft.github.io/linguisticdiversity/assets/lang2tax.txt:
  - +0 pts for languages at level 3 or above
  - +1 pt for languages at level 1 or 2
  - +2 pts for languages at level 0 or not mentioned in the list
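
To make the arithmetic concrete, here is a minimal sketch of how the submission scores above could be computed. It is an illustration only, not an official scoring tool: the function and constant names are hypothetical, "others" is read literally as any modality other than speech or vision, and the language level is assumed to have been looked up in lang2tax.txt beforehand (`None` meaning the language is not listed). The extra points that the table allows for high-quality open-access datasets are not modeled, since the rules above do not say how they are awarded.

```python
from typing import Optional

# Base scores and caps taken from the table above (names are hypothetical).
BASE_POINTS = {"public": 2, "private": 1, "open_access": 4}
MAX_POINTS = {"public": 6, "private": 1, "open_access": 10}


def modality_bonus(modality: str) -> int:
    """+1 for speech/vision, +2 for anything else (literal reading of 'others')."""
    return 1 if modality.lower() in {"speech", "vision"} else 2


def rarity_bonus(level: Optional[int]) -> int:
    """Bonus from the lang2tax.txt level; None means the language is not listed."""
    if level is None or level == 0:
        return 2
    if level in (1, 2):
        return 1
    return 0  # level 3 or above


def submission_points(kind: str, modality: str, level: Optional[int]) -> int:
    """Points for one datasheet submission of the given kind."""
    if kind == "private":
        # Private submissions earn the flat default score with no bonus.
        return BASE_POINTS["private"]
    total = BASE_POINTS[kind] + modality_bonus(modality) + rarity_bonus(level)
    return min(total, MAX_POINTS[kind])
```

For example, under these assumptions a public speech dataset in a level-3 language scores 2 + 1 + 0 = 3 points, while an open-access vision dataset in a level-1 language scores 4 + 1 + 1 = 6 points.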
Implementing any dataloader earns +3 pts unless otherwise specified in the GitHub issue. More details here.