Create dataset loader for XQuAD-R #593
Comments
#self-assign |
From the homepage:
The additional fields are given like this (download english subset here):
Given that the XQuAD dataloader is already implemented and this one focuses on retrieval, the task should only be Text Retrieval, and the new dataloader should focus on those additional fields, IMO. How should I approach the mapping to the pairs schema? I am also quite confused about why the text retrieval task uses that schema... |
I agree with you, @akhdanfadh. I've updated the issue ticket and the datasheet accordingly.
The text retrieval task uses that schema because existing text retrieval datasets commonly consist of a pair of texts and a label that marks the pair as positive or negative. However, XQuAD-R is a QA retrieval task, which is a bit different. Based on Section 3.1 of the paper:
And a data instance example of the dataset:
Because of this, in XQuAD-R's dataloader, it seems that for each data instance:
In this case, it looks like the CMIIW. |
In that case, instead of using Aight, implementing this now by adding |
I think the |
@sabilmakbar I think that would go against the main idea of the dataset, as it is more about the retrieval part. This dataset is also meant for benchmarking, not for the ground truth.
@holylovenia also quoted from the paper. Based on this, we can think of the dataset as a "multiple choice" with the choices being all the sentences. Though it's not.
Overall, we can still put the ground truth answer |
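To make the "multiple choice" view above a bit more concrete, here is a minimal sketch of how one question could be expanded into (question, sentence) pairs with a positive/negative label. Everything in it is an assumption for illustration rather than the actual dataloader: the helper name `question_to_pairs`, the output keys `text_1`, `text_2`, and `label`, the label names, and the idea that the additional fields are a list of sentences plus `[start, end)` character offsets into the context.

```python
from typing import Dict, Iterator, List


def question_to_pairs(
    question_id: str,
    question: str,
    answer_start: int,
    sentences: List[str],
    sentence_breaks: List[List[int]],
) -> Iterator[Dict[str, str]]:
    """Yield one (question, sentence) pair per candidate sentence.

    Hypothetical helper: assumes each entry of `sentence_breaks` is a
    [start, end) character span of the matching sentence in the context.
    """
    for idx, (sentence, (start, end)) in enumerate(zip(sentences, sentence_breaks)):
        # A sentence counts as positive if the answer span starts inside it.
        is_positive = start <= answer_start < end
        yield {
            "id": f"{question_id}_{idx}",
            "text_1": question,
            "text_2": sentence,
            "label": "positive" if is_positive else "negative",
        }


# Toy data (not taken from XQuAD-R): the answer "France" starts at offset 30.
sentences = ["Rome is in Italy.", "Paris is in France."]
sentence_breaks = [[0, 17], [18, 37]]
pairs = list(
    question_to_pairs("q1", "Where is Paris?", 30, sentences, sentence_breaks)
)
```

With this layout every sentence stays available as a retrieval candidate, and the ground truth only shows up through the label, so the retrieval focus of the dataset is kept.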
Dataloader name:
xquadr/xquadr.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?xquadr