Create dataset loader for MaXM #425

SamuelCahyawijaya · 2024-02-13T02:26:46Z

Dataloader name: maxm/maxm.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?maxm

Dataset	maxm
Description	MaXM is a test-only VQA benchmark in 7 diverse languages, including Thai. The dataset is generated by first applying a translation-based framework to mVQA and then applying the framework to the multilingual captions in the Crossmodal-3600 dataset.
Subsets	MaXM v1 -th
Languages	tha
Tasks	Question Answering
License	Other (other)
Homepage	https://github.com/google-research-datasets/maxm
HF URL	-
Paper URL	https://aclanthology.org/2023.findings-emnlp.176

akhdanfadh · 2024-02-15T17:10:44Z

Hi, the dataset is organized as follows:

dataset                 str: dataset name
version                 str: dataset version
split                   str: language ID
annotations             List of image-question-answers triplets, each of which is
-- image_id             str: image ID
-- image_url            str: image URL
-- qa_pairs             List of question-answer pairs, each of which is
---- question_id        str: question ID
---- question           str: raw question
---- answers            List of str: ground-truth answers
---- processed_answers  List of str: processed ground-truth answers. 16 tokenized answers.
---- is_collection      bool: "true" if the question is of the "Collection" type; "false" otherwise.
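
To make the nesting concrete, here is a minimal sketch of flattening that layout into one row per question. It assumes each language file parses to a single object with the fields listed above; the sample values, the example.com URL, and the helper name iter_qa_rows are invented for illustration and are not taken from MaXM itself.

```python
from typing import Any, Dict, Iterator

# Hypothetical record following the field layout described above.
# All values are invented placeholders, not real MaXM data.
SAMPLE: Dict[str, Any] = {
    "dataset": "maxm",
    "version": "v1",
    "split": "th",
    "annotations": [
        {
            "image_id": "img_0",
            "image_url": "https://example.com/img_0.jpg",
            "qa_pairs": [
                {
                    "question_id": "q_0",
                    "question": "placeholder question 0",
                    "answers": ["answer a"],
                    "processed_answers": ["answer a"],
                    "is_collection": False,
                },
                {
                    "question_id": "q_1",
                    "question": "placeholder question 1",
                    "answers": ["answer b", "answer c"],
                    "processed_answers": ["answer b", "answer c"],
                    "is_collection": True,
                },
            ],
        }
    ],
}


def iter_qa_rows(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per question, copying the image fields onto it."""
    for annotation in data["annotations"]:
        for qa in annotation["qa_pairs"]:
            yield {
                "split": data["split"],
                "image_id": annotation["image_id"],
                "image_url": annotation["image_url"],
                **qa,  # question_id, question, answers, processed_answers, is_collection
            }


rows = list(iter_qa_rows(SAMPLE))
for row in rows:
    print(row["image_id"], row["question_id"], row["is_collection"])
```

Each image can carry several questions, so repeating image_id and image_url on every row keeps the rows independent of one another.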

In question answering schema, the features are:

id             (str)
question_id    (str)
document_id    (str)
question       (str)
type           (str)
choices        (list[str])
context        (str)
answer         (list[str])
meta           (dict[Any])

Should I assign is_collection to type, context, or inside meta?
Also, should I put image_id or image_url for the document_id?

github-actions · 2024-03-01T02:02:25Z

Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

akhdanfadh · 2024-03-01T02:41:00Z

Hmm, I think I need to mention for faster response @sabilmakbar @holylovenia

holylovenia · 2024-03-18T05:31:50Z

I didn't realize I missed so many mentions from you. 😭 Sorry!!

Could you please use Tasks.VISUAL_QUESTION_ANSWERING? It employs the imqa schema.

> Should I assign is_collection to type, context, or inside meta?

Inside meta would be perfect. type is typically open-ended, multiple-choice, extractive, abstractive, etc.

> Also, should I put image_id or image_url for the document_id?

document_id is related to the context (if there is one).
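
The advice above could be sketched roughly as follows. The feature names are the question answering fields listed earlier in this thread; the imqa schema may use different or additional fields, so treat this as an illustration rather than the loader's actual code. The helper to_qa_example, the sample row, the "open-ended" type value, the empty document_id and context, and the choice to keep image_id and image_url in meta are all assumptions.

```python
from typing import Any, Dict

# Hypothetical flattened row; values are invented placeholders.
ROW: Dict[str, Any] = {
    "image_id": "img_0",
    "image_url": "https://example.com/img_0.jpg",
    "question_id": "q_1",
    "question": "placeholder question",
    "answers": ["answer b", "answer c"],
    "processed_answers": ["answer b", "answer c"],
    "is_collection": True,
}


def to_qa_example(idx: int, row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one flattened row onto the QA-style features from the thread."""
    return {
        "id": str(idx),
        "question_id": row["question_id"],
        # Assumption: there is no textual context, so document_id stays empty.
        "document_id": "",
        "question": row["question"],
        # Assumption: questions are treated as open-ended.
        "type": "open-ended",
        "choices": [],
        "context": "",
        "answer": list(row["answers"]),
        # is_collection goes inside meta, as suggested in the reply above.
        # Keeping the image fields here as well is an assumption.
        "meta": {
            "is_collection": row["is_collection"],
            "image_id": row["image_id"],
            "image_url": row["image_url"],
        },
    }


example = to_qa_example(0, ROW)
print(example["meta"])
```

Keeping is_collection in meta leaves type free for the question format, which matches how the reply describes that field.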

SamuelCahyawijaya added this to SEACrowd Data Hub Feb 13, 2024

SamuelCahyawijaya converted this from a draft issue Feb 13, 2024

github-actions bot assigned akhdanfadh Feb 15, 2024

github-actions bot added the staled-issue label Mar 1, 2024

github-actions bot removed the staled-issue label Mar 2, 2024

github-actions bot added the staled-issue label Mar 16, 2024

holylovenia removed the staled-issue label Mar 18, 2024

akhdanfadh added the in-progress label Mar 29, 2024

This was referenced Mar 29, 2024

Closes #425 | Add Dataloader MaXM #553

Closed

Closes #425 | Add Dataloader MaXM #554

Merged

akhdanfadh added the pr-ready label and removed the in-progress label Mar 29, 2024

muhammadravi251001 closed this as completed in #554 May 14, 2024

muhammadravi251001 pushed a commit that referenced this issue May 14, 2024

Closes #425 | Add Dataloader MaXM

bb4579e

github-project-automation bot moved this to Done in SEACrowd Data Hub May 14, 2024
