Create dataset loader for TotalDefMeme #355

SamuelCahyawijaya · 2024-01-22T06:53:43Z

Dataloader name: total_defense_meme/total_defense_meme.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?total_defense_meme

Dataset	total_defense_meme
Description	This is a large-scale multimodal and multi-attribute dataset containing memes about Singapore's Total Defence policy from different social media platforms. The type (Singaporean or generic), pillars (military, civil, economic, social, psychological, digital, others), topics and stances (against, neutral, supportive) of each meme are manually identified by annotators.
Subsets	-
Languages	eng
Tasks	Topic Classification, Stance Detection, Optical Character Recognition
License	Unknown (unknown)
Homepage	Image: https://drive.google.com/file/d/1oJIh4QQS3Idff2g6bZORstS5uBROjUUz/view, Annotations: https://gitlab.com/bottle_shop/meme/TotalDefMemes/-/tree/main
HF URL	-
Paper URL	https://arxiv.org/pdf/2305.17911.pdf


TysonYu · 2024-02-02T13:45:26Z

#self-assign

github-actions · 2024-02-17T01:56:24Z

Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

akhdanfadh · 2024-03-30T10:33:11Z

My question is at the end.

According to the paper, the authors collected 7200 images, which they then filtered down to 5301 memes: 2893 Singapore-related (SG) and 2408 non-SG. Of the SG-related memes, they annotated 2513 images. Quoting the paper:

> Pillars, Topics & Stances Annotation ..... The annotators will first assign the memes’ defence pillars: military, civil, economic, social, psychological, digital, or others. ..... Next, they annotate the relevant topic tags associated with the meme (i.e., nouns, pronouns, and phrases) in a free-text format. ..... Lastly, the annotators annotate the meme’s stances towards the assigned pillars: support, against, or neutral.

So, for example, an image will have annotations like this:

"Pillar_Stances": [
    {
        "img_4120.jpg": [
            [
                "Economic Defence",
                [
                    "Neutral",
                    "Neutral"
                ]
            ],
            [
                "Psychological Defence",
                [
                    "Against",
                    "Against"
                ]
            ]
        ]
    },
    ...
]
"Tags": [
    {
        "img_4120.jpg": [
            "Government",
            "HDB",
            "Gone",
            "Lease End",
            "Sad",
            "Disappear",
            "99-Years",
            "e-scooter law"
        ]
    },
    ...
]
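A minimal sketch of how a loader might read these two fields, assuming the annotation file is JSON with the top-level keys shown above and that every list element is a single-key dict mapping a filename to its annotation. The helper names are hypothetical. The thread does not say why each pillar carries two stance values (possibly one per annotator), so the sketch keeps them as a list rather than collapsing them.

```python
from typing import Any, Dict, List

# Each top-level field ("Pillar_Stances", "Tags", "Text") is assumed to be a
# list of single-key dicts such as [{"img_4120.jpg": ...}, ...].
def flatten_annotations(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge [{filename: value}, ...] into a single {filename: value} dict."""
    merged: Dict[str, Any] = {}
    for entry in entries:
        merged.update(entry)
    return merged


def pillar_stances(raw: List[Any]) -> Dict[str, List[str]]:
    """Convert [[pillar, [stance, ...]], ...] into {pillar: [stance, ...]}."""
    return {pillar: list(stances) for pillar, stances in raw}


# The img_4120.jpg entries quoted above.
pillar_stances_field = [
    {
        "img_4120.jpg": [
            ["Economic Defence", ["Neutral", "Neutral"]],
            ["Psychological Defence", ["Against", "Against"]],
        ]
    },
]
tags_field = [
    {
        "img_4120.jpg": [
            "Government", "HDB", "Gone", "Lease End",
            "Sad", "Disappear", "99-Years", "e-scooter law",
        ]
    },
]

stances_by_image = {
    name: pillar_stances(raw)
    for name, raw in flatten_annotations(pillar_stances_field).items()
}
tags_by_image = flatten_annotations(tags_field)

print(stances_by_image["img_4120.jpg"]["Psychological Defence"])  # ['Against', 'Against']
print(len(tags_by_image["img_4120.jpg"]))  # 8
```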

Furthermore, they also provide OCR-extracted text for almost all images (7012, to be exact), in the following format:

"Text": [
    {
        "img_4120.jpg": "When a HDB flat finishes it's 99-year lease: This is so sad can we hit a pedestrian with escooters?"
    },
    {
        "img_1712.jpg": "News; The mystery Chinese Virus can only spread through human interaction Engineering Students:"
    },
    ...
]

I think this one will use the general image_text SEACrowd schema. My question is: should I just implement the OCR text field and ignore the pillars? If so, I can pass the tags as metadata/context in the schema. If not, I am not sure how to proceed with the stance labeling.
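The option proposed here (OCR text as the main field, tags as context) could look roughly like the sketch below. It assumes the image_text record has id, image_paths, and texts fields next to the metadata block quoted later in this thread; the image_dir default and the helper name are hypothetical. Images without OCR text are skipped, since only some of the images have it.

```python
from typing import Dict, List, Optional


def to_image_text_example(
    filename: str,
    text_by_image: Dict[str, str],
    tags_by_image: Dict[str, List[str]],
    image_dir: str = "images",  # hypothetical folder holding the downloaded images
) -> Optional[dict]:
    """Build one image_text-style record, or return None when no OCR text exists."""
    text = text_by_image.get(filename)
    if text is None:
        # Not every image has OCR text, so such images are skipped here.
        return None
    return {
        "id": filename.rsplit(".", 1)[0],  # "img_4120.jpg" -> "img_4120"
        "image_paths": [f"{image_dir}/{filename}"],
        "texts": text,
        "metadata": {
            # Topic tags go into the free-text context field.
            "context": ", ".join(tags_by_image.get(filename, [])),
            # The OCR task has no class labels, so this stays empty.
            "labels": [],
        },
    }


text_by_image = {
    "img_4120.jpg": "When a HDB flat finishes it's 99-year lease: This is so sad "
    "can we hit a pedestrian with escooters?",
}
tags_by_image = {"img_4120.jpg": ["Government", "HDB", "Lease End"]}

example = to_image_text_example("img_4120.jpg", text_by_image, tags_by_image)
print(example["metadata"]["context"])  # Government, HDB, Lease End
print(to_image_text_example("img_0000.jpg", text_by_image, tags_by_image))  # None
```

Joining tags with ", " keeps context a plain string; a loader that needs the individual tags back would have to split on that separator.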

akhdanfadh · 2024-03-30T10:33:34Z

#self-assign

holylovenia · 2024-04-01T05:39:57Z

> I think this one will use the general image_text SEACrowd schema. My question is: should I just implement the OCR text field and ignore the pillars? If so, I can pass the tags as metadata/context in the schema. If not, I am not sure how to proceed with the stance labeling.

I agree with you, @akhdanfadh.

The OCR subsets will use the [image_text](https://github.com/SEACrowd/seacrowd-datahub/blob/7bdfb4b461d6449b8200950938b13ef7614bc4f6/seacrowd/utils/schemas/image_text.py) schema. The pillars and tags info can simply be stored inside meta.
The topic classification subsets and the vision-language stance labeling subsets could use a new schema for image classification. We don't have one right now. Could you please make a separate PR for this new schema? According to our naming convention, the schema should probably be named "image" (though it sounds kind of weird...)
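As a rough illustration of what the topic classification and stance subsets might emit under such an image classification schema, the sketch below turns a per-image {pillar: [stances]} mapping into two kinds of examples. The record fields (id, image_paths, context, labels) and the helper names are hypothetical, since the schema did not exist yet at this point in the thread. Dropping (image, pillar) pairs whose stance values disagree is an illustrative choice, not something the dataset prescribes.

```python
from typing import Dict, Iterator, List

StancesByImage = Dict[str, Dict[str, List[str]]]  # filename -> {pillar: [stance, ...]}


def pillar_examples(stances_by_image: StancesByImage) -> Iterator[dict]:
    """Topic classification: one multi-label example per image (its pillars)."""
    for filename, stances in sorted(stances_by_image.items()):
        yield {"id": filename, "image_paths": [filename], "labels": sorted(stances)}


def stance_examples(stances_by_image: StancesByImage) -> Iterator[dict]:
    """Stance detection: one example per (image, pillar) pair.

    Pairs whose stance values disagree are dropped; this is an illustrative
    choice, not something the dataset or the thread prescribes.
    """
    for filename, stances in sorted(stances_by_image.items()):
        for pillar, values in sorted(stances.items()):
            if values and len(set(values)) == 1:
                yield {
                    "id": f"{filename}_{pillar}",
                    "image_paths": [filename],
                    "context": pillar,  # the pillar the stance is directed at
                    "labels": [values[0]],
                }


sample = {
    "img_4120.jpg": {
        "Economic Defence": ["Neutral", "Neutral"],
        "Psychological Defence": ["Against", "Against"],
    },
}
print(list(pillar_examples(sample))[0]["labels"])  # ['Economic Defence', 'Psychological Defence']
print([ex["labels"][0] for ex in stance_examples(sample)])  # ['Neutral', 'Against']
```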

I hope this clears up things. What do you think?

akhdanfadh · 2024-04-01T07:31:26Z

> The OCR subsets will use the image_text schema. The pillars and tags info can simply be stored inside meta.

The metadata in the schema is organized as shown below. The tags can be passed into context, but I'm not sure about the pillars. Is it okay if I add another key to the metadata schema, for example with feature['metadata']['stances'] = ...? See an implementation here for reference.

"metadata": {
    "context": datasets.Value("string"),
    "labels": datasets.Sequence(datasets.ClassLabel(names=label_names)),
}
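One alternative to adding a stances key, assuming the goal is to leave the shared metadata schema untouched, is to serialize both the tags and the pillar stances as JSON inside the existing string-typed context field. This is only a sketch of that option; the helper names are hypothetical, and it is not what the thread settled on.

```python
import json
from typing import Dict, List


def encode_context(tags: List[str], pillar_stances: Dict[str, List[str]]) -> str:
    """Pack tags and pillar stances into one JSON string for metadata["context"]."""
    payload = {"tags": tags, "pillar_stances": pillar_stances}
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


def decode_context(context: str) -> dict:
    """Recover the tags and pillar stances from the JSON string."""
    return json.loads(context)


metadata = {
    "context": encode_context(
        ["Government", "HDB"],
        {"Economic Defence": ["Neutral", "Neutral"]},
    ),
    "labels": [],
}
decoded = decode_context(metadata["context"])
print(decoded["pillar_stances"]["Economic Defence"])  # ['Neutral', 'Neutral']
```

The trade-off: a JSON string stays compatible with a string-valued context field but every consumer must parse it, whereas a typed stances key is more discoverable but makes this dataset's metadata differ from other image_text loaders.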

> The topic classification subsets and the vision-language stance labeling subsets could use a new schema for image classification. We don't have one right now. Could you please make a separate PR for this new schema? According to our naming convention, the schema should probably be named "image" (though it sounds kind of weird...)

Hmm, I don't mind, actually. Though implementing an image classification schema would mean this SEACrowd project is not entirely NLP-oriented anymore haha 😄. I'll create a PR by this weekend.

holylovenia · 2024-04-01T07:34:52Z

> Hmm, I don't mind, actually. Though implementing an image classification schema would mean this SEACrowd project is not entirely NLP-oriented anymore haha 😄. I'll create a PR by this weekend.

Yes, definitely. 👍 We're also consolidating every VL, speech, and other dataset we can get our hands on.

Thanks @akhdanfadh!! Just let me know if you need anything.


SamuelCahyawijaya added this to SEACrowd Data Hub Jan 22, 2024

SamuelCahyawijaya converted this from a draft issue Jan 22, 2024

sabilmakbar added the help wanted Extra attention is needed label Jan 30, 2024

github-actions bot assigned TysonYu Feb 2, 2024

github-actions bot added the staled-issue label Feb 17, 2024

holylovenia added bonus +1 top-priority Needs to get done ASAP for the experiments labels Mar 12, 2024

github-actions bot removed the staled-issue label Mar 13, 2024

holylovenia unassigned TysonYu Mar 25, 2024

github-actions bot assigned akhdanfadh Mar 30, 2024

akhdanfadh added the question Further information is requested label Mar 30, 2024

akhdanfadh removed the question Further information is requested label Apr 1, 2024

akhdanfadh mentioned this issue Apr 2, 2024

Closes #355 | Add Dataloader TotalDefMeme #602

Merged


akhdanfadh added the pr-ready A PR that closes this issue is Ready to be reviewed label Apr 2, 2024

holylovenia removed the top-priority Needs to get done ASAP for the experiments label Apr 11, 2024

holylovenia closed this as completed in #602 May 1, 2024

holylovenia pushed a commit that referenced this issue May 1, 2024

Closes #355 | Add Dataloader TotalDefMeme (#602)

8a35006

* add image classification schema
* add dataloader
* change source feature, modify comment

github-project-automation bot moved this to Done in SEACrowd Data Hub May 1, 2024
