Create dataset loader for TotalDefMeme #355
#self-assign
Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
My question is at the end. According to the paper, they obtained a dataset of 7,200 images, which they then filtered down to 5,301 memes: 2,893 SG-related and 2,408 non-SG. Of the SG-related memes, they annotated 2,513 images. Quoting them,
So, for example, an image will have annotations like this:

```json
"Pillar_Stances": [
  {
    "img_4120.jpg": [
      [
        "Economic Defence",
        [
          "Neutral",
          "Neutral"
        ]
      ],
      [
        "Psychological Defence",
        [
          "Against",
          "Against"
        ]
      ]
    ]
  },
  ...
]
"Tags": [
  {
    "img_4120.jpg": [
      "Government",
      "HDB",
      "Gone",
      "Lease End",
      "Sad",
      "Disappear",
      "99-Years",
      "e-scooter law"
    ]
  },
  ...
]
```

Furthermore, they also provide the text of almost all images (7,012 to be exact), extracted with an OCR algorithm, as follows:

```json
"Text": [
  {
    "img_4120.jpg": "When a HDB flat finishes it's 99-year lease: This is so sad can we hit a pedestrian with escooters?"
  },
  {
    "img_1712.jpg": "News; The mystery Chinese Virus can only spread through human interaction Engineering Students:"
  },
  ...
]
```

I think this one will use the general image_text SEACrowd schema. My question is whether I should just implement the
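Each annotation field is a list of single-key objects keyed by image filename, so a loader has to join the fields per image before emitting examples. A minimal sketch, assuming the files load into a dict shaped like the excerpts above (the `merge_by_image` helper and the trimmed sample data are hypothetical, not from the dataset release):

```python
import json
from collections import defaultdict

# Hypothetical, trimmed sample mirroring the excerpts above: every top-level
# field maps to a list of single-key dicts keyed by image filename.
RAW = """
{
  "Pillar_Stances": [
    {"img_4120.jpg": [["Economic Defence", ["Neutral", "Neutral"]],
                      ["Psychological Defence", ["Against", "Against"]]]}
  ],
  "Tags": [
    {"img_4120.jpg": ["Government", "HDB", "Lease End"]}
  ],
  "Text": [
    {"img_4120.jpg": "When a HDB flat finishes it's 99-year lease: ..."},
    {"img_1712.jpg": "News; The mystery Chinese Virus ..."}
  ]
}
"""


def merge_by_image(annotations: dict) -> dict:
    """Group every field under its image filename.

    An image absent from a field (e.g. one with OCR text but no stance
    annotation) simply lacks that key in its merged record.
    """
    merged = defaultdict(dict)
    for field, entries in annotations.items():
        for entry in entries:
            for image_name, value in entry.items():
                merged[image_name][field] = value
    return dict(merged)


records = merge_by_image(json.loads(RAW))
```

With this layout, every image with OCR text yields a record, while only the annotated subset carries `Pillar_Stances` and `Tags`, which is why the choice between a text-only image_text config and a labelled config matters.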
#self-assign
I agree with you, @akhdanfadh.
I hope this clears things up. What do you think?
The

```python
"metadata": {
    "context": datasets.Value("string"),
    "labels": datasets.Sequence(datasets.ClassLabel(names=label_names)),
}
```
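To see how the per-image pillar annotations could fill this `"metadata"` field, here is a minimal sketch. It assumes `label_names` is the list of Total Defence pillars (the exact names beyond "Economic Defence" and "Psychological Defence" are assumptions) and that `context` holds the OCR text; `to_metadata` is a hypothetical helper. It avoids the `datasets` dependency by producing the integer indices a `ClassLabel` would store.

```python
# Hypothetical label set: only "Economic Defence" and "Psychological Defence"
# appear in the excerpts above; the other names are assumptions.
LABEL_NAMES = [
    "Military Defence",
    "Civil Defence",
    "Economic Defence",
    "Social Defence",
    "Psychological Defence",
    "Digital Defence",
]


def to_metadata(pillar_stances, context=""):
    """Build a dict shaped like the proposed "metadata" feature.

    "labels" holds ClassLabel-style integer indices into LABEL_NAMES.
    The per-annotator stances are dropped because this schema has no slot
    for them.
    """
    labels = [LABEL_NAMES.index(pillar) for pillar, _stances in pillar_stances]
    return {"context": context, "labels": labels}


example = [
    ["Economic Defence", ["Neutral", "Neutral"]],
    ["Psychological Defence", ["Against", "Against"]],
]
meta = to_metadata(example, context="When a HDB flat finishes ...")
```

Note that this shape keeps which pillars a meme touches but loses the stance toward each one; keeping stances would need either combined labels (e.g. pillar plus stance) or an extra field.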
Hmm, I don't mind, actually. Though implementing an image classification schema would mean this SEACrowd project is not entirely NLP-oriented anymore, haha 😄. I'll create a PR by this weekend.
Yes, definitely. 👍 We're also consolidating every vision-language (VL), speech, and other dataset we can get our hands on. Thanks @akhdanfadh!! Just let me know if you need anything.
Dataloader name: total_defense_meme/total_defense_meme.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?total_defense_meme