Create dataset loader for ALT Burmese Treebank #16
Comments
#self-assign
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
Hi @gagan3012, may I know the current status of this dataloader? Feel free to discuss here if you have any difficulties, thx!
Hi @gagan3012, we received no response from you regarding this dataloader, so I will remove your assignment. Anyone interested in taking this dataloader, please feel free to
#self-assign
#self-assign
Could you please advise which schema I should use for this dataset? Here is an example from the dataset: SNT.42638.98 (ROOT (NOUN (NOUN (VERB (verb ပူးပေါင်း) (part ဖို့) ) (NOUN (NOUN (noun (noun ပါတီ) (part တွေ) ) (adp ရဲ့) ) (noun (verb ငြင်းဆို) (part မှု) ) ) ) (adp ကို) ) (VERB (VERB (VERB (punct ") (noun (verb စိတ်ပျက်) (part စရာ) ) (punct ") ) (part လို့) ) (VERB (NOUN (noun (noun ပီတာ) (noun ဟိန်း) ) (adp က) ) (verb (verb ခေါ်) (verb ဆို) (part ပါ) (part တယ်) ) ) ) (punct ။) )
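For anyone reading along, here is a minimal sketch of how a line in this bracketed format could be read into a nested structure. It assumes each line is a sentence ID followed by one parenthesised tree, as in the example above. The function names (`tokenize`, `parse`, `parse_line`, `leaves`) and the `(label, children)` tuple layout are hypothetical choices for illustration, not the schema or loader code from this repository. The sample line is a shortened fragment with a made-up ID, not a real treebank entry.

```python
import re


def tokenize(bracketed):
    # Split into "(", ")", and runs of characters that are neither
    # whitespace nor parentheses (labels and word tokens).
    return re.findall(r"\(|\)|[^\s()]+", bracketed)


def parse(tokens, pos=0):
    # Parse one "(label child child ...)" group starting at tokens[pos].
    # Returns ((label, children), next_position).
    if tokens[pos] != "(":
        raise ValueError("expected '(' at token %d" % pos)
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            # A bare token is a leaf (the surface word).
            children.append(tokens[pos])
            pos += 1
    return (label, children), pos + 1


def parse_line(line):
    # Assumption: the sentence ID is everything before the first space.
    sent_id, _, bracketed = line.strip().partition(" ")
    tree, _ = parse(tokenize(bracketed))
    return sent_id, tree


def leaves(tree):
    # Collect the surface words left to right.
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Hypothetical shortened line, built from pieces of the example above.
sample = "SNT.0.0 (ROOT (noun (noun ပါတီ) (part တွေ) ) (punct ။) )"
sent_id, tree = parse_line(sample)
words = leaves(tree)
```

With a structure like this, lowercase labels (noun, part, punct) sit directly above the words and uppercase labels (NOUN, VERB, ROOT) sit above other nodes, which is the distinction the schema question here is about.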
Hi @MJonibek!! Nice to e-meet you again~ Could you please take a look at the kb schema and let me know what you think?
Hi @holylovenia, nice to meet you too :) Regarding the kb schema, I am not sure it is possible to transform this kind of data into that format. Maybe we could use "entities" for the lowest level (like noun, part, punct), but I am not sure how to represent the other levels of the tree (like VERB, NOUN, ROOT). Maybe we need to use a schema like this:
Hi @holylovenia, could you please comment on the proposed schema? If it is OK, I will create the dataloader using this schema.
Hi @MJonibek! Sorry for the late reply. I've discussed this with @SamuelCahyawijaya and this schema looks great to us! Could you please make a PR for this
Great, I will try to do it by the end of this week.
Related #16 | Add Tree schema and CONSTITUENCY_PARSING task
Closes #16 | Create dataset loader for ALT Burmese Treebank
Dataloader name:
alt_burmese_treebank/alt_burmese_treebank.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?alt_burmese_treebank