-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create dataset loader for SREDFM #48
Comments
This is interesting, I'll take this first and see whether this can be done under a week, else I might release the task to others. Btw, @SamuelCahyawijaya do we have a increasing bonus system after some time if the issue hasn't been picked up by anyone? |
#self-assign |
I'll start on this once #62 has been reviewed. Will start it by next week. |
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help. |
Will try to get this done by EoW (since my other dataloader has been merged recently) |
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help. |
Ugh, didn't have the chance to do this. will release this and see if anyone else can take this instead. |
#self-assign |
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help. |
Hi, I did worked on this. The dataloader works, but apparently the test are hard to get passed, especially the seacrowd schema one. IIRC, I was having issues with IDs (duplicate). Currently have no time yet to fix it. |
Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help. |
Currently discussing it with @SamuelCahyawijaya |
Basically I got two problems
For second problem, I am still not sure why because:
Can look at the current full code here |
Hi @khelli07, are you still discussing it with @SamuelCahyawijaya or do you guys need another pair of eyes? |
I think @SamuelCahyawijaya is currently busy. Might need another help. |
For 1, I believe the text_offset was generated from the text field, but I saw on the offset text that a new line isn't present in the original text. Is it expected? For 2, when I rechecked the schema, it checked the IDs defined on the example level to ensure they were unique. In your implementation, you're defining the entity & relation ID only by specifying an iteration counter, which leads to duplication on the check. Perhaps the workaround will be similar to what you did on the passage ID, appending it to some unique identifier for distinguishing entity ID and relation ID. |
OK, will take a look into this again in near time. |
For number 2) it is resolved now. I think I misunderstood the concept Now, for problem number 1) -> yes, the problem is in the newline. Content-wise is the same |
The problem actually lies in the real dataset. The entities from the source datasets have "tidy" whitespaces (they don't have newlines). If you think about it, if the real passages are messy, the entity taken from it should also be messy. That being said, I suspect the source dataset does not actually take the entity from the passage cause the passage is a bit chaotic (in terms of whitespaces). |
Awesome, hahaha. Glad to hear that you've resolved the issue! 👍 |
* [New Feature] Add SREDFM dataloader (temp) * [Fix] Inequal string and unique id bug for SREDFM Dataloader * [Fix] Refactor based on reviews * [Fix] Remove redundant RE task in constants.py * [Fix] Implement reviews * [Fix] Implement review feedbacks
Dataloader name:
sredfm/sredfm.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?sredfm
The text was updated successfully, but these errors were encountered: