add sciarg dataset #61
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff:

| | main | #61 | +/- |
|---|---|---|---|
| Coverage | 94.57% | 94.72% | +0.14% |
| Files | 18 | 19 | +1 |
| Lines | 1272 | 1289 | +17 |
| Hits | 1203 | 1221 | +18 |
| Misses | 69 | 68 | -1 |
Regarding label counts, there are discrepancies between
This is possibly caused by differences in label assignment during the document extraction and/or conversion processes.
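To narrow down where such discrepancies come from, it can help to diff the per-label counts from the two sources directly. The following is only a minimal sketch: the helper `label_count_diff`, the label names, and the numbers are hypothetical placeholders, not the actual SciArg statistics.

```python
from collections import Counter


def label_count_diff(reference, observed):
    """Return {label: observed - reference} for every label whose counts differ."""
    labels = set(reference) | set(observed)
    return {
        label: observed.get(label, 0) - reference.get(label, 0)
        for label in sorted(labels)
        if observed.get(label, 0) != reference.get(label, 0)
    }


# Made-up counts, purely for illustration.
reference_counts = Counter({"label_a": 10, "label_b": 5})
# Counts can also be built from a flat list of labels taken from loaded annotations.
observed_counts = Counter(["label_a"] * 9 + ["label_b"] * 5 + ["label_c"])

diff = label_count_diff(reference_counts, observed_counts)
print(diff)
```

A negative value means the label occurs less often in the observed data than in the reference; a label that appears in only one source shows up with its full count.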
The dataset cards got a bit mixed up, see below. In short: we do not have an HF sciarg dataset, so we cannot have an HF dataset card for it.
EDIT: Please also move the content of your comment above to a fitting location in the PIE dataset card.
In this PR, we add SciArg, an argumentation-mining dataset, according to the description in #10.

Note: This requires `pie-dataset>=0.4.0` (in detail: #57, #58, #59, #60).

TODO:
- maybe re-think/adjust how `parts_of_same` relations and dataset variant `name=merge_fragmented_spans` play together @ArneBinder; default one, but with parameter `merge_fragmented_spans=true` (and set `DOCUMENT_CONVERTERS` instead of overwriting `document_converters`)
- [ ] add entry for ...With LabeledMultiSpan... to `DOCUMENT_CONVERTERS` (will be added in a follow-up PR)
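For readers unfamiliar with the idea behind `merge_fragmented_spans`, here is a rough, library-independent sketch of how span fragments connected by `parts_of_same` relations could be grouped into multi-spans. It is an illustration under assumptions, not the implementation in this PR: the function signature, the tuple-based span representation, and the labels `claim` and `data` are hypothetical.

```python
from collections import defaultdict


def merge_fragmented_spans(spans, parts_of_same):
    """Group spans linked by parts_of_same relations into multi-spans.

    spans: list of (start, end, label) tuples (hypothetical representation).
    parts_of_same: list of (i, j) index pairs into `spans`.
    Returns a sorted list of (slices, label), where slices is a tuple of (start, end).
    """
    # Union-find over span indices: linked fragments end up under the same root.
    parent = list(range(len(spans)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in parts_of_same:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = defaultdict(list)
    for idx in range(len(spans)):
        groups[find(idx)].append(idx)

    merged = []
    for members in groups.values():
        labels = {spans[m][2] for m in members}
        if len(labels) != 1:
            # Assumption: fragments of one span must share a label.
            raise ValueError(f"fragments with different labels: {sorted(labels)}")
        slices = tuple(sorted((spans[m][0], spans[m][1]) for m in members))
        merged.append((slices, labels.pop()))
    return sorted(merged)


# Illustrative input: the first two spans are fragments of the same unit.
spans = [(0, 5, "claim"), (10, 15, "claim"), (20, 30, "data")]
relations = [(0, 1)]
merged = merge_fragmented_spans(spans, relations)
print(merged)
```

Union-find is used so that chains of relations (A with B, B with C) collapse into a single multi-span regardless of the order in which the relations are listed.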