From 05fe3138dcb6f4760b9b8d2640e1bafc88d85eac Mon Sep 17 00:00:00 2001
From: Arne Binder
Date: Fri, 24 Nov 2023 19:41:54 +0100
Subject: [PATCH] re-arrange and add TODOs

---
 dataset_builders/pie/argmicro/README.md | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/dataset_builders/pie/argmicro/README.md b/dataset_builders/pie/argmicro/README.md
index 33cd8837..6f0063db 100644
--- a/dataset_builders/pie/argmicro/README.md
+++ b/dataset_builders/pie/argmicro/README.md
@@ -24,27 +24,28 @@ and the following annotation layers:
 - `stance` (annotation type: `Label`)
 - `edus` (annotation type: `Span`, target: `text`)
 - `adus` (tuple, annotation type: `LabeledAnnotationCollection`, target: `edus`)
-  - `annotations` (annotation type: `Span`, target: `text`)
-  - `label` (str, optional)
+  - description: TODO (*why* do we need to have this special annotation type? i.e. why `adus` with multiple `Span`?)
+  - `LabeledAnnotationCollection` has the following fields:
+    - `annotations` (annotation type: `Span`, target: `text`)
+    - `label` (str, optional), values: TODO
 - `relations` (annotation type: `MultiRelation`, target: `adus`)
-  - `head` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
-  - `tail` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
-  - `label` (str, optional)
+  - description: TODO (*why* do we need to have this special annotation type? i.e. why relations with multiple `head`'s and/or `tail`'s?)
+  - `MultiRelation` has the following fields:
+    - `head` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
+    - `tail` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
+    - `label` (str, optional), values: TODO
 
 See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/annotations.py) for the annotation type definitions.
 
-In addition to common annotation type definitions above, `ArgMicroDocument` contains special annotation types: `LabeledAnnotationCollection` and `MultiRelation`.
-Both of which contain a tuple of one `Annotation` or more, as the document allows an `adus` with multiple `Span`'s, as well as a relation with multiple `head`'s and/or `tail`'s.
-
 ## Document Converters
 
 The dataset provides document converters for the following target document types:
 
 - `pytorch_ie.documents.TextDocumentWithLabeledSpansAndBinaryRelations`
-  - `LabeledSpans`, converted from `ArgMicroDocument`'s `adus`
+  - `LabeledSpans`, converted from `ArgMicroDocument`'s `adus`, labels: TODO
     - if `adus` contains multiple spans, we take the start of the first `edu` and the end of the last `edu` as the boundaries of `LabeledSpan`. We also raise exceptions when there is an overlapping.
-  - `BinraryRelations`, converted from `ArgMicroDocument`'s `relations`
-    - if `relations` contains multiple `adus` as `head` and/or `tail`, then we build `BinaryRelations` between each `head`/`tail` to the other component. Then, we build `BinaryRelations` between each component that previously belongs to the same `LabeledAnnotationCollection`.
+  - `BinraryRelations`, converted from `ArgMicroDocument`'s `relations`, labels: TODO
+    - if `relations` contains multiple `adus` as `head` and/or `tail`, then we build `BinaryRelations` between each `head`/`tail` to the other component (TODO: with which label?). Then, we build `BinaryRelations` between each component that previously belongs to the same `LabeledAnnotationCollection` (TODO: with which label?).
   - `metadata`, we keep the `stance`, `topic_id`, and the rest of `ArgMicroDocument`'s `metadata`.
 
 See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py) for the document type