re-arrange and add TODOs
ArneBinder committed Nov 26, 2023
1 parent a4555b4 commit 05fe313
Showing 1 changed file with 12 additions and 11 deletions.
23 changes: 12 additions & 11 deletions dataset_builders/pie/argmicro/README.md
@@ -24,27 +24,28 @@ and the following annotation layers:
 - `stance` (annotation type: `Label`)
 - `edus` (annotation type: `Span`, target: `text`)
 - `adus` (tuple, annotation type: `LabeledAnnotationCollection`, target: `edus`)
-  - `annotations` (annotation type: `Span`, target: `text`)
-  - `label` (str, optional)
+  - description: TODO (*why* do we need to have this special annotation type? i.e. why `adus` with multiple `Span`?)
+  - `LabeledAnnotationCollection` has the following fields:
+    - `annotations` (annotation type: `Span`, target: `text`)
+    - `label` (str, optional), values: TODO
 - `relations` (annotation type: `MultiRelation`, target: `adus`)
-  - `head` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
-  - `tail` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
-  - `label` (str, optional)
+  - description: TODO (*why* do we need to have this special annotation type? i.e. why relations with multiple `head`'s and/or `tail`'s?)
+  - `MultiRelation` has the following fields:
+    - `head` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
+    - `tail` (tuple, annotation type: `LabeledAnnotationCollection`, target: `adus`)
+    - `label` (str, optional), values: TODO

 See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/annotations.py) for the annotation type definitions.

-In addition to the common annotation type definitions above, `ArgMicroDocument` contains special annotation types: `LabeledAnnotationCollection` and `MultiRelation`.
-Both contain a tuple of one or more `Annotation`s, as the document allows an `adus` with multiple `Span`'s, as well as a relation with multiple `head`'s and/or `tail`'s.
-
 ## Document Converters

 The dataset provides document converters for the following target document types:

 - `pytorch_ie.documents.TextDocumentWithLabeledSpansAndBinaryRelations`
-  - `LabeledSpans`, converted from `ArgMicroDocument`'s `adus`
+  - `LabeledSpans`, converted from `ArgMicroDocument`'s `adus`, labels: TODO
     - if `adus` contains multiple spans, we take the start of the first `edu` and the end of the last `edu` as the boundaries of `LabeledSpan`. We also raise an exception when there is an overlap.
-  - `BinaryRelations`, converted from `ArgMicroDocument`'s `relations`
-    - if `relations` contains multiple `adus` as `head` and/or `tail`, then we build `BinaryRelations` between each `head`/`tail` to the other component. Then, we build `BinaryRelations` between each component that previously belonged to the same `LabeledAnnotationCollection`.
+  - `BinaryRelations`, converted from `ArgMicroDocument`'s `relations`, labels: TODO
+    - if `relations` contains multiple `adus` as `head` and/or `tail`, then we build `BinaryRelations` between each `head`/`tail` to the other component (TODO: with which label?). Then, we build `BinaryRelations` between each component that previously belonged to the same `LabeledAnnotationCollection` (TODO: with which label?).
   - `metadata`, we keep the `stance`, `topic_id`, and the rest of `ArgMicroDocument`'s `metadata`.

 See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py) for the document type
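
The two conversion rules described in the hunk above (collapsing a multi-span `adus` entry into one `LabeledSpan`, and expanding a multi-head/multi-tail relation into binary relations) can be sketched roughly as follows. This is a minimal illustration with plain dataclasses, not the `pytorch_ie` types or the dataset builder's actual code. The names `merge_edus`, `expand_relation`, and the `same_group` placeholder label are hypothetical. Two readings are assumptions: "overlapping" is taken to mean that an EDU outside the ADU intersects the merged range, and "each `head`/`tail` to the other component" is taken to mean every head paired with every tail. The README itself leaves the relation labels as TODO.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Span:
    start: int
    end: int  # exclusive character offset


@dataclass(frozen=True)
class LabeledSpan:
    start: int
    end: int
    label: Optional[str] = None


def merge_edus(
    edus: Sequence[Span], label: Optional[str], all_edus: Sequence[Span]
) -> LabeledSpan:
    """Collapse the EDUs of one ADU into a single labeled span."""
    ordered = sorted(edus, key=lambda s: s.start)
    # Start of the first EDU, end of the last EDU.
    start, end = ordered[0].start, ordered[-1].end
    # Assumption: an "overlap" is an EDU that is not part of this ADU
    # but intersects the merged character range.
    for edu in all_edus:
        if edu not in edus and edu.start < end and start < edu.end:
            raise ValueError(f"EDU {edu} overlaps merged ADU span [{start}, {end})")
    return LabeledSpan(start, end, label)


def expand_relation(
    heads: Sequence[str],
    tails: Sequence[str],
    label: Optional[str],
    group_label: str = "same_group",  # hypothetical placeholder label
) -> List[Tuple[str, str, Optional[str]]]:
    """Turn one multi-head/multi-tail relation into (head, tail, label) triples."""
    # Assumed reading: every head is linked to every tail with the original label.
    triples = [(h, t, label) for h, t in product(heads, tails)]
    # Components that shared a collection are linked to each other.
    for group in (heads, tails):
        triples += [(a, b, group_label) for a, b in combinations(group, 2)]
    return triples
```

With EDUs at `[0, 5)`, `[6, 10)` and `[11, 20)`, merging the first two yields a span `[0, 10)`, while merging the first and third raises because the second EDU lies inside the merged range.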