From 61a962b48f76df1e53fa7b456b93f2bf5ca0fe26 Mon Sep 17 00:00:00 2001
From: Arne Binder
Date: Thu, 2 Nov 2023 20:00:21 +0100
Subject: [PATCH] add README.md

---
 dataset_builders/pie/conll2003/README.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 dataset_builders/pie/conll2003/README.md

diff --git a/dataset_builders/pie/conll2003/README.md b/dataset_builders/pie/conll2003/README.md
new file mode 100644
index 00000000..c8b5c4c1
--- /dev/null
+++ b/dataset_builders/pie/conll2003/README.md
@@ -0,0 +1,19 @@
+# PIE Dataset Card for "conll2003"
+
+This is a [PyTorch-IE](https://github.com/ChristophAlt/pytorch-ie) wrapper for the
+[CoNLL 2003 Huggingface dataset loading script](https://huggingface.co/datasets/conll2003).
+
+## Data Schema
+
+The document type for this dataset is `CoNLL2003Document` which defines the following data fields:
+
+- `text` (str)
+- `id` (str, optional)
+- `metadata` (dictionary, optional)
+
+and the following annotation layers:
+
+- `entities` (annotation type: `LabeledSpan`, target: `text`)
+
+See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/annotations.py) for the definitions of
+the annotation types.
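
The data schema described in the README added by this patch can be illustrated with a small, self-contained sketch. It does not use the PyTorch-IE classes: `SpanSketch` and `DocumentSketch` are hypothetical stand-ins built on standard-library dataclasses, and the sketch assumes that a span stores character offsets into `text` with an exclusive end. The sentence and labels are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class SpanSketch:
    """Hypothetical stand-in for a labeled span (not the PyTorch-IE class)."""

    start: int  # assumed: inclusive character offset into the target text
    end: int  # assumed: exclusive character offset into the target text
    label: str


@dataclass
class DocumentSketch:
    """Hypothetical stand-in mirroring the fields listed in the README."""

    text: str
    id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Annotation layer whose spans point into `text`.
    entities: List[SpanSketch] = field(default_factory=list)

    def resolve(self, span: SpanSketch) -> str:
        # Because the layer targets `text`, a span resolves to a slice of it.
        return self.text[span.start:span.end]


doc = DocumentSketch(
    text="EU rejects German call to boycott British lamb.",
    id="example-0",  # illustrative id, not taken from the dataset
)
doc.entities.append(SpanSketch(start=0, end=2, label="ORG"))
doc.entities.append(SpanSketch(start=11, end=17, label="MISC"))

# Pair each entity's surface string with its label.
resolved = [(doc.resolve(span), span.label) for span in doc.entities]
print(resolved)
```

Keeping the annotations as offsets into `text`, rather than as copied strings, is what "target: `text`" suggests: the surface form of an entity is always recovered from the document itself.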