diff --git a/dataset_builders/pie/argmicro/README.md b/dataset_builders/pie/argmicro/README.md
index b1b281a0..d98944ee 100644
--- a/dataset_builders/pie/argmicro/README.md
+++ b/dataset_builders/pie/argmicro/README.md
@@ -74,8 +74,6 @@ input:
   path: pie/argmicro
   revision: 28ef031d2a2c97be7e9ed360e1a5b20bd55b57b2
   name: en
-  base_dataset_kwargs:
-    data_dir: data/datasets/arg-microtexts-master.zip
 ```
 
 For token based metrics, this uses `bert-base-uncased` from `transformer.AutoTokenizer` (see [AutoTokenizer](https://huggingface.co/docs/transformers/v4.37.1/en/model_doc/auto#transformers.AutoTokenizer), and [bert-based-uncased](https://huggingface.co/bert-base-uncased) to tokenize `text` in `TextDocumentWithLabeledSpansAndBinaryRelations` (see [document type](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py)).
diff --git a/dataset_builders/pie/cdcp/README.md b/dataset_builders/pie/cdcp/README.md
index c90e968c..b250c86a 100644
--- a/dataset_builders/pie/cdcp/README.md
+++ b/dataset_builders/pie/cdcp/README.md
@@ -52,8 +52,6 @@ input:
   _target_: pie_datasets.DatasetDict.load_dataset
   path: pie/cdcp
   revision: 001722894bdca6df6a472d0d186a3af103e392c5
-  base_dataset_kwargs:
-    data_dir: data/datasets/cdcp_acl17.zip
 ```
 
 For token based metrics, this uses `bert-base-uncased` from `transformer.AutoTokenizer` (see [AutoTokenizer](https://huggingface.co/docs/transformers/v4.37.1/en/model_doc/auto#transformers.AutoTokenizer), and [bert-based-uncased](https://huggingface.co/bert-base-uncased) to tokenize `text` in `TextDocumentWithLabeledSpansAndBinaryRelations` (see [document type](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py)).
diff --git a/dataset_builders/pie/scidtb_argmin/README.md b/dataset_builders/pie/scidtb_argmin/README.md
index d50718d8..50a42a29 100644
--- a/dataset_builders/pie/scidtb_argmin/README.md
+++ b/dataset_builders/pie/scidtb_argmin/README.md
@@ -46,8 +46,6 @@ input:
   _target_: pie_datasets.DatasetDict.load_dataset
   path: pie/scidtb_argmin
   revision: 335a8e6168919d7f204c6920eceb96745dbd161b
-  base_dataset_kwargs:
-    data_dir: data/datasets/scidtb_argmin_annotations.tgz
 ```
 
 For token based metrics, this uses `bert-base-uncased` from `transformer.AutoTokenizer` (see [AutoTokenizer](https://huggingface.co/docs/transformers/v4.37.1/en/model_doc/auto#transformers.AutoTokenizer), and [bert-based-uncased](https://huggingface.co/bert-base-uncased) to tokenize `text` in `TextDocumentWithLabeledSpansAndBinaryRelations` (see [document type](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py)).