module__EnrichedDocumentWriter

#org.bibliome.alvisnlp.modules.EnrichedDocumentWriter

Synopsis

Writes the corpus in the infamous Alvis Enriched Document Format, suitable for indexing with Zebra-Alvis.

Description

Writes the corpus in the infamous Alvis Enriched Document Format, suitable for indexing with Zebra-Alvis.

Parameters

idMetaFeature

Optional

Type: String

Metadata key for the document id.

metaTrans

Optional

Type: Mapping

Metadata key translation.

neLayerName

Optional

Type: String

Name of the layer containing named entity annotations.

outDir

Optional

Type: OutputDirectory

Path to the directory where the output files are written.

outFilePrefix

Optional

Type: String

Prefix of the name of generated files.

termCanonicalFormFeature

Optional

Type: String

Name of the feature containing the term canonical form.

termLayerName

Optional

Type: String

Name of the layer containing the term annotations.

tokenLayerName

Optional

Type: String

Name of the layer containing token annotations.

tokenTypeFeature

Optional

Type: String

Name of the feature in token annotations containing the token type.

urlPrefix

Optional

Type: String

Prefix for the document URL.

semanticFeature

Optional

Type: String

Name of the feature containing semantic features of named entities and terms.

blockSize

Default value: 100

Type: Integer

Number of documents in each document block.

blockStart

Default value: 0

Type: Integer

Start point for document block numbering.
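
As a rough illustration of how blockSize and blockStart interact, the following Python sketch splits a list of document ids into numbered blocks. The function name `iter_blocks` is hypothetical, and the assumption that blocks are numbered consecutively from blockStart is inferred from the parameter descriptions, not taken from the module source.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_blocks(
    doc_ids: Sequence[str],
    block_size: int = 100,   # mirrors the blockSize default
    block_start: int = 0,    # mirrors the blockStart default
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (block_number, documents) pairs.

    Assumption: blocks are filled in corpus order and numbered
    consecutively, starting at block_start.
    """
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")
    for offset in range(0, len(doc_ids), block_size):
        number = block_start + offset // block_size
        yield number, list(doc_ids[offset:offset + block_size])
```

Under these assumptions, 250 documents with the default values would give three blocks numbered 0, 1 and 2, the last one holding 50 documents.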

documentFilter

Default value: true

Type: Expression

Only process documents that satisfy this filter.

lemmaFeature

Default value: lemma

Type: String

Name of the feature in word annotations containing the lemma.

neCanonicalFormFeature

Default value: lemma

Type: String

Name of the feature in named entity annotations containing the canonical form.

neTypeFeature

Default value: neType

Type: String

Name of the feature in named entity annotations containing the named entity type.

outFileSuffix

Default value: .sem

Type: String

Suffix of the name of generated files.
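
The sketch below shows one plausible way outDir, outFilePrefix and outFileSuffix could combine into an output path. The page does not state the exact naming scheme, so placing the block number between the prefix and the suffix is an assumption, and `block_file_path` is a hypothetical helper, not part of AlvisNLP/ML.

```python
from pathlib import Path


def block_file_path(
    out_dir: str,
    block_number: int,
    prefix: str = "",       # outFilePrefix (optional, so empty here)
    suffix: str = ".sem",   # mirrors the outFileSuffix default
) -> Path:
    """Build the path of the file for one document block.

    Assumption: the file name is prefix + block number + suffix.
    """
    return Path(out_dir) / f"{prefix}{block_number}{suffix}"
```

With these assumptions, `block_file_path("out", 3, prefix="corpus_")` would point at a file named `corpus_3.sem` inside `out`.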

posFeature

Default value: pos

Type: String

Name of the feature in word annotations containing the POS tag.

sectionFilter

Default value: true

Type: Expression

Process only sections that satisfy this filter.

sentenceLayerName

Default value: sentences

Type: String

Name of the layer containing sentence annotations.

urlSuffixFeature

Default value: id

Type: String

Document feature to use as the URL suffix.
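
A minimal sketch of how urlPrefix and urlSuffixFeature presumably fit together: the document URL is the prefix followed by the value of the named document feature. Plain concatenation without any separator or escaping is an assumption, and `document_url` is a hypothetical name used only for illustration.

```python
from typing import Mapping


def document_url(
    features: Mapping[str, str],
    url_prefix: str = "",             # urlPrefix (optional, so empty here)
    url_suffix_feature: str = "id",   # mirrors the urlSuffixFeature default
) -> str:
    """Return url_prefix followed by the value of the suffix feature.

    Assumption: the two parts are joined as-is, with no separator added.
    """
    return url_prefix + features[url_suffix_feature]
```

For a document whose `id` feature is `12345`, a prefix of `http://example.org/doc/` would give `http://example.org/doc/12345` under this reading.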

wordLayerName

Default value: words

Type: String

Name of the layer containing word annotations.
