module__EnrichedDocumentWriter
#org.bibliome.alvisnlp.modules.EnrichedDocumentWriter
Writes the corpus in the infamous Alvis Enriched Document Format, suitable for indexing with Zebra-Alvis.
Optional
Type: String
Metadata key for the document id.
Optional
Type: Mapping
Metadata key translation.
Optional
Type: String
Name of the layer containing named entity annotations.
Optional
Type: OutputDirectory
Path to the directory in which to write files.
Optional
Type: String
Prefix of the name of generated files.
Optional
Type: String
Name of the feature containing the term canonical form.
Optional
Type: String
Name of the layer containing the term annotations.
Optional
Type: String
Name of the layer containing token annotations.
Optional
Type: String
Name of the feature in token annotations containing the token type.
Optional
Type: String
Prefix for the document URL.
Optional
Type: String
Name of the feature containing semantic features of named entities and terms.
Default value: 100
Type: Integer
Number of documents in each document block.
Default value: 0
Type: Integer
Start point for document block numbering.
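The block size and block start parameters above can be read as a simple partitioning rule: documents are grouped into consecutive blocks of at most 100 documents, and blocks are numbered from 0 unless configured otherwise. The following is a minimal Python sketch of that reading, not the module's actual implementation. The function name `iter_blocks` and the argument names `block_size` and `block_start` are hypothetical, and how the block number enters the generated file names is not specified here.

```python
from itertools import islice


def iter_blocks(documents, block_size=100, block_start=0):
    """Yield (block_number, documents_in_block) pairs.

    Hypothetical helper: block_size and block_start mirror the two
    parameters described above (defaults 100 and 0).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    iterator = iter(documents)
    number = block_start
    while True:
        # Take up to block_size documents for the current block.
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield number, block
        number += 1


# 250 placeholder documents, default block size and start.
blocks = list(iter_blocks(range(250)))
block_numbers = [number for number, _ in blocks]
block_lengths = [len(block) for _, block in blocks]
```

With the defaults, 250 documents fall into three blocks numbered 0, 1 and 2; the last block is shorter because only 50 documents remain.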
Default value: true
Type: Expression
Only process documents that satisfy this filter.
Default value: lemma
Type: String
Name of the feature in word annotations containing the lemma.
Default value: lemma
Type: String
Name of the feature in named entity annotations containing the canonical form.
Default value: neType
Type: String
Name of the feature in named entity annotations containing the named entity type.
Default value: .sem
Type: String
Suffix of the name of generated files.
Default value: pos
Type: String
Name of the feature in word annotations containing the POS tag.
Default value: true
Type: Expression
Process only sections that satisfy this filter.
Default value: sentences
Type: String
Name of the layer containing sentence annotations.
Default value: id
Type: String
Document feature to use as the URL suffix.
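Taken together with the URL prefix parameter, this suggests that a document URL is the configured prefix followed by the value of the chosen document feature. Here is a minimal Python sketch under that assumption. The function `document_url`, its argument names, and the example prefix and identifier are all hypothetical, and the module may build the URL differently.

```python
def document_url(features, url_prefix="", url_suffix_feature="id"):
    """Build a document URL from a prefix and a document feature.

    Hypothetical helper: `features` stands in for a document's
    feature map; the default suffix feature is "id", as documented.
    """
    # Assumption: a missing feature contributes an empty suffix.
    suffix = features.get(url_suffix_feature, "")
    return url_prefix + suffix


# Illustrative values only; not taken from the module documentation.
url = document_url({"id": "doc-12345"}, url_prefix="http://example.org/doc/")
```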
Default value: words
Type: String
Name of the layer containing word annotations.