module__YateaExtractor

Robert Bossy edited this page Jul 27, 2017 · 1 revision

# org.bibliome.alvisnlp.modules.yatea.YateaExtractor

Synopsis

Extract terms from the corpus using the YaTeA term extractor.

Description

org.bibliome.alvisnlp.modules.yatea.YateaExtractor hands the corpus to the YaTeA extractor. The corpus is first written to a file in the YaTeA input format. Tokens are the annotations in the layer wordLayerName; their surface form, POS tag and lemma are taken from the features formFeature, posFeature and lemmaFeature respectively. If sentenceLayerName is set, an additional SENT marker is added to reinforce the sentence boundaries corresponding to annotations in this layer.
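The serialization described above can be pictured with a minimal Python sketch. It is not the module's actual code: it assumes one tab-separated line per token (form, POS tag, lemma, in the style of TreeTagger output), and the exact layout of the SENT marker line is an assumption. The function name `to_yatea_lines` and the dictionary-based token representation are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Mapping, Sequence


def to_yatea_lines(
    sentences: Iterable[Sequence[Mapping[str, str]]],
    form_feature: str = "form",
    pos_feature: str = "pos",
    lemma_feature: str = "lemma",
    sentence_marker: bool = True,
) -> List[str]:
    """Serialize tokens as 'form<TAB>pos<TAB>lemma' lines (assumed layout).

    Each sentence is a sequence of tokens; each token is a mapping from
    feature names to values, mirroring formFeature, posFeature and
    lemmaFeature.
    """
    lines: List[str] = []
    for sentence in sentences:
        for token in sentence:
            lines.append(
                "\t".join(
                    (token[form_feature], token[pos_feature], token[lemma_feature])
                )
            )
        if sentence_marker:
            # Assumption: the boundary is reinforced by an extra line whose
            # POS column is SENT; the real module may lay this line out differently.
            lines.append("\t".join((".", "SENT", ".")))
    return lines


if __name__ == "__main__":
    example = [
        [
            {"form": "Cells", "pos": "NNS", "lemma": "cell"},
            {"form": "divide", "pos": "VBP", "lemma": "divide"},
        ]
    ]
    print("\n".join(to_yatea_lines(example)))
```

Passing `sentence_marker=False` corresponds to leaving sentenceLayerName unset: only the token lines are written.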

YaTeA is called using the executable set in yateaExecutable. It runs as if called from the directory workingDir, and the result is written to the subdirectory named corpusName.
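As a rough illustration of what running the executable from workingDir amounts to, here is a hedged Python sketch. It is not how the module is implemented. The helper names `build_invocation` and `run_yatea` are hypothetical, the `-rcfile` flag is an assumption to check against the installed YaTeA version, and setting PERLLIB in the child environment mirrors the perlLib parameter documented below.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def build_invocation(
    yatea_executable: str,
    working_dir: str,
    corpus_file: str,
    rc_file: Optional[str] = None,
    perl_lib: Optional[str] = None,
) -> Tuple[List[str], str, Dict[str, str]]:
    """Return (command, cwd, environment) for a YaTeA run (illustrative only)."""
    cmd = [yatea_executable]
    if rc_file is not None:
        # Assumption: the configuration file is passed with -rcfile.
        cmd += ["-rcfile", rc_file]
    cmd.append(corpus_file)

    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ)
    if perl_lib is not None:
        env["PERLLIB"] = perl_lib
    return cmd, working_dir, env


def run_yatea(
    yatea_executable: str,
    working_dir: str,
    corpus_file: str,
    rc_file: Optional[str] = None,
    perl_lib: Optional[str] = None,
) -> subprocess.CompletedProcess:
    """Run YaTeA as if launched from working_dir; raise on a non-zero exit."""
    cmd, cwd, env = build_invocation(
        yatea_executable, working_dir, corpus_file, rc_file, perl_lib
    )
    return subprocess.run(cmd, cwd=cwd, env=env, check=True)
```

Separating command construction from execution makes it possible to inspect the command line and environment without having YaTeA installed.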

Parameters

rcFile

Optional

Type: SourceStream

Path to the YaTeA configuration file.

workingDir

Optional

Type: WorkingDirectory

Path to the directory where YaTeA is launched.

yateaExecutable

Optional

Type: ExecutableFile

Path to the YaTeA executable file.

configDir

Optional

Type: InputDirectory

language

Optional

Type: String

localeDir

Optional

Type: InputDirectory

outputDir

Optional

Type: OutputDirectory

perlLib

Optional

Type: String

Value of the PERLLIB environment variable set in the environment of the YaTeA executable.

postProcessingConfig

Optional

Type: InputFile

BioYaTeA option: path to the post-processing configuration file.

postProcessingOutput

Optional

Type: OutputFile

BioYaTeA option: path to the result file after post-processing.

suffix

Optional

Type: String

testifiedTerminology

Optional

Type: TestifiedTerminology

bioYatea

Default value: false

documentFilter

Default value: true

Type: Expression

Only process documents that satisfy this filter.

documentTokens

Default value: true

Whether to write DOCUMENT special tokens. Not every YaTeA version accepts them.

formFeature

Default value: form

Type: String

Feature containing the word form.

lemmaFeature

Default value: lemma

Type: String

Feature containing the word lemma.

posFeature

Default value: pos

Type: String

Feature containing the word POS tag.

sectionFilter

Default value: boolean:and(true, nav:layer:words())

Type: Expression

Process only sections that satisfy this filter.

sentenceLayerName

Default value: sentences

Type: String

Name of the layer containing sentence annotations. Sentence boundaries from this layer are reinforced with an additional SENT marker.

wordLayerName

Default value: words

Type: String

Name of the layer containing the word annotations.

yateaDefaultConfig

Default value: {}

yateaOptions

Default value: {}
