module__SeSMig
# org.bibliome.alvisnlp.modules.segmig.SeSMig
Detects sentence boundaries and creates one annotation for each sentence.
This module assumes that WoSMig has already processed the same sections.
org.bibliome.alvisnlp.modules.segmig.SeSMig scans the annotations in wordLayerName and detects sentence boundaries, each marked by one of the following:
- an annotation whose feature eosStatusFeature equals eos;
- an annotation whose surface form consists only of characters from the value of strongPunctuations and which is followed by an uppercase character;
- an annotation whose feature eosStatusFeature equals maybe-eos and which is followed by an uppercase character.
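The three boundary rules can be sketched in Python as follows. This is a minimal illustration, not the module's actual implementation. It assumes that each word is a plain dict with the surface form under `"form"` and the end-of-sentence status under `"eos"`, mirroring the default feature names listed below. It also reads "followed by an uppercase character" as "the next word's form starts with an uppercase letter". The function name `is_boundary` is hypothetical.

```python
from typing import Dict, List

# Hypothetical word representation: the surface form is stored under "form"
# and the end-of-sentence status under "eos" (the default feature names).
Word = Dict[str, str]


def is_boundary(words: List[Word], i: int, strong_punctuations: str = "?.!") -> bool:
    """Return True if a sentence boundary is detected right after words[i]."""
    word = words[i]
    status = word.get("eos", "")
    # Rule 1: the word is explicitly marked as an end of sentence.
    if status == "eos":
        return True
    # Assumption: "followed by an uppercase character" is read as
    # "the next word's surface form starts with an uppercase letter".
    next_is_upper = i + 1 < len(words) and words[i + 1]["form"][:1].isupper()
    # Rule 2: the form is made only of strong punctuation characters.
    form = word["form"]
    if form and all(c in strong_punctuations for c in form) and next_is_upper:
        return True
    # Rule 3: a possible end of sentence confirmed by an uppercase follower.
    return status == "maybe-eos" and next_is_upper
```

Take the word sequence `See e.g. this ! Next`, with `e.g.` tagged `maybe-eos`. Under this reading, the only boundary falls after `!`. No boundary falls after `e.g.`, because the lowercase `this` follows it.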
org.bibliome.alvisnlp.modules.segmig.SeSMig creates an annotation for each sentence and adds it to the targetLayerName layer. The eosStatusFeature of each word annotation is set to a new value:
- eos: for the last word of each sentence;
- not-eos: for all other words.
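Below is a minimal Python sketch of this output step. It assumes that boundaries have already been detected and are passed as the set of word indices after which a sentence ends. For simplicity, sentences are returned as hypothetical (first word index, last word index) pairs rather than as annotations in targetLayerName. An `"eos"` key stands in for eosStatusFeature. The sketch also assumes that the last word of the section always closes a sentence, which the description above does not state explicitly.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical word representation: "eos" stands in for eosStatusFeature.
Word = Dict[str, str]


def make_sentences(words: List[Word], boundaries: Set[int]) -> List[Tuple[int, int]]:
    """Group words into sentences and rewrite their end-of-sentence status.

    `boundaries` holds the indices of words after which a boundary was detected.
    Returns one (first_word_index, last_word_index) pair per sentence.
    """
    sentences: List[Tuple[int, int]] = []
    start = 0
    for i, word in enumerate(words):
        # Assumption: the last word of the section always closes a sentence.
        last = i in boundaries or i == len(words) - 1
        # The last word of each sentence gets "eos"; every other word gets "not-eos".
        word["eos"] = "eos" if last else "not-eos"
        if last:
            sentences.append((start, i))
            start = i + 1
    return sentences
```

Any previous status, including `maybe-eos`, is overwritten. After this step, every word carries exactly one of the two final values.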
If noBreakLayerName is defined, then org.bibliome.alvisnlp.modules.segmig.SeSMig will prevent sentence boundaries inside annotations in this layer.
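The noBreakLayerName constraint amounts to discarding any detected boundary that falls inside a no-break annotation. The Python sketch below makes several assumptions. Words and no-break annotations are given as (start, end) character offsets with an exclusive end. A boundary sits at the end offset of the word it follows. "Inside" means strictly between an annotation's start and end. The function name and data layout are illustrative only.

```python
from typing import List, Set, Tuple

# Hypothetical layout: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def filter_boundaries(word_spans: List[Span], boundaries: Set[int], no_break: List[Span]) -> Set[int]:
    """Drop boundaries (word indices) whose position lies strictly inside a no-break annotation."""
    kept = set()
    for i in boundaries:
        # Assumption: the boundary after word i sits at that word's end offset.
        pos = word_spans[i][1]
        if not any(start < pos < end for start, end in no_break):
            kept.add(i)
    return kept
```

Consider the text `Dr. Smith left. He ran.` with a no-break annotation over `Dr. Smith`. In that case, a boundary after `Dr.` is dropped, while the boundaries after the two final periods are kept.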
Optional
Type: Mapping
Constant features to add to each annotation created by this module
noBreakLayerName
Optional
Type: String
Name of the layer containing annotations within which there cannot be sentence boundaries.
Default value: true
Type: Expression
Only process documents that satisfy this filter.
eosStatusFeature
Default value: eos
Type: String
Name of the word feature containing the end-of-sentence status (eos, maybe-eos or not-eos).
Default value: form
Type: String
Name of the feature containing the word surface form.
Default value: boolean:and(true, nav:layer:words())
Type: Expression
Process only sections that satisfy this filter.
strongPunctuations
Default value: ?.!
Type: String
Strong punctuation characters; each character of the string counts as one strong punctuation mark.
targetLayerName
Default value: sentences
Type: String
Name of the layer in which to store sentence annotations.
Default value: wordType
Type: String
Name of the feature from which to read the word annotation type.
wordLayerName
Default value: words
Type: String
Name of the layer containing word annotations.