# org.bibliome.alvisnlp.modules.trie.OBOProjector
Projects OBO terms and synonyms on sections.
org.bibliome.alvisnlp.modules.trie.OBOProjector reads oboFiles in OBO format and searches for term names and synonyms in sections.
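As an illustration of what "term names and synonyms" means for an OBO file, the following minimal sketch collects the `id`, `name` and `synonym` tags of `[Term]` stanzas and turns them into entry keys. It is not the module's own parser: the function names, the dictionary layout and the sample ontology (`EX:0001`) are assumptions made for this example, and only the simplest form of the OBO stanza syntax is handled.

```python
def parse_obo_terms(text):
    """Collect id, name and synonyms from the [Term] stanzas of OBO text."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = None
            if line == "[Term]":
                current = {"id": None, "name": None, "synonyms": []}
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
            elif tag == "synonym":
                # The synonym text is the first double-quoted string.
                current["synonyms"].append(value.split('"')[1])
    return terms


def entry_keys(terms):
    """Map every term name and synonym to the identifiers of its terms."""
    keys = {}
    for term in terms:
        for key in [term["name"], *term["synonyms"]]:
            keys.setdefault(key, []).append(term["id"])
    return keys


# Hypothetical sample ontology, for illustration only.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0001
name: amino acid
synonym: "aminoacid" EXACT []

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(SAMPLE)
keys = entry_keys(terms)
```

Here both `amino acid` and `aminoacid` become keys that point to the same term identifier.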
The parameters allowJoined, allUpperCaseInsensitive, caseInsensitive, ignoreDiacritics, joinDash, matchStartCaseInsensitive, skipConsecutiveWhitespaces, skipWhitespace and wordStartCaseInsensitive control the matching between the section and the entry keys.
The subject parameter specifies which text of the section should be matched. There are two options:
- the entries are matched on the contents of the section; in this case subject can also control whether match boundaries must coincide with word delimiters;
- the entries are matched on the feature values of the annotations of a given layer, joined by whitespace; in this way entries can be searched against word lemmas, for instance.
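The second option can be pictured with a small sketch. It assumes annotations are plain dictionaries with hypothetical `form` and `lemma` features; comparing the entry's words against the sequence of feature values is equivalent to searching the entry in the values joined by single spaces, with matches aligned on annotation boundaries. None of these names come from AlvisNLP.

```python
def match_on_feature(annotations, feature, entry):
    """Search `entry` in the values of `feature`, joined by single spaces.

    Returns the (first, last) indexes of the annotations covered by the
    first match, or None if the entry is not found.
    """
    values = [a[feature] for a in annotations]
    words = entry.split(" ")
    n = len(words)
    for i in range(len(values) - n + 1):
        if values[i:i + n] == words:
            return (i, i + n - 1)
    return None


# Hypothetical word annotations for the text "Amino acids are".
words = [
    {"form": "Amino", "lemma": "amino"},
    {"form": "acids", "lemma": "acid"},
    {"form": "are", "lemma": "be"},
]

on_lemma = match_on_feature(words, "lemma", "amino acid")
on_form = match_on_feature(words, "form", "amino acid")
```

Matching on the lemma finds the entry on the first two words, whereas matching on the surface form does not.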
org.bibliome.alvisnlp.modules.trie.OBOProjector creates an annotation for each matched entry and adds these annotations to the layer named targetLayerName. The created annotations will have features nameFeature, idFeature and pathFeature set to the matched term name, identifier and path.
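A rough sketch of the result, under stated assumptions: annotations are modelled as dictionaries with `start` and `end` offsets, the feature names are passed as arguments in the spirit of nameFeature and idFeature, and a regular expression search stands in for the trie the module actually uses. The function name `project` and the sample entries are hypothetical, and the path feature is left out because its format is not described here.

```python
import re


def project(contents, entries, name_feature="name", id_feature="id"):
    """Create one annotation per occurrence of an entry key in `contents`.

    `entries` maps each key (term name or synonym) to (term id, term name).
    """
    annotations = []
    for key, (term_id, term_name) in entries.items():
        # \b restricts matches to word boundaries.
        pattern = r"\b" + re.escape(key) + r"\b"
        for m in re.finditer(pattern, contents):
            annotations.append({
                "start": m.start(),
                "end": m.end(),
                name_feature: term_name,
                id_feature: term_id,
            })
    return sorted(annotations, key=lambda a: (a["start"], a["end"]))


# Hypothetical entries: a term name and one of its synonyms.
entries = {
    "amino acid": ("EX:0001", "amino acid"),
    "aminoacid": ("EX:0001", "amino acid"),
}
layer = project("An amino acid and an aminoacid.", entries)
```

Both occurrences yield an annotation carrying the same term name and identifier, whichever key matched.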
If trieSource is specified, org.bibliome.alvisnlp.modules.trie.OBOProjector assumes that it contains a compiled version of the dictionary and does not read the source dictionary. If trieSink is specified, org.bibliome.alvisnlp.modules.trie.OBOProjector writes a compiled version of the dictionary to it. Compiled dictionaries may speed up processing for large dictionaries.
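The idea behind a compiled dictionary can be sketched as a trie that is built once, saved, and reloaded on later runs. This is only an analogy: the nested-dictionary trie, the use of `pickle` as file format and the function names are assumptions of this example and say nothing about the format AlvisNLP actually writes.

```python
import pickle

END = ""  # marker key: an entry ends at this node


def build_trie(entries):
    """Build a character trie from a {key: value} dictionary."""
    root = {}
    for key, value in entries.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(value)
    return root


def lookup(trie, key):
    """Return the values stored for `key`, or an empty list."""
    node = trie
    for ch in key:
        node = node.get(ch)
        if node is None:
            return []
    return node.get(END, [])


def load_or_build(entries, trie_source=None, trie_sink=None):
    """Read a saved trie if a source is given, else build it (and save it)."""
    if trie_source is not None:
        # The entries are ignored: the saved trie is trusted as is.
        with open(trie_source, "rb") as f:
            return pickle.load(f)
    trie = build_trie(entries)
    if trie_sink is not None:
        with open(trie_sink, "wb") as f:
            pickle.dump(trie, f)
    return trie
```

A first run would call `load_or_build(entries, trie_sink=path)`, and later runs `load_or_build({}, trie_source=path)`, skipping the construction step.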
Optional
Type: String[]
Path to the source OBO files.
Optional
Type: String
Name of the layer that contains the match annotations.
Optional
Type: String
Name of the feature that contains the identifiers of the term's ancestors.
Optional
Type: String
Name of the feature that contains the identifiers of the term's children.
Optional
Type: Mapping
Constant features to add to each annotation created by this module
Optional
Type: String
Feature where to store the matched term identifier.
Optional
Type: String
Feature where to store the matched term name.
Optional
Type: String
Name of the feature that contains the identifiers of the term's parents.
Optional
Type: String
Feature where to store the matched term path.
Optional
Type: OutputFile
If set, org.bibliome.alvisnlp.modules.trie.OBOProjector writes the compiled dictionary to the specified file.
Optional
Type: InputFile
If set, read the compiled dictionary from the specified files. Compiled dictionaries are generally faster for large dictionaries.
Optional
Type: String
Name of the feature where to store the ontology version.
Default value: false
Type: Boolean
Whether the match allows case substitution on all characters of words that are all upper case.
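One possible reading of this option, as a sketch: words written entirely in upper case are folded to lower case before comparison, while mixed-case words are left alone. The function name is hypothetical, and applying the fold to the subject (rather than to the entry) is an assumption of this example.

```python
def fold_all_upper_words(text):
    """Lower-case the words of `text` that are entirely upper case."""
    return " ".join(w.lower() if w.isupper() else w for w in text.split(" "))


folded = fold_all_upper_words("AMINO ACID")
untouched = fold_all_upper_words("Amino Acid")
```

Under this reading `AMINO ACID` can match the entry `amino acid`, but `Amino Acid` cannot.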
Default value: false
Type: Boolean
Whether the match allows arbitrary suppression of whitespace characters in the subject. For instance, the contents aminoacid matches the entry amino acid.
Default value: false
Type: Boolean
Whether the match allows case substitutions on all characters.
Default value: true
Type: Expression
Only process documents that satisfy this filter.
Default value: false
Type: Boolean
Whether the match allows diacritic substitutions on all characters. For instance, the contents acide amine matches the entry acide aminé.
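Diacritics-insensitive comparison is commonly done by decomposing characters and dropping the combining marks. The sketch below uses that technique with Python's standard `unicodedata` module; it illustrates the behaviour described above and is not taken from the module's implementation.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


same = strip_diacritics("acide amine") == strip_diacritics("acide aminé")
```

With both sides stripped, `acide amine` and `acide aminé` compare equal.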
Default value: false
Type: Boolean
Whether to treat dash characters (-) as whitespace characters if allowJoined is true. For instance, the contents aminoacid matches the entry amino-acid.
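The interplay between allowJoined and dash handling can be sketched with a regular expression in which every separator of the entry becomes optional. The function name `joined_pattern` and its `join_dash` argument are hypothetical, and a regular expression is used here only for illustration in place of the module's trie matching.

```python
import re


def joined_pattern(entry, join_dash=False):
    """Pattern matching `entry` with any of its separators suppressed.

    Separators are whitespace characters, plus dashes if `join_dash` is set.
    """
    separator = r"([\s-])" if join_dash else r"(\s)"
    # With a capturing group, re.split keeps separators at odd indexes.
    pieces = re.split(separator, entry)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            out.append("(?:" + re.escape(piece) + ")?")  # optional separator
        else:
            out.append(re.escape(piece))
    return re.compile("".join(out))


spaces = joined_pattern("amino acid")
dash_strict = joined_pattern("amino-acid")
dash_joined = joined_pattern("amino-acid", join_dash=True)
```

`spaces` accepts both `amino acid` and `aminoacid`; `dash_strict` accepts only `amino-acid`, while `dash_joined` also accepts `aminoacid`.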
Default value: false
Type: Boolean
Add all database cross-references of the term. org.bibliome.alvisnlp.modules.trie.OBOProjector creates a feature key-value pair for each dbxref in the matching term.
Default value: false
Type: Boolean
Whether the match allows case substitution on the first character of the entry key.
Default value: all
Type: MultipleEntryBehaviour
Specifies the behaviour of org.bibliome.alvisnlp.modules.trie.OBOProjector if the dictionary contains several entries with the same key.
Default value: true
Type: Expression
Process only sections that satisfy this filter.
Default value: false
Type: Boolean
Whether the match allows insertion of consecutive whitespace characters in the subject. For instance, the contents amino  acid (two spaces) matches the entry amino acid.
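Tolerating runs of whitespace amounts to collapsing each run into a single space before comparison. A minimal sketch, with a hypothetical function name:

```python
import re


def collapse_whitespace(text):
    """Replace every run of whitespace characters with a single space."""
    return re.sub(r"\s+", " ", text)


collapsed = collapse_whitespace("amino  \t acid")
```

After collapsing, the subject can be compared with the entry `amino acid` as usual.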
Default value: false
Type: Boolean
Whether the match allows arbitrary insertion of whitespace characters in the subject. For instance, the contents amino acid matches the entry aminoacid.
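Arbitrary insertion of whitespace can be pictured as a pattern that accepts optional whitespace between any two characters of the entry. The function name is hypothetical and the regular expression is only an illustration of the behaviour, not the module's matching algorithm.

```python
import re


def loose_whitespace_pattern(entry):
    """Pattern matching `entry` with extra whitespace allowed anywhere inside."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in entry))


loose = loose_whitespace_pattern("aminoacid")
strict_space = loose_whitespace_pattern("amino acid")
```

`loose` accepts the subject `amino acid`, but the reverse does not hold: `strict_space` still requires the space of its entry, so it rejects `aminoacid`.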
Default value: WORD
Type: Subject
Specifies the contents to match.
Default value: false
Type: Boolean
Whether the match allows case substitution on the first character of words.
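A sketch of this behaviour, assuming words are separated by single spaces: the first character of every word is folded to lower case on both sides, and the rest of each word is compared as is. The function names are hypothetical.

```python
def fold_word_starts(text):
    """Lower-case the first character of every space-separated word."""
    return " ".join(w[:1].lower() + w[1:] for w in text.split(" "))


def match_word_starts(contents, entry):
    """Compare two strings, ignoring the case of word-initial characters."""
    return fold_word_starts(contents) == fold_word_starts(entry)


capitalized = match_word_starts("Amino Acid", "amino acid")
all_upper = match_word_starts("AMINO ACID", "amino acid")
```

`Amino Acid` matches the entry `amino acid`, whereas `AMINO ACID` does not, because only word-initial characters are folded.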