AlvisNLP ML data model
The data structure contains the corpus contents and its annotations. It is passed from one module to the next, and each module instance can read and write it through a shared object.
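As a rough illustration of this hand-off, the sketch below chains two modules over one shared object. It is not AlvisNLP/ML code: `SharedCorpus`, `tokenize`, `count_tokens`, and `run` are hypothetical names, and the real data structure is far richer than the two dictionaries used here.

```python
class SharedCorpus:
    """Hypothetical, heavily simplified stand-in for the shared data structure."""

    def __init__(self, documents):
        self.documents = documents  # document identifier -> raw text
        self.tokens = {}            # document identifier -> list of tokens
        self.token_counts = {}      # document identifier -> number of tokens


def tokenize(corpus):
    # First module: reads the documents and writes tokens into the shared object.
    for doc_id, text in corpus.documents.items():
        corpus.tokens[doc_id] = text.split()


def count_tokens(corpus):
    # Second module: reads what the previous module wrote and adds its own result.
    for doc_id, tokens in corpus.tokens.items():
        corpus.token_counts[doc_id] = len(tokens)


def run(modules, corpus):
    # The same object is handed to each module in sequence.
    for module in modules:
        module(corpus)
    return corpus


corpus = run([tokenize, count_tokens], SharedCorpus({"doc1": "a short text"}))
print(corpus.token_counts)
```

The point of the sketch is only that each module mutates the same object, so a later module sees everything earlier modules produced.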
The following figure presents a UML-like specification of the AlvisNLP/ML data structure.
- Corpus: a Corpus object represents a collection of documents. In an AlvisNLP/ML run, the corpus is a unique object passed from module to module. A Corpus object has features and documents.
- Document: a Document object represents a single document. Each document has an identifier which is unique in the corpus. A Document object has features and sections.
- Section: a Section object contains a piece of the document's text contents. Each section has a name, contents, features, layers, and relations.
- Layer: a Layer object is an annotation container. A Layer object has a name unique in the section.
- Annotation: an Annotation object represents a span of text created by a module. Each annotation is included in at least one layer. An Annotation object has a start and an end, which represent the coordinates of the annotation in the section's contents, and features.
- Relation: a Relation object is a tuple container. A Relation object has a name unique in the section and features.
- Tuple: a Tuple object represents a relation between several elements in the data structure. A Tuple object has several arguments; each argument is an element (Corpus, Document, Section, Relation, but most often Annotation or Tuple) accessible through a role name. A Tuple object also has features.
Features are key-value pairs that hold information about an element, such as its type, a tag, or a property. Feature keys are not unique within an element: the same key may carry several values. When a feature is accessed by its key, the last value is returned.
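The multi-valued behaviour of features can be sketched as a small multimap. The `Features` class and its method names are hypothetical, and the sketch assumes that "the last value" means the most recently added one.

```python
from collections import defaultdict


class Features:
    """Hypothetical feature store in which a key may hold several values."""

    def __init__(self):
        self._values = defaultdict(list)  # key -> values in insertion order

    def add(self, key, value):
        # Adding never overwrites: earlier values for the key are kept.
        self._values[key].append(value)

    def get_last(self, key):
        # Assumption: "last" is the most recently added value; None if absent.
        values = self._values.get(key)
        return values[-1] if values else None

    def get_all(self, key):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._values.get(key, []))


features = Features()
features.add("pos", "NN")
features.add("pos", "VB")

last = features.get_last("pos")
all_values = features.get_all("pos")
print(last, all_values)
```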