For texts written in contemporary Swedish, Sparv can generate the following types of annotations:

- Part-of-speech tagging:
pos
: part of speech
msd
: morphosyntactic tag
Tool: Hunpos
Model: in-house model trained on SUC 3.0
Tag set: MSD tags

- SALDO-based analysis:
baseform
: citation form
lemgram
: lemgram, identifies the inflectional table (using SALDO tags)
sense
: identifies a sense in SALDO and its probability
saldo
: identifies a sense in SALDO (will be removed soon)
sentiment
: sentiment score using SenSALDO

- Compound analysis (also based on SALDO):
complemgram
: compound lemgram
compwf
: compound word form
prefix
: initial part of a compound (will be removed soon)
suffix
: final part of a compound (will be removed soon)

- Dependency analysis:
ref
: the position of the word in the sentence
dephead
: dependency head, the ref of the word that the current word modifies or depends on
deprel
: dependency relation, the relation of the current word to its dependency head
Tool: MaltParser
Model: swemalt, trained on the Swedish Treebank
Tag set: Mamba-Dep

- Named entity recognition:
ne.ex
: named entity (name expression, numerical expression or time expression)
ne.type
: named entity type
ne.subtype
: named entity subtype
Tool: hfst-SweNER
References: "HFST-SweNER – A New NER Resource for Swedish"; "Reducing the effect of name explosion"

- Readability metrics:
text.lix
: the Swedish readability metric LIX, läsbarhetsindex (readability index)
text.ovix
: the Swedish readability metric OVIX, ordvariationsindex (word variation index)
text.nk
: the Swedish readability metric nominalkvot (nominal ratio)

- Lexical classes:
blingbring
: lexical class from the Blingbring resource (on word level)
swefn
: frames from Swedish FrameNet (on word level)
text.blingbring
: lexical class from the Blingbring resource (on document level)
text.swefn
: frames from Swedish FrameNet (on document level)
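
Several of the SALDO-based attributes listed above (baseform, lemgram, sense) can hold more than one value per token. The Python sketch below shows one way to unpack such values. It assumes a pipe-delimited set notation such as |hund..nn.1|, with each sense item carrying its probability after a colon (e.g. |hund..1:0.75|hund..2:0.25|, made-up values); check this against your own Sparv output before relying on it. The lemgram parser assumes the SALDO lemgram form baseform..pos.number. All function names are hypothetical, not part of Sparv.

```python
from typing import List, Optional, Tuple


def parse_set(value: str) -> List[str]:
    """Split an (assumed) pipe-delimited set such as '|a|b|' into ['a', 'b']."""
    return [item for item in value.split("|") if item]


def parse_senses(value: str) -> List[Tuple[str, Optional[float]]]:
    """Return (sense, probability) pairs, most probable first.

    Assumes each item has the form 'sense:probability'; items without a
    colon are kept with probability None and sorted last.
    """
    senses: List[Tuple[str, Optional[float]]] = []
    for item in parse_set(value):
        sense, sep, prob = item.rpartition(":")
        if sep:
            senses.append((sense, float(prob)))
        else:
            senses.append((item, None))
    return sorted(
        senses,
        key=lambda pair: pair[1] if pair[1] is not None else float("-inf"),
        reverse=True,
    )


def parse_lemgram(lemgram: str) -> Tuple[str, str, int]:
    """Split a lemgram like 'hund..nn.1' into (baseform, part of speech, number)."""
    baseform, _, rest = lemgram.rpartition("..")
    pos, _, number = rest.partition(".")
    return baseform, pos, int(number)


print(parse_set("|hund..nn.1|"))
print(parse_senses("|hund..2:0.25|hund..1:0.75|"))
print(parse_lemgram("hund..nn.1"))
```

Sorting by probability makes the first pair the most likely sense, which is usually what a downstream consumer wants.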
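
The dependency attributes listed above (ref, dephead, deprel) link each token to its head through the head's ref value. A minimal Python sketch of resolving those links follows. It assumes one sentence is available as a list of tokens with a 1-based ref and a dephead of None for the sentence root; the Token class, the example sentence, and its Mamba-Dep-style labels are illustrative only, not Sparv's API or real MaltParser output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    """Hypothetical token record mirroring the ref/dephead/deprel attributes."""
    ref: int                # 1-based position of the word in its sentence
    word: str
    dephead: Optional[int]  # ref of the head word; None for the root (assumption)
    deprel: str             # relation of this word to its head


def head_pairs(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (word, deprel, head word) for each token; the root's head is 'ROOT'."""
    by_ref: Dict[int, Token] = {tok.ref: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        head = by_ref[tok.dephead].word if tok.dephead is not None else "ROOT"
        pairs.append((tok.word, tok.deprel, head))
    return pairs


def dependents(sentence: List[Token], ref: int) -> List[str]:
    """Words whose dephead points at the token with the given ref."""
    return [tok.word for tok in sentence if tok.dephead == ref]


# Illustrative sentence; the labels are examples, not actual parser output.
sentence = [
    Token(1, "Hunden", 2, "SS"),
    Token(2, "springer", None, "ROOT"),
    Token(3, "snabbt", 2, "AA"),
    Token(4, ".", 2, "IP"),
]

for word, rel, head in head_pairs(sentence):
    print(f"{word} --{rel}--> {head}")
print(dependents(sentence, 2))
```

Indexing tokens by ref once keeps each head lookup constant-time, rather than rescanning the sentence for every token.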
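
The three readability metrics listed above have commonly cited definitions, sketched here in Python. The sketch assumes that LIX counts words longer than six characters as long; that OVIX, with n word tokens and v word types, is log(n) / log(2 - log(v) / log(n)); and that nominalkvot is computed from SUC part-of-speech tags as (NN + PP + PC) / (PN + AB + VB), that is, nouns, prepositions and participles over pronouns, adverbs and verbs. Sparv's own implementation may differ in details such as which tokens count as words and whether case is folded.

```python
import math
from collections import Counter
from typing import List


def _is_word(token: str) -> bool:
    # Assumption: a token counts as a word if it contains at least one letter.
    return any(ch.isalpha() for ch in token)


def lix(sentences: List[List[str]]) -> float:
    """LIX = words per sentence + percentage of long words (more than 6 characters)."""
    words = [tok for sent in sentences for tok in sent if _is_word(tok)]
    if not sentences or not words:
        raise ValueError("LIX needs at least one sentence containing words")
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


def ovix(tokens: List[str]) -> float:
    """OVIX = log(n) / log(2 - log(v) / log(n)), n = word tokens, v = word types."""
    words = [tok.lower() for tok in tokens if _is_word(tok)]
    n, v = len(words), len(set(words))
    if n < 2 or v == n:
        # The formula divides by zero when n <= 1 or when every word is unique.
        raise ValueError("OVIX is undefined for this input")
    return math.log(n) / math.log(2 - math.log(v) / math.log(n))


def nominal_ratio(pos_tags: List[str]) -> float:
    """Nominalkvot = (NN + PP + PC) / (PN + AB + VB), counted over SUC POS tags."""
    counts = Counter(pos_tags)
    nominal = counts["NN"] + counts["PP"] + counts["PC"]
    verbal = counts["PN"] + counts["AB"] + counts["VB"]
    if verbal == 0:
        raise ValueError("nominalkvot is undefined without pronouns, adverbs or verbs")
    return nominal / verbal


sentences = [["Hunden", "springer", "snabbt", "."], ["Katten", "sover", "."]]
# 5 words / 2 sentences + 100 * 1 long word ("springer") / 5 words = 22.5
print(lix(sentences))
```

Note that OVIX is undefined for very short or fully repetition-free texts, so a real pipeline needs a policy (skip, or report no value) for those cases.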

Older Swedish texts and texts written in other languages can often be annotated with a subset of the above annotation types.

For non-Swedish languages, the msd annotation is based on different tag sets, depending on the language being annotated and the annotation tool used.
This attribute contains the part of speech and, in many cases, additional morphosyntactic information.
The pos annotation contains only part-of-speech information and uses the Universal POS tag set.
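
For contemporary Swedish, the msd values follow the SUC-based MSD tag set, in which the part of speech is the first dot-separated field and the remaining fields are morphosyntactic features (for example NN.UTR.SIN.IND.NOM for a singular, indefinite, common-gender noun in the nominative). A minimal Python sketch of splitting such a tag is shown below; split_suc_msd is a hypothetical helper, and it only applies to SUC-style tags, since the msd tag sets for other languages differ.

```python
from typing import List, Tuple


def split_suc_msd(msd: str) -> Tuple[str, List[str]]:
    """Split a SUC-style MSD tag into its POS and its feature values."""
    pos, *features = msd.split(".")
    return pos, features


print(split_suc_msd("NN.UTR.SIN.IND.NOM"))  # ('NN', ['UTR', 'SIN', 'IND', 'NOM'])
print(split_suc_msd("VB.PRS.AKT"))          # ('VB', ['PRS', 'AKT'])
print(split_suc_msd("AB"))                  # ('AB', [])
```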