POS priority list
A priority list describes which part of speech (POS) dominates another. Even though a token can be interpreted as different parts of speech depending on context, some POS occur more frequently than others.
Of course, such lists can give wrong results in rare contexts, but that is the cost you pay.
If the recognizer you run supports priority lists, you can pass one in the following form:
[
{
"__what": {
"xpos": "A",
"upos": "B"
},
"__replace": {
"xpos": "C",
"upos": "D"
}
}
]
That means that every token in the text with XPOS `A` and UPOS `B` will be modified with the given property-value pairs.
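As a rough illustration of these semantics, the matching step could look like the sketch below. It assumes tokens are plain dictionaries with `xpos` and `upos` keys and that the first matching rule wins; the `apply_priority_list` name, the token shape, and the first-match policy are all hypothetical and not taken from the recognizer itself.

```python
def apply_priority_list(tokens, priority_list):
    """Return copies of the tokens with matching rules applied.

    Hypothetical helper: tokens are assumed to be dicts with "xpos"/"upos" keys.
    """
    result = []
    for token in tokens:
        token = dict(token)  # do not mutate the caller's data
        for rule in priority_list:
            what, replace = rule["__what"], rule["__replace"]
            # A rule matches when every property in "__what" equals the token's value.
            if all(token.get(key) == value for key, value in what.items()):
                token.update(replace)
                break  # assumption: the first matching rule wins
        result.append(token)
    return result


rules = [{"__what": {"xpos": "A", "upos": "B"},
          "__replace": {"xpos": "C", "upos": "D"}}]
tokens = [{"form": "x", "xpos": "A", "upos": "B"},   # matches the rule
          {"form": "y", "xpos": "A", "upos": "Z"}]   # UPOS differs, left as-is
print(apply_priority_list(tokens, rules))
```

Only the first token is rewritten here, because a rule applies only when all properties listed in `__what` match.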
The `MorhologyRecognizer` class described in `libs/morphology.py` supports this technique, but with some restrictions:
- Only the `xpos` and `upos` properties of a token can be changed.
- A token is modified only when both the "before modifying" and the "after modifying" xpos-upos pairs exist in the db response. This guarantees that only tokens that really can be either POS are modified.
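The two restrictions can be sketched as follows. This is not the code from `libs/morphology.py`: the `apply_with_restrictions` function is hypothetical, `analyses` is an assumed stand-in for the db response (a list of xpos-upos pairs known for the word form), and the `PART` UPOS used for `Q` in the usage example is likewise an assumption.

```python
ALLOWED_KEYS = {"xpos", "upos"}


def apply_with_restrictions(token, analyses, priority_list):
    """Apply the first applicable rule to a token and return a new dict.

    Hypothetical sketch: `analyses` stands in for the db response and is assumed
    to be a list of {"xpos": ..., "upos": ...} dicts for the token's word form.
    """
    pairs = {(a["xpos"], a["upos"]) for a in analyses}
    current = (token["xpos"], token["upos"])
    for rule in priority_list:
        what, replace = rule["__what"], rule["__replace"]
        # Restriction 1: only xpos and upos may be changed.
        if not set(replace) <= ALLOWED_KEYS:
            continue
        # Skip rules whose "__what" does not match the token.
        if any(token.get(key) != value for key, value in what.items()):
            continue
        target = (replace.get("xpos", token["xpos"]),
                  replace.get("upos", token["upos"]))
        # Restriction 2: both the current and the target pair must be known analyses.
        if current in pairs and target in pairs:
            return {**token, "xpos": target[0], "upos": target[1]}
    return dict(token)


rules = [{"__what": {"xpos": "Q"},
          "__replace": {"xpos": "Ccs", "upos": "CCONJ"}}]
token = {"form": "та", "xpos": "Q", "upos": "PART"}

# Both readings are known for this form, so the rule applies.
both = [{"xpos": "Q", "upos": "PART"}, {"xpos": "Ccs", "upos": "CCONJ"}]
print(apply_with_restrictions(token, both, rules))

# Only the particle reading is known, so the token is left unchanged.
only_particle = [{"xpos": "Q", "upos": "PART"}]
print(apply_with_restrictions(token, only_particle, rules))
```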
Here is a good priority list for Ukrainian. It will be extended in the future.
[
{
"__what": {
"xpos": "Q"
},
"__replace": {
"xpos": "Ccs",
"upos": "CCONJ"
}
},
{
"__what": {
"xpos": "Y"
},
"__replace": {
"xpos": "Spsl",
"upos": "ADP"
}
},
{
"__what": {
"xpos": "Q"
},
"__replace": {
"xpos": "Spsl",
"upos": "ADP"
}
},
{
"__what": {
"xpos": "Ncmpan"
},
"__replace": {
"xpos": "Vmen",
"upos": "VERB"
}
},
{
"__what": {
"xpos": "Ncmpan"
},
"__replace": {
"xpos": "Vmpn",
"upos": "VERB"
}
},
{
"__what": {
"xpos": "Nc-piy"
},
"__replace": {
"xpos": "Ncmpin",
"upos": "NOUN"
}
},
{
"__what": {
"xpos": "Spsi"
},
"__replace": {
"xpos": "Spsa",
"upos": "ADP"
}
},
{
"__what": {
"xpos": "Pd--m-sga"
},
"__replace": {
"xpos": "Pd--nnsgn",
"upos": "PRON"
}
},
{
"__what": {
"xpos": "Ncmpan"
},
"__replace": {
"xpos": "Vmpn",
"upos": "VERB"
}
},
{
"__what": {
"xpos": "Ncmsan"
},
"__replace": {
"xpos": "Ncmsgn",
"upos": "NOUN"
}
}
]
Here is an explanation of some of the tags:
Change this | To this | Examples (list) | Description |
---|---|---|---|
Q | CCONJ Ccs | та, і | These words appear to be particles in some contexts, but more often they are conjunctions. |
Y | ADP Spsl | у | This can be a letter of an abbreviation, but it is also an adposition. |
Q | ADP Spsl | на | |
You can declare a priority list using this technique:
- if
  - token
    - xpos is ...
    - upos is ...
- then
  - xpos becomes ...
  - upos becomes ...
No syntactic sugar for now.
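Until a dedicated syntax exists, a small helper can translate the if/then form into a list entry. The sketch below is only an illustration: `make_rule` and its parameter names are hypothetical and not part of the project.

```python
import json


def make_rule(if_xpos=None, if_upos=None, then_xpos=None, then_upos=None):
    """Build one priority-list entry from the if/then description.

    Hypothetical helper; properties left as None are omitted from the rule.
    """
    what = {key: value
            for key, value in (("xpos", if_xpos), ("upos", if_upos))
            if value is not None}
    replace = {key: value
               for key, value in (("xpos", then_xpos), ("upos", then_upos))
               if value is not None}
    if not what or not replace:
        raise ValueError("a rule needs at least one condition and one replacement")
    return {"__what": what, "__replace": replace}


# Rebuild the first two entries of the Ukrainian list above.
priority_list = [
    make_rule(if_xpos="Q", then_xpos="Ccs", then_upos="CCONJ"),
    make_rule(if_xpos="Y", then_xpos="Spsl", then_upos="ADP"),
]
print(json.dumps(priority_list, indent=2))
```

The printed JSON has the same `__what` / `__replace` shape as the entries shown earlier, so it can be passed to a recognizer that accepts priority lists.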