There are 94 sentences in GUM found by the following Udapi query:
cat *.conllu | udapy -TM util.Mark node='node.deprel=="conj" and node.parent.deprel in ("mark","case") and node.is_nonprojective()' | less -R
The conjunct depends non-projectively on a preposition (or subordinating conjunction), such as in the following example:
# sent_id = GUM_voyage_athens-2
...
│ ╭──────────────────────┮ of ADP case
│ ┢─╼ Classical ADJ amod │
┡─┶ Greece PROPN nmod │
│ │ ╭─╼ , PUNCT punct
│ │ ┢─╼ and CCONJ cc
│ │ ┢─╼ therefore ADV advmod
│ │ ┢─╼ of ADP case
│ │ ┢─╼ Western ADJ amod
│ ╰─┶ civilization NOUN conj
It seems that in all these cases the same preposition is repeated ("of X and of Y").
The restriction to non-projective constructions is needed to filter out phrases such as "before and after", which are parsed correctly.
These cases can be fixed automatically (after confirming my expectation that all of them are errors of this type) using
udapy -s util.Eval node='if node.deprel=="conj" and node.parent.deprel in ("mark","case") and node.is_nonprojective(): node.parent = node.parent.parent' < old.conllu > fixed.conllu
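For readers without Udapi at hand, the matching condition and the fix can be sketched in plain Python. This is a minimal illustration, not Udapi's implementation: it assumes a hypothetical Word record with 1-based head indices, uses the usual definition of a non-projective edge (some word between the parent and the dependent is not dominated by the parent), and encodes only the fragment of GUM_voyage_athens-2 shown above, with "Greece" treated as the fragment's root because the rest of the sentence is elided.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    upos: str
    head: int    # 1-based index of the parent word; 0 = artificial root
    deprel: str


def dominated_by(sent, ancestor, node):
    """True if word `ancestor` is a proper ancestor of word `node` (1-based)."""
    head = sent[node - 1].head
    while head != 0:
        if head == ancestor:
            return True
        head = sent[head - 1].head
    return False


def is_nonprojective(sent, node):
    """The edge parent->node is non-projective if some word strictly between
    them (in word order) is not dominated by the parent."""
    parent = sent[node - 1].head
    if parent == 0:
        return False  # the artificial root dominates every word
    lo, hi = sorted((node, parent))
    return any(not dominated_by(sent, parent, i) for i in range(lo + 1, hi))


def suspicious(sent, node):
    """Same condition as the util.Mark query: a conj whose parent is a
    case/mark dependent, attached non-projectively."""
    w = sent[node - 1]
    return (w.deprel == "conj" and w.head != 0
            and sent[w.head - 1].deprel in ("mark", "case")
            and is_nonprojective(sent, node))


def fix(sent):
    """Reattach each matching conj to its grandparent
    (node.parent = node.parent.parent). Matches are collected first."""
    hits = [i for i in range(1, len(sent) + 1) if suspicious(sent, i)]
    for i in hits:
        sent[i - 1].head = sent[sent[i - 1].head - 1].head
    return hits


# Hypothetical encoding of the fragment shown above; "Greece" keeps its
# nmod label but is made the root because its real head is elided.
athens = [
    Word("of", "ADP", 3, "case"),
    Word("Classical", "ADJ", 3, "amod"),
    Word("Greece", "PROPN", 0, "nmod"),
    Word(",", "PUNCT", 9, "punct"),
    Word("and", "CCONJ", 9, "cc"),
    Word("therefore", "ADV", 9, "advmod"),
    Word("of", "ADP", 9, "case"),
    Word("Western", "ADJ", 9, "amod"),
    Word("civilization", "NOUN", 1, "conj"),
]

print(fix(athens))       # [9]
print(athens[8].head)    # 3, i.e. "civilization" now depends on "Greece"
```

Collecting all matches before reattaching keeps the result independent of iteration order. After the fix, "civilization" is a conjunct of "Greece" and the edge is projective, so a second run finds nothing.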
Yes, the Grew-match query with the pattern { X-[conj]->Y; X[upos=ADP]; Y[upos<>ADP] } is similar. It misses one case of "due/ADJ/case to/ADP/fixed", and it includes several projective cases such as "at, or close/ADV/conj to", which are parsed correctly.
That said, I now see that my query should also include the condition node.upos!="ADP", so that the result does not include "with medicine or without", which is parsed correctly despite being non-projective.
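The effect of the extra node.upos!="ADP" condition can be checked on toy versions of both constructions. The sketch below is plain Python with hypothetical helper names (udapi_original, udapi_refined, matches); the annotation of "with medicine or without", with "without" attached as conj of "with", is an assumption, since the comment only says the phrase is parsed correctly and is non-projective.

```python
# Each word is (form, upos, head, deprel); head is a 1-based index, 0 = root.
# Hypothetical annotation: "without" as conj of "with" (an assumption).
WITH_MEDICINE = [
    ("with", "ADP", 2, "case"),
    ("medicine", "NOUN", 0, "root"),
    ("or", "CCONJ", 4, "cc"),
    ("without", "ADP", 1, "conj"),
]

# Fragment of GUM_voyage_athens-2; "Greece" stands in as root because the
# rest of the sentence is elided.
ATHENS = [
    ("of", "ADP", 3, "case"),
    ("Classical", "ADJ", 3, "amod"),
    ("Greece", "PROPN", 0, "nmod"),
    (",", "PUNCT", 9, "punct"),
    ("and", "CCONJ", 9, "cc"),
    ("therefore", "ADV", 9, "advmod"),
    ("of", "ADP", 9, "case"),
    ("Western", "ADJ", 9, "amod"),
    ("civilization", "NOUN", 1, "conj"),
]


def dominated_by(sent, ancestor, node):
    """True if word `ancestor` is a proper ancestor of word `node`."""
    head = sent[node - 1][2]
    while head != 0:
        if head == ancestor:
            return True
        head = sent[head - 1][2]
    return False


def is_nonprojective(sent, node):
    """Some word between node and its parent is not dominated by the parent."""
    parent = sent[node - 1][2]
    if parent == 0:
        return False
    lo, hi = sorted((node, parent))
    return any(not dominated_by(sent, parent, i) for i in range(lo + 1, hi))


def udapi_original(sent, node):
    """conj whose parent is case/mark, attached non-projectively."""
    _, _, head, deprel = sent[node - 1]
    return (deprel == "conj" and head != 0
            and sent[head - 1][3] in ("mark", "case")
            and is_nonprojective(sent, node))


def udapi_refined(sent, node):
    """Same, but skip conjuncts that are themselves prepositions (ADP)."""
    return udapi_original(sent, node) and sent[node - 1][1] != "ADP"


def matches(sent, pred):
    """Forms of all words for which the predicate holds."""
    return [sent[i - 1][0] for i in range(1, len(sent) + 1) if pred(sent, i)]


for name, sent in [("with medicine or without", WITH_MEDICINE), ("athens", ATHENS)]:
    print(name, matches(sent, udapi_original), matches(sent, udapi_refined))
```

Under these assumptions, the original condition flags both "without" and "civilization", while the refined one flags only "civilization".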
Thanks for catching this! By pure coincidence I ran into the same issue while doing consistency checks for the upcoming GUM8 data, so it will be repaired soon. I'm pretty sure this is an artefact of an uncaught conversion error from when GUM switched from SD to UD about four years ago, so it probably only occurs in the older part of the corpus.