Conjunct wrongly attached to prepositions #41

Open
martinpopel opened this issue Dec 28, 2021 · 3 comments

Comments

@martinpopel
Member

There are 94 sentences in GUM found by the following Udapi query:
cat *.conllu | udapy -TM util.Mark node='node.deprel=="conj" and node.parent.deprel in ("mark","case") and node.is_nonprojective()' | less -R

The conjunct depends non-projectively on a preposition (or subordinating conjunction), such as in the following example:

# sent_id = GUM_voyage_athens-2
...
   │ ╭──────────────────────┮ of ADP case
   │ ┢─╼ Classical ADJ amod │
   ┡─┶ Greece PROPN nmod    │
   │                        │ ╭─╼ , PUNCT punct
   │                        │ ┢─╼ and CCONJ cc
   │                        │ ┢─╼ therefore ADV advmod
   │                        │ ┢─╼ of ADP case
   │                        │ ┢─╼ Western ADJ amod
   │                        ╰─┶ civilization NOUN conj

It seems that in all these cases the same preposition is repeated ("of X and of Y").
The restriction to non-projective constructions is needed to filter out phrases such as "before and after", which are parsed correctly.

These cases can be fixed automatically (after confirming my expectation that all are errors of this type) using:
udapy -s util.Eval node='if node.deprel=="conj" and node.parent.deprel in ("mark","case") and node.is_nonprojective(): node.parent = node.parent.parent' < old.conllu > fixed.conllu
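
For readers without Udapi at hand, the same reattachment can be sketched in plain Python. This is only an illustration under stated assumptions: the token table, `build_sentence`, `is_dominated_by` and `fix_conj_under_adposition` are hypothetical names, not Udapi API, and the non-projectivity test is my reading of what `node.is_nonprojective()` checks (some token between the node and its parent is not dominated by the parent). The toy sentence is the fragment from the example above, with "Greece" attached to an artificial root because the rest of the sentence is elided.

```python
# Toy stand-in for a CoNLL-U sentence (NOT the Udapi API): a dict mapping
# 1-based token id -> token dict; head 0 is the artificial root.

def build_sentence(rows):
    """rows: iterable of (id, form, upos, head, deprel) tuples."""
    return {tid: {"form": form, "upos": upos, "head": head, "deprel": deprel}
            for tid, form, upos, head, deprel in rows}

def is_dominated_by(tokens, tid, ancestor_id):
    """True if ancestor_id lies on the chain of heads above token tid."""
    current = tokens[tid]["head"]
    while current != 0:
        if current == ancestor_id:
            return True
        current = tokens[current]["head"]
    return False

def is_nonprojective(tokens, tid):
    """Assumed definition: some token strictly between tid and its head
    (in word order) is not dominated by that head."""
    head = tokens[tid]["head"]
    if head == 0:
        return False
    lo, hi = sorted((tid, head))
    return any(not is_dominated_by(tokens, mid, head)
               for mid in range(lo + 1, hi))

def fix_conj_under_adposition(tokens):
    """Reattach each non-projective conj whose parent is mark/case to its
    grandparent, mirroring `node.parent = node.parent.parent`."""
    fixed = []
    for tid in sorted(tokens):
        tok = tokens[tid]
        if tok["deprel"] != "conj" or tok["head"] == 0:
            continue
        parent = tokens[tok["head"]]
        if parent["deprel"] in ("mark", "case") and is_nonprojective(tokens, tid):
            tok["head"] = parent["head"]
            fixed.append(tid)
    return fixed

# Fragment of GUM_voyage_athens-2; the head of "Greece" is a placeholder (0).
tokens = build_sentence([
    (1, "of", "ADP", 3, "case"),
    (2, "Classical", "ADJ", 3, "amod"),
    (3, "Greece", "PROPN", 0, "nmod"),
    (4, ",", "PUNCT", 9, "punct"),
    (5, "and", "CCONJ", 9, "cc"),
    (6, "therefore", "ADV", 9, "advmod"),
    (7, "of", "ADP", 9, "case"),
    (8, "Western", "ADJ", 9, "amod"),
    (9, "civilization", "NOUN", 1, "conj"),
])
fixed = fix_conj_under_adposition(tokens)
print(fixed, tokens[9]["head"])
```

After the call, "civilization" hangs on "Greece" instead of the first "of". A projective case such as "before and after" is left untouched, because the only token between the two adpositions is dominated by the first one.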

@nschneid

Not exactly the same query but I think it is similar: coordination between an ADP and a non-ADP

@martinpopel
Member Author

Yes, the Grew-match query pattern { X-[conj]->Y; X[upos=ADP]; Y[upos<>ADP] } is similar. It misses one case, "due/ADJ/case to/ADP/fixed", and it includes several projective cases like "at, or close/ADV/conj to", which are parsed correctly.

That said, now I see my query should also include the node.upos!="ADP" condition, so the result does not include "with medicine or without", which is parsed correctly, despite being non-projective.
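
To make the effect of the extra `node.upos!="ADP"` condition concrete, here is a small plain-Python sketch that compares the two predicates on "with medicine or without". Everything in it is an assumption for illustration: the tuple layout, the function names `matches_original` and `matches_refined`, the reconstructed toy tree (with "medicine" as the root of the fragment), and the non-projectivity test, which approximates `node.is_nonprojective()` as "some token between the node and its parent is not dominated by the parent".

```python
# Each token is (form, upos, head, deprel); ids are 1-based, head 0 is the root.
# This is a toy representation, not the Udapi API.
FORM, UPOS, HEAD, DEPREL = range(4)

def is_nonprojective(sent, tid):
    """Assumed definition: a token strictly between tid and its head
    is not dominated by that head."""
    head = sent[tid][HEAD]
    if head == 0:
        return False
    lo, hi = sorted((tid, head))
    for mid in range(lo + 1, hi):
        cur = sent[mid][HEAD]
        # Climb towards the root until we hit either the root or `head`.
        while cur not in (0, head):
            cur = sent[cur][HEAD]
        if cur == 0:  # reached the root without passing through `head`
            return True
    return False

def matches_original(sent, tid):
    """Condition from the first query: non-projective conj under mark/case."""
    tok = sent[tid]
    if tok[DEPREL] != "conj" or tok[HEAD] == 0:
        return False
    parent = sent[tok[HEAD]]
    return parent[DEPREL] in ("mark", "case") and is_nonprojective(sent, tid)

def matches_refined(sent, tid):
    """Same condition, additionally requiring that the conjunct is not an ADP."""
    return matches_original(sent, tid) and sent[tid][UPOS] != "ADP"

# Assumed analysis of "with medicine or without" (correct as annotated).
sent = {
    1: ("with", "ADP", 2, "case"),
    2: ("medicine", "NOUN", 0, "root"),
    3: ("or", "CCONJ", 4, "cc"),
    4: ("without", "ADP", 1, "conj"),
}
print(matches_original(sent, 4), matches_refined(sent, 4))
```

The original condition flags "without" because "medicine" sits between the two adpositions and is not dominated by "with"; the refined condition skips it because the conjunct is itself an ADP.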

@amir-zeldes
Contributor

Thanks for catching - by pure coincidence I ran into the same issue while doing consistency checks for the upcoming GUM8 data, so it will be repaired soon! I'm pretty sure this is an artefact from an uncaught conversion error when GUM switched from SD to UD about 4 years ago, so it probably only occurs in the older part of the corpus.
