UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies

LeonieWeissweiler/UCxn


This repository hosts the ongoing UCxn project, which adds construction information to Universal Dependencies (UD), along with the code and dataset for "UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies" (Weissweiler et al., LREC-COLING 2024). The paper presents an approach to annotating constructions as a layer on top of UD ("UCxn"), with case studies exploring 5 typologically defined construction families in 10 languages, and Grew rules that automatically infer the construction annotations.

The paper can be found in the docs folder, alongside a full technical specification of the annotation scheme and the construction subtypes in Version 1.

Grew queries described by the paper are found in the grew_rules folder, and the automatically annotated corpora in annotated_corpora. In addition, annotations can be explored in Grew-match, e.g. the UCxn-enhanced English-GUM corpus.

The UD 2.13 release served as the input for producing the rule-based annotations in annotated_corpora. (English-GUM contained some pilot construction annotations in the 2.13 release; those were removed before applying the Grew rules.) Individual treebanks are encouraged to incorporate UCxn annotations, ideally with manual checking, into official UD treebank repositories for future releases.

Example Annotation

ID FORM LEMMA UPOS ... UCxn
1 what what PRON ... CxnElt=2:Interrogative-WHInfo-Direct.WHWord
2 happened happen VERB ... Cxn=Interrogative-WHInfo-Direct|CxnElt=2:Interrogative-WHInfo-Direct.Clause
3 to to ADP ... _
4 you you PRON ... _
5 ? ? PUNCT ... _
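
The UCxn values in the example above can be unpacked programmatically. The following is a minimal Python sketch, not part of the repository: `parse_ucxn` and `CxnElement` are hypothetical names, and the code assumes only what the example shows, namely `|`-separated `Key=Value` pairs, `_` for an empty field, and `CxnElt` values of the form `ID:Construction.Role`. It also assumes that the numeric prefix is the ID of the token carrying the `Cxn=` label, which is how the example reads. Cases with several values under one key are not handled, since the example does not show how they would be written; the technical specification in the docs folder is the authority on the format.

```python
from typing import NamedTuple


class CxnElement(NamedTuple):
    """One CxnElt entry (hypothetical helper type, not from the repository)."""
    anchor_id: int      # assumed: ID of the token that carries the Cxn= label
    construction: str   # e.g. "Interrogative-WHInfo-Direct"
    role: str           # e.g. "WHWord" or "Clause"


def parse_ucxn(field: str) -> tuple[list[str], list[CxnElement]]:
    """Split one UCxn field into construction names and element records."""
    constructions: list[str] = []
    elements: list[CxnElement] = []
    if field == "_":  # CoNLL-U convention for an empty field
        return constructions, elements
    for item in field.split("|"):
        key, _, value = item.partition("=")
        if key == "Cxn":
            constructions.append(value)
        elif key == "CxnElt":
            # "2:Interrogative-WHInfo-Direct.WHWord" -> 2, name, role
            anchor, _, label = value.partition(":")
            name, _, role = label.rpartition(".")
            elements.append(CxnElement(int(anchor), name, role))
    return constructions, elements


if __name__ == "__main__":
    # UCxn value of token 2 ("happened") from the example above
    field = ("Cxn=Interrogative-WHInfo-Direct"
             "|CxnElt=2:Interrogative-WHInfo-Direct.Clause")
    print(parse_ucxn(field))
```

Splitting the element label on the last `.` keeps hyphenated construction names such as `Interrogative-WHInfo-Direct` intact, whatever subtype levels they contain.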

Annotated Data Statistics

[Figure: statistics of the annotated data]

Languages

  • English
  • German
  • Swedish
  • French
  • Spanish
  • Portuguese
  • Hindi
  • Mandarin
  • Hebrew
  • Coptic

Constructions

  • Interrogative
  • Existential
  • Conditional
  • Resultative
  • NPN

Citation

If you use the dataset or code, please cite our paper:

@inproceedings{weissweiler-etal-2024-ucxn,
    title = "{UC}xn: Typologically Informed Annotation of Constructions Atop {U}niversal {D}ependencies",
    author = {Weissweiler, Leonie  and B{\"o}bel, Nina  and Guiller, Kirian  and Herrera, Santiago  and Scivetti, Wesley  and Lorenzi, Arthur  and Melnik, Nurit  and Bhatia, Archna  and Sch{\"u}tze, Hinrich  and Levin, Lori  and Zeldes, Amir  and Nivre, Joakim  and Croft, William  and Schneider, Nathan},
    editor = "Calzolari, Nicoletta  and Kan, Min-Yen  and Hoste, Veronique  and Lenci, Alessandro  and Sakti, Sakriani  and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1471",
    pages = "16919--16932",
}
