Empty features ud and catib #2

Open
mirkovogel opened this issue Sep 9, 2024 · 2 comments

@mirkovogel

The following observation concerns the LREC-Coling 2024 release (camel_morph/official_releases/lrec-coling2024_release/databases/camel-morph-msa):

The features catib6 and ud are always empty, e.g. in the following analysis of "فبسبب":

{
  'bw': 'فَ/CONJ+بِ/PREP+سَبَب/NOUN+ِ/CASE_DEF_GEN',
  'ud': '',
  'catib6': ''
}

The expected values are:

{
  'ud': 'CCONJ+ADP+NOUN',
  'catib6': 'PRT+PRT+NOM'
}
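
To check how widespread the empty values are, a small helper can count them over a list of analyses. This is a minimal sketch: `empty_features` is a hypothetical helper name, and the only input assumed is a list of analysis dicts shaped like the one quoted above.

```python
def empty_features(analyses, features=("ud", "catib6")):
    """Count, per feature, how many analyses have it missing or empty."""
    counts = {feat: 0 for feat in features}
    for analysis in analyses:
        for feat in features:
            # A missing key and an empty string are both treated as "empty".
            if not analysis.get(feat):
                counts[feat] += 1
    return counts


# The analysis reported in this issue, copied from the excerpt above.
reported = [{
    'bw': 'فَ/CONJ+بِ/PREP+سَبَب/NOUN+ِ/CASE_DEF_GEN',
    'ud': '',
    'catib6': '',
}]

print(empty_features(reported))
```

With camel_tools, the list of analyses would typically come from `Analyzer(MorphologyDB(path)).analyze(word)`, where `path` points at the released database file.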
@mirkovogel
Author

Comment from @christios by mail:

As you've rightly pointed out, ud and catib are missing as we did not include those in the release (it was not our focus). But you are right they should be included in the next release. It should not be very difficult, probably just a mapping between the CAPHI POS (or Catib) and UD.
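
A minimal sketch of what such a mapping could look like, not the maintainers' implementation. It starts from the `bw` feature rather than the POS feature, because that is the only populated tag string in the example above. The two tables are deliberately partial and contain only the three correspondences visible in this issue; the function names and the rule that `CASE_` tags do not produce a token of their own are assumptions made for illustration.

```python
# Partial, illustrative tables: only the correspondences visible in this issue.
BW_TO_UD = {"CONJ": "CCONJ", "PREP": "ADP", "NOUN": "NOUN"}
BW_TO_CATIB6 = {"CONJ": "PRT", "PREP": "PRT", "NOUN": "NOM"}

# Assumption: tags with these prefixes mark inflectional suffixes and
# therefore do not contribute a tag to the ud / catib6 strings.
SKIPPED_PREFIXES = ("CASE_",)


def bw_tags(bw):
    """Extract the tag of each '+'-separated 'form/TAG' segment."""
    tags = []
    for segment in bw.split("+"):
        _, _, tag = segment.rpartition("/")
        if tag and not tag.startswith(SKIPPED_PREFIXES):
            tags.append(tag)
    return tags


def map_bw(bw, table):
    """Map a bw string to another tag set; raises KeyError on unknown tags."""
    return "+".join(table[tag] for tag in bw_tags(bw))


bw = 'فَ/CONJ+بِ/PREP+سَبَب/NOUN+ِ/CASE_DEF_GEN'
print(map_bw(bw, BW_TO_UD))      # CCONJ+ADP+NOUN
print(map_bw(bw, BW_TO_CATIB6))  # PRT+PRT+NOM
```

Raising `KeyError` on an unmapped tag is a deliberate choice here: with a partial table, a silent fallback would hide gaps that a complete mapping needs to cover.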

@mirkovogel
Author

I am currently transitioning my pipeline from the r13 morphological db to Camel Morph MSA and need both catib6 and ud tags downstream, so I'd volunteer to help with this if I can.

Maybe there is already code that converts the "native" POS tags of the database (https://camel-tools.readthedocs.io/en/latest/reference/camel_morphology_features.html?) to other tag sets, which I could use in the meantime?
