Switch gender annotation over to reference separate lexemes in a loop #401
Labels
blocked: Another issue is blocking
feature: New feature or request
help wanted: Extra attention is needed
Terms
Description
Scribe will be switching its data process to be based more directly on one lexeme per data entry. At this time we combine lexemes based on their individual strings, so in German the word Schild means both "sign" and "shield", but is one entry for us. In order to simplify the data formatting process, we'll need to remove this, which further means that the way we store genders will be different.
The current way is that if a string has multiple genders, then we store each of them separated by slashes, so F/M/N/C/PL and all the variants. We'll soon have a situation where we have one entry for every lexeme and its plural. What this means is that rather than checking to see if the string has a slash in it and then separating it, we'll need to get the gender, check to see if the string/lexeme occurs more than once, and then append those genders.
Contribution
Happy to discuss the work for this and help with implementation or work on it myself at some point!
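As a rough starting point for discussion, the loop described above could look something like the sketch below. It assumes each data entry is a single lexeme carrying one gender; the entry layout, the placeholder lexeme IDs, and the names `collect_genders` and `annotation_for` are all hypothetical and not Scribe's actual schema or code.

```python
from collections import defaultdict

# Hypothetical per-lexeme entries: (lexeme ID, noun string, gender).
# The IDs and the tuple layout are placeholders, not Scribe's real data format.
lexeme_entries = [
    ("L-a", "Schild", "N"),  # das Schild: "sign"
    ("L-b", "Schild", "M"),  # der Schild: "shield"
    ("L-c", "Haus", "N"),
]


def collect_genders(entries):
    """Loop over the lexemes and gather every gender seen for each string."""
    genders_by_string = defaultdict(list)
    for _lexeme_id, noun, gender in entries:
        # Append the gender only if this string has not already recorded it.
        if gender not in genders_by_string[noun]:
            genders_by_string[noun].append(gender)
    return dict(genders_by_string)


def annotation_for(noun, genders_by_string):
    """Join the genders of a string with slashes, like the current format."""
    return "/".join(sorted(genders_by_string.get(noun, [])))


genders = collect_genders(lexeme_entries)
print(annotation_for("Schild", genders))  # M/N
print(annotation_for("Haus", genders))    # N
```

Building the slash-separated annotation at lookup time, rather than storing it, would keep the stored data at one gender per lexeme while still giving the same output for strings shared by several lexemes.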