Ukrainian language support in Flair #2985
Comments
This is the code for the NER corpus I've used: and the code for the POS corpus: I'll check whether I have a fixed split for NER hosted somewhere else.
Really cool idea! I had to do a lot of manual preprocessing to get NER working when evaluating the ELECTRA model: https://github.com/stefan-it/ukrainian-electra/blob/main/download_prepare_data_ner.sh
Oh, @stefan-it, thanks for reminding me. I totally forgot about the fixed split. On a separate topic: would you like to try training ELECTRA on better-quality Ukrainian texts?
Hey @dchaplinsky, I currently have access to TPUs, so if you have texts available I would love to pretrain another model 🤗
Yes, I do! Could you contact me at chaplinsky[dot]dmitry on gmail?
Hi @alanakbik and @stefan-it, I've just uploaded two bigger models for the Ukrainian language: They have hidden_size=2048 (in contrast to 1024 in the original ones) and were trained on my data plus data from Stefan (54 GB in total). I've also trained a downstream NER model on them and got a nice 1.5% improvement over the previous one; I will publish it shortly.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue tracks the progress of adding support for the Ukrainian language from lang-uk to Flair. We would like to add:
- embeddings = FlairEmbeddings('uk-forward') and embeddings = FlairEmbeddings('uk-backward')
- tagger = SequenceTagger.load('ner-ukrainian')
- tagger = SequenceTagger.load('pos-ukrainian')
- corpus = NER_UKRAINIAN(). Should be integrated only once version 2.0 is complete.
- corpus = UD_UKRAINIAN().
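For illustration, the embeddings and taggers requested in this issue could be used roughly as follows once they are available. This is only a sketch: it assumes the identifiers from the issue ('uk-forward', 'uk-backward', 'ner-ukrainian', 'pos-ukrainian') resolve in the installed Flair version, and the helper names `build_embeddings` and `tag` are hypothetical, not part of Flair.

```python
# Sketch only: the identifiers below are the ones proposed in this issue and
# may not resolve in every Flair release. Helper names are hypothetical.
PROPOSED_EMBEDDINGS = ("uk-forward", "uk-backward")
PROPOSED_TAGGERS = ("ner-ukrainian", "pos-ukrainian")


def build_embeddings():
    """Stack the forward and backward Ukrainian Flair language models."""
    # Imported lazily so this module can be loaded without Flair installed.
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings

    return StackedEmbeddings([FlairEmbeddings(name) for name in PROPOSED_EMBEDDINGS])


def tag(text, model_name="ner-ukrainian"):
    """Load a proposed tagger by name and return the labels predicted for text."""
    if model_name not in PROPOSED_TAGGERS:
        raise ValueError(f"unknown model name: {model_name!r}")

    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)  # fetches the model on first use
    sentence = Sentence(text)
    tagger.predict(sentence)
    return sentence.get_labels()


# Example call (requires Flair and network access to fetch the model):
#   for label in tag("Тарас Шевченко народився в Україні."):
#       print(label)
```

Passing `model_name="pos-ukrainian"` to the same helper would run the part-of-speech tagger instead; the corpora (`NER_UKRAINIAN()`, `UD_UKRAINIAN()`) are left out here because they trigger dataset downloads.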