Benchmarks for LaBSE #10
Comments
Hey @loretoparisi, I have been off on vacation for a while. Will look into it as soon as I have a bit of time. Thanks for the suggestion! 😄
Thanks a lot, that would be very interesting, especially because an official blog post came out recently with a comparison of cross-lingual models on the Tatoeba dataset (table columns: Model, 14 Langs, 36 Langs, 82 Langs, All Langs). The interesting part is the support for "unsupported" languages: "...For one third of these languages the LaBSE accuracy is higher than 75% and only 8 have accuracy lower than 25%, indicating very strong transfer performance to languages without training data" and, in my opinion, for minor/low-resource languages. If I can help, let me know. Thank you!
Hi @loretoparisi, sorry for the late answer. It would be great to see the results on LaBSE. Unfortunately I do not have much capacity lately. Did you take a close look at it yourself? It should be fairly simple to replicate the work proposed in this repo if LaBSE has been integrated into PyTorch. Happy to connect on that. 😃
@MastafaF thank you anyway, I will have a look.
It could be worth adding benchmarks for the new Language-agnostic BERT Sentence Embedding (LaBSE):
https://arxiv.org/pdf/2007.01852.pdf
The model is already available on TensorFlow Hub:
https://tfhub.dev/google/LaBSE/1
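
For anyone picking this up, here is a rough idea of how Tatoeba-style accuracy could be scored once sentence embeddings are available. This is only a minimal sketch: it assumes the embeddings were already computed (for example with LaBSE) and that item `i` in the source list is the translation of item `i` in the target list. The names `cosine` and `retrieval_accuracy` are hypothetical and are not part of this repo or of the LaBSE release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieval_accuracy(source_vecs, target_vecs):
    """Fraction of source sentences whose nearest target (by cosine) is their own translation.

    Assumption: source_vecs[i] and target_vecs[i] embed a translation pair.
    """
    if len(source_vecs) != len(target_vecs):
        raise ValueError("source and target must contain the same number of vectors")
    if not source_vecs:
        raise ValueError("need at least one sentence pair")

    hits = 0
    for i, src in enumerate(source_vecs):
        # Index of the target vector most similar to this source vector.
        best = max(range(len(target_vecs)), key=lambda j: cosine(src, target_vecs[j]))
        if best == i:
            hits += 1
    return hits / len(source_vecs)
```

The brute-force loop is quadratic in the number of sentence pairs, which should be acceptable for small per-language test sets; a larger benchmark would probably want a vectorised or approximate nearest-neighbour search instead.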