As part of our efforts to adapt Lingua for our production environment and requirements, we've been working on extending its language support. We believe these enhancements can also be beneficial for the wider Lingua community and would like to participate in mainstream development by contributing our changes.
Added Language Models
We have introduced models for the following languages:
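| Language | avg-low-ac | single-low-ac | pairs-low-ac | sent-low-ac | avg-high-ac | single-high-ac | pairs-high-ac | sent-high-ac |
|---|---|---|---|---|---|---|---|---|
| Amharic | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Burmese | 99 | 100 | 100 | 99 | 100 | 100 | 100 | 100 |
| Chechen | 83 | 77 | 85 | 86 | 86 | 86 | 88 | 86 |
| Kyrgyz | 54 | 37 | 37 | 89 | 58 | 45 | 41 | 89 |
| Malayalam | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Nepali | 35 | 13 | 26 | 66 | 41 | 21 | 29 | 72 |
| Pashto | 79 | 63 | 76 | 97 | 89 | 7 | 92 | 99 |
| Sanskrit | 40 | 19 | 34 | 67 | 56 | 37 | 49 | 82 |
| Sinhala | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Sindhi | 66 | 49 | 60 | 89 | 87 | 73 | 89 | 98 |
| Tatar | 43 | 21 | 29 | 80 | 47 | 26 | 34 | 80 |
| Tajik | 79 | 65 | 73 | 98 | 89 | 81 | 85 | 99 |
| Turkmen | 28 | 44 | 16 | 23 | 30 | 48 | 17 | 23 |
| Uzbek | 90 | 82 | 88 | 99 | 96 | 92 | 97 | 99 |
| Lao | 99 | 100 | 100 | 99 | 99 | 99 | 100 | 99 |
| Khmer | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |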
Norwegian Language Model Consideration
During development, we also found that we needed to consolidate the Norwegian language models. Lingua currently supports both Bokmål and Nynorsk. For our specific use case, however, a single Norwegian model proved more effective, so we replaced Bokmål with a more general Norwegian model in our branch.
This change raises a question for the Lingua project: would there be interest in adding a unified Norwegian model alongside the existing Bokmål and Nynorsk models, or would you prefer to keep only the two distinct written forms, as Lingua does today? We're open to reverting our Norwegian model to separate Bokmål and Nynorsk models to align with your preferences.
Thank you for your effort to enhance my library with more languages. This is great. :) Can you please open a pull request? That makes it easier to review your changes and additions and to comment on them.
As for Norwegian, I prefer to treat Bokmål and Nynorsk separately because they are two different variants of written Norwegian. I want my library to be able to differentiate between them.
Hello! Thanks for your reply. I've opened the PR and removed the general Norwegian model (there are now two separate variants again, as in your original crate).
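For use cases that still want a single Norwegian label, one option that leaves the two models untouched is to collapse Bokmål and Nynorsk after detection. The following is only a minimal sketch of that idea, not code from the branch or from Lingua: the `Label` enum, `merge_label`, and `merge_confidences` are hypothetical names, and the input is assumed to be a list of per-language confidence scores produced by some detector.

```rust
use std::collections::HashMap;

/// Hypothetical labels standing in for a detector's output.
/// This is NOT Lingua's own language enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Label {
    Bokmal,
    Nynorsk,
    /// Merged label, produced only by the post-processing below.
    Norwegian,
    Danish,
    Swedish,
}

/// Map both written variants to the single merged label;
/// every other label is returned unchanged.
pub fn merge_label(label: Label) -> Label {
    match label {
        Label::Bokmal | Label::Nynorsk => Label::Norwegian,
        other => other,
    }
}

/// Sum the scores of labels that collapse to the same merged label
/// and return the result sorted from highest to lowest score.
pub fn merge_confidences(scores: &[(Label, f64)]) -> Vec<(Label, f64)> {
    let mut merged: HashMap<Label, f64> = HashMap::new();
    for &(label, score) in scores {
        *merged.entry(merge_label(label)).or_insert(0.0) += score;
    }
    let mut out: Vec<(Label, f64)> = merged.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

Summing the two variants' scores is an assumption made for illustration. It differs from training one unified model, so its accuracy would need to be measured separately.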
Hello,
Thank you for your great Lingua crate!
Here's the link to our branch: https://github.com/kareglazie/lingua-rs/tree/new-langs