Add other languages #7

Open
kbenoit opened this issue Aug 25, 2020 · 1 comment

Comments

kbenoit commented Aug 25, 2020

Right now, nsyllable() works only with English, using the CMU syllable dictionary. There are other sources for syllables, however, and we could consider splitting this into a separate package (like we did with stopwords) to create a lookup function for words in other languages.

There are more in https://en.wiktionary.org/wiki/Category:All_languages
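
A minimal sketch of what such a per-language lookup could look like, written in Python purely for illustration rather than as the package's actual design. The `SYLLABLES` table, its toy entries, and the `lookup_syllables` name are all hypothetical.

```python
from typing import Optional

# Hypothetical per-language tables mapping word -> syllable count.
# The entries are toy data, not taken from any real dictionary.
SYLLABLES = {
    "en": {"language": 2, "syllable": 3},
    "de": {"sprache": 2, "silbe": 2},
}


def lookup_syllables(word: str, language: str = "en") -> Optional[int]:
    """Return the dictionary syllable count, or None if the word is unknown."""
    table = SYLLABLES.get(language)
    if table is None:
        raise ValueError(f"no syllable dictionary for language {language!r}")
    # Lookups are case-insensitive because the tables store lowercase keys.
    return table.get(word.lower())
```

Returning `None` for unknown words keeps the dictionary lookup separate from any guessing step, so a caller can tell which counts came from the dictionary.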

@kbenoit kbenoit transferred this issue from quanteda/quanteda Nov 11, 2020
@kbenoit kbenoit changed the title Add expanded syllable support Add other languages Nov 11, 2020

kbenoit commented Nov 12, 2020

The Wikipedia word lists appear to be fairly incomplete, apparently covering only words that have Wikipedia entries. We could harvest them into a dictionary for each language, but many words would still have to be guessed. That would be better than guessing 100% of words, but more comprehensive word lists would be better still.
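
The trade-off above can be made concrete with a rough sketch, in Python for illustration only: look each word up in a harvested dictionary, fall back to a guess when it is missing, and report what share of words had to be guessed. The vowel-run heuristic in `guess_syllables` is an assumed stand-in for whatever guessing rule is actually used, and both function names are hypothetical.

```python
import re


def guess_syllables(word):
    """Crude fallback: count runs of vowel letters (an assumed heuristic)."""
    return max(1, len(re.findall(r"[aeiouyäöü]+", word.lower())))


def count_syllables(words, dictionary):
    """Return (syllable counts, share of words that had to be guessed)."""
    counts, guessed = [], 0
    for w in words:
        n = dictionary.get(w.lower())
        if n is None:
            # Word is not in the harvested dictionary, so fall back to guessing.
            n = guess_syllables(w)
            guessed += 1
        counts.append(n)
    share = guessed / len(words) if words else 0.0
    return counts, share
```

Tracking the guessed share per language would give a simple way to compare how complete different word lists are.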
