Some dictionaries are not compatible #34
Comments
I went deeper with debugging on the above. This extension uses the hunspell-spellchecker module (https://github.com/GitbookIO/hunspell-spellchecker). This module has a few inherent problems:
1. It loads the whole dictionary into memory (into an associative table, a.k.a. a dictionary; an object, to be precise).
2. It uses the affixes (.aff file) to create ALL variants of the words found in the dictionary (.dic file) and then stores them in memory.
Ad. 2. When running with the English dictionary ("en_US") that comes with the extension, memory consumption peaks at 500 MB and stays constantly above 250 MB. Far too much. There are other JavaScript implementations of spell checkers that use hunspell dictionaries (like Typo.js, https://github.com/cfinke/Typo.js/) but they share the same problem (see its issues list about the Portuguese language). There are a few nice JavaScript bindings to the native hunspell binaries which use the dictionaries in a more sophisticated way (like node-spellchecker, used by the Atom editor, https://github.com/atom/node-spellchecker) but they are native and are not supported in an elegant way in VS Code. It seems that there is no good resolution for the above and so far VS Code cannot have a decent spell checker!
P.S. The Spanish dictionary contained in the extension IS faulty in the sense described above. It does get parsed into the associative array, but the HTML part also gets parsed, so the dictionary used by the extension is full of HTML elements treated as words to spell check...
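To make the memory problem described in this comment concrete, here is a minimal sketch of eager affix expansion. It is not the actual hunspell-spellchecker code: the `SuffixRule` shape, the function names, and the three sample rules are hypothetical simplifications of Hunspell suffix rules, and prefixes and cross-products are left out entirely.

```typescript
// Hypothetical, simplified model of one suffix rule from a .aff file.
interface SuffixRule {
  flag: string;      // flag letter referenced from .dic entries
  strip: string;     // characters removed from the end of the stem ("" for none)
  add: string;       // characters appended after stripping
  condition: RegExp; // the stem must match this for the rule to apply
}

// Expand one .dic entry ("stem/FLAGS") into every surface form,
// the way an expand-everything loader would.
function expandEntry(entry: string, rules: SuffixRule[]): string[] {
  const slash = entry.indexOf("/");
  const stem = slash === -1 ? entry : entry.slice(0, slash);
  const flags = slash === -1 ? "" : entry.slice(slash + 1);
  const forms = [stem];
  for (const rule of rules) {
    if (flags.indexOf(rule.flag) === -1) continue;   // entry does not use this rule
    if (!rule.condition.test(stem)) continue;        // stem does not satisfy the condition
    if (stem.slice(stem.length - rule.strip.length) !== rule.strip) continue;
    forms.push(stem.slice(0, stem.length - rule.strip.length) + rule.add);
  }
  return forms;
}

// Build the in-memory lookup object: one key per generated form.
function buildTable(entries: string[], rules: SuffixRule[]): Record<string, boolean> {
  const table: Record<string, boolean> = {};
  for (const entry of entries) {
    for (const form of expandEntry(entry, rules)) {
      table[form] = true;
    }
  }
  return table;
}

// Illustrative rules and entries only; not taken from any real dictionary.
const sampleRules: SuffixRule[] = [
  { flag: "S", strip: "", add: "s", condition: /[^sxy]$/ },
  { flag: "D", strip: "", add: "ed", condition: /[^ey]$/ },
  { flag: "G", strip: "", add: "ing", condition: /[^e]$/ },
];
const sampleTable = buildTable(["walk/SDG", "cat/S"], sampleRules);
console.log(Object.keys(sampleTable));
```

Two entries already produce six keys here. Because every generated form becomes its own key, the table grows with the number of affix rules that apply to each stem, which is consistent with the memory use reported above.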
Have you tried nspell? I made it to work with those dictionaries, and to my knowledge it’s the most complete JS-only spell checker for Node!
Yes I did. I have described it in more detail here:
@bartosz-antosik Thanks so much for this info. I'm sorry that I never read it closely enough until today to see the problems with the Spanish dictionary. I just fixed it in #79. I'll be releasing a new version soon. I haven't caught up with microsoft/vscode#20266 completely, but there seems to be a lot of awesome conversation over there. When I originally wrote this extension, I hoped for a native solution to avoid the huge memory footprint of some languages, but shipping binaries was not easy at the time. I've been meaning to migrate this from hunspell to nspell for all of its improvements and the language bundles, but I never made the time to do it. I started doing less document writing in my day-to-day work, so the current implementation worked well enough for me and, it seems, for many others. I'll try to look into the conversion again soon. There are a couple of other open suggestions that nspell already has solutions for and that the hunspell library does not support, e.g. adding a word to the dictionary rather than ignoring it.
I have figured out how to install other languages' dictionaries, which was a clear waste of time, because you give quite a detailed explanation of this in one of the issues. Never mind.
This led, however, to the conclusion that some dictionaries act weird.
Once you install the Polish dictionary (rename the files to pl_PL.aff and pl_PL.dic respectively and put them in the languages directory) and set "language" to "pl_PL" in the workspace's spellchecker.json, an error message is displayed after reload:
"Extension host terminated unexpectedly. Please reload the window to recover."
I tried debugging, but it acts strangely: the sub-window started to debug the extension disappears after some time and there seems to be no information in the debug console. Setting breakpoints does not work either: they seem to catch execution during the extension's initialization, but they do not catch anything around the time the window disappears.
I presume the Polish dictionary files are OK. They come from the GitHub repository you mentioned (https://github.com/wooorm/dictionaries/) and they also worked fine with Sublime Text's spell checker extension, which uses the same dictionary format.
I have done the same test with the French dictionary and it works fine.
Maybe you could have a look at this?
If there is anything I could do to help please let me know.
P.S. It is unrelated, I think, but the Spanish dictionary that comes with your extension seems to have some HTML atop each of its three files (es_ANY.aff, es_ANY.dic & es_ANY README.txt). I have no idea whether it disturbs its operation, but it seems strange compared to both en dictionaries (and the Polish dictionary as well).
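A quick way to spot the kind of HTML header mentioned in the P.S. is a heuristic check on the first few lines of a dictionary file. The sketch below is an assumption-laden illustration, not part of the extension: the function name, the tag list, and the sample strings are all made up, and reading the file from disk is left to the caller.

```typescript
// Hypothetical heuristic: does the start of a dictionary file look like HTML?
// A well-formed .dic file normally begins with a word-count line, so markup
// near the top suggests the file was saved from a web page.
function startsWithHtml(content: string, linesToInspect: number = 5): boolean {
  const head = content.split(/\r?\n/).slice(0, linesToInspect);
  // Illustrative tag list; extend it as needed.
  const htmlLike = /^\s*<(!doctype|html|head|body|meta|title|pre|div|p|br)\b/i;
  return head.some((line) => htmlLike.test(line));
}

// Made-up sample contents, for illustration only.
const cleanDic = "3\nhola/S\nadiós\nmundo/S\n";
const pollutedDic = "<html>\n<body>\n<pre>\n3\nhola/S\nadiós\nmundo/S\n";

console.log(startsWithHtml(cleanDic));
console.log(startsWithHtml(pollutedDic));
```

The check only inspects the head of the file, so it would not catch markup that appears further down, such as closing tags at the end.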