Hi, some non-Western scripts do not need spaces between words #346
Comments
Any ideas what such a mechanism could look like?
I'll try it. ^_^
https://src.chromium.org/viewvc/chrome/trunk/src/third_party/cld/languages/internal/
Well, detecting the language is not much of a problem per se: typically the recognition language chosen by the user will match the actual text language. What needs to be defined are the rules for when to add spaces and when not to.
We need a place (a blackboard, so to speak) where people can record whether their language needs spaces between words. Also, especially in non-Western countries, documents often mix in Western characters, so I suggest letting users decide whether language detection is enabled when each word is drawn.
If you do a multilingual recognition, tesseract will detect the language and will write it in the lang attribute of the corresponding element of the hOCR document.
It does not seem to report the language when I choose single-language Chinese recognition.
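A rule keyed on that lang attribute could look roughly like the sketch below. It is only an illustration: the function name `languageUsesWordSpaces` is hypothetical, the list of language codes is an incomplete sample of Tesseract traineddata names, and an empty or unknown tag falls back to the current behaviour of adding a space.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper, not part of the project.
// Assumption: `lang` holds a Tesseract traineddata name such as "eng" or "chi_sim".
// Returns true when words in that language are normally separated by spaces.
bool languageUsesWordSpaces(const std::string& lang) {
    // Illustrative, incomplete sample of languages written without word spaces.
    static const std::unordered_set<std::string> spaceless = {
        "chi_sim", "chi_tra", "chi_sim_vert", "chi_tra_vert",
        "jpn", "jpn_vert", "tha"
    };
    // Unknown or empty tags keep the default: add a space.
    return spaceless.find(lang) == spaceless.end();
}
```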
Following up in pull request #351
I reset my commit, so please follow up in pull request #353 instead.
Hi, some non-Western scripts do not put spaces between words. To handle this, we should add a mechanism that checks whether the extra space width should be added when computing the x coordinate of the next word.
For example:
English: Hello world!
Chinese: 你好世界!
x += painter.getTextWidth(text + " ") / px2pu;
Code snippet quoted from src/hocr/HOCRPdfExporter.cc, line 717
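One possible shape for such a check, as a minimal sketch only: look at the characters on both sides of the word boundary and skip the space when both belong to a script written without word spaces. The helper names `isSpacelessScript` and `needsTrailingSpace` are hypothetical, the Unicode ranges are a partial selection, and the use of `std::u32string` is an assumption made to keep the example self-contained.

```cpp
#include <string>

// Hypothetical helper, not part of the project: true if the code point belongs
// to a script that is normally written without spaces between words.
// The ranges below are a partial, illustrative selection.
bool isSpacelessScript(char32_t c) {
    return (c >= 0x4E00 && c <= 0x9FFF)   // CJK Unified Ideographs
        || (c >= 0x3400 && c <= 0x4DBF)   // CJK Unified Ideographs Extension A
        || (c >= 0x3040 && c <= 0x30FF)   // Hiragana and Katakana
        || (c >= 0x3000 && c <= 0x303F)   // CJK Symbols and Punctuation
        || (c >= 0xFF00 && c <= 0xFFEF)   // Halfwidth and Fullwidth Forms
        || (c >= 0x0E00 && c <= 0x0E7F);  // Thai
}

// Decide whether a space should follow `word`, given the word after it.
// The space is dropped only when both sides of the boundary are spaceless
// script characters, so mixed text such as "你好 world" keeps its space.
bool needsTrailingSpace(const std::u32string& word, const std::u32string& next) {
    if (word.empty() || next.empty()) {
        return true;  // nothing to compare: keep the existing behaviour
    }
    return !(isSpacelessScript(word.back()) && isSpacelessScript(next.front()));
}
```

With a check like this, the quoted line would measure `text + " "` only when the check returns true and measure `text` alone otherwise.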