Some non-Western scripts do not need spaces between words #346

napasa · 2018-06-13T09:41:47Z

Hi, some non-Western scripts do not put spaces between words. To handle this case, we should add a mechanism that checks whether the x coordinate computed for the next word should be advanced by an extra space width.
For example:
English: Hello world!
Chinese: 你好世界！

x += painter.getTextWidth(text + " ") / px2pu;
(Code snippet quoted from src/hocr/HOCRPdfExporter.cc, line 717.)

manisandro · 2018-06-13T09:44:32Z

Any ideas what such a mechanism could look like?

napasa · 2018-06-13T09:45:04Z

I'll give it a try. ^_^

napasa · 2018-06-14T08:11:59Z

https://src.chromium.org/viewvc/chrome/trunk/src/third_party/cld/languages/internal/
How about this: the CLD (Compact Language Detector) library?

manisandro · 2018-06-14T08:16:41Z

Well, detecting the language is not much of a problem per se; typically the recognition language chosen by the user will match the actual text language. What needs to be defined are the rules for when to add spaces and when not to.
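One possible shape for such a rule, as a minimal sketch and not what the project ended up implementing: decide per word boundary from the characters themselves, and add a space only when both neighbouring characters belong to scripts that use spaces. The names `isUnspacedScript` and `needsSpaceBetween` are hypothetical, the list of Unicode ranges is deliberately incomplete, and treating a mixed CJK/Latin boundary as "no space" is an assumption that would need review by native readers.

```cpp
#include <string>

// Hypothetical helper: true if the code point belongs to a script that is
// normally written without spaces between words. The ranges below cover only
// common CJK blocks and are an assumption, not a complete list.
bool isUnspacedScript(char32_t c) {
    return (c >= 0x3000 && c <= 0x303F)   // CJK Symbols and Punctuation
        || (c >= 0x3040 && c <= 0x30FF)   // Hiragana and Katakana
        || (c >= 0x3400 && c <= 0x4DBF)   // CJK Unified Ideographs Extension A
        || (c >= 0x4E00 && c <= 0x9FFF)   // CJK Unified Ideographs
        || (c >= 0xFF00 && c <= 0xFFEF);  // Halfwidth and Fullwidth Forms
}

// Hypothetical rule: a space is added between two adjacent words only if the
// last character of the previous word and the first character of the next
// word both come from scripts that use spaces. Empty words never get a space.
bool needsSpaceBetween(const std::u32string& prev, const std::u32string& next) {
    if (prev.empty() || next.empty()) {
        return false;
    }
    return !isUnspacedScript(prev.back()) && !isUnspacedScript(next.front());
}
```

With a rule like this, the quoted line from HOCRPdfExporter.cc would measure the width of `text + " "` only when `needsSpaceBetween` returns true for the current and following word, and the width of `text` alone otherwise. Because the check looks at the characters and not at a language tag, Latin words embedded in a Chinese page would still be separated by spaces.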

napasa · 2018-06-14T08:33:09Z

We need a shared board where people can note whether their language needs spaces between words.

Sometimes, especially in non-Western countries, we encounter Western characters in our documents, so I suggest that users be able to decide whether the language detection plug-in is enabled when each word is drawn.

manisandro · 2018-06-14T08:40:10Z

If you do a multilingual recognition, tesseract will detect the language and write it in the lang attribute of the corresponding element of the hOCR document.

napasa · 2018-06-14T09:13:04Z

It does not seem to report the actual language when I choose single-language Chinese recognition:
<span title="bbox 1434 2311 1566 2344; x_fsize 10; x_wconf 56" class="ocrx_word" id="word_1_117" lang="zh_CN">AAAN</span>

manisandro · 2018-06-21T22:13:53Z

Following up in pull request #351.

napasa · 2018-06-22T08:49:39Z

I reset my commit, so please follow up in pull request #353 instead.

manisandro closed this as completed Jun 21, 2018
