Vertical writing systems are not handled correctly in gImageReader #683

lhy7889678 · 2024-09-09T07:50:29Z

Vertical writing systems can be OCRed (fairly) reliably with the tesseract command-line tool, but gImageReader produces garbled characters for them by default. Horizontal writing systems are not affected.

Here are some sample images (in chi_sim, jpn, chi_sim_vert, jpn_vert respectively):

Here are the results using tesseract:

(縦組み is not OCRed correctly, but that is not a big problem.)
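For anyone who wants to reproduce the command-line side, here is a minimal sketch of how the tesseract CLI could be driven from Python. The exact command used for the results above is not given in this report, so the file name in the usage comment and the choice of `--psm 5` (tesseract's "single uniform block of vertically aligned text" mode) are assumptions, and `build_tesseract_cmd` and `run_ocr` are hypothetical helper names.

```python
import subprocess


def build_tesseract_cmd(image_path, lang, psm=None):
    """Build the argument list for a tesseract run that prints text to stdout.

    `lang` is a traineddata name such as "jpn_vert" or "chi_sim_vert".
    `psm` is optional; it is only passed through when explicitly set.
    """
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        # Assumption: --psm 5 may help for vertical text; the report does not say it was used.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image_path, lang, psm=None):
    """Run tesseract and return the recognized text (requires tesseract on PATH)."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (hypothetical file name):
#   text = run_ocr("sample_jpn_vert.png", "jpn_vert")
```

Comparing this output with what gImageReader produces for the same image and language should help separate a tessdata problem from an application problem.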

Here is the result using gImageReader (taking jpn_vert as an example):

I noticed that after rotating the image 90° counterclockwise, gImageReader produces the correct result:

(and 縦組み is OCRed correctly!)

This issue was reported earlier in Issue #552, but it was mistakenly attributed to a bug in tessdata. Since the tesseract command-line tool handles the same images correctly, the problem must lie in gImageReader.

I'm using gImageReader 3.4.2 and tesseract 5.4.1 on Arch Linux, with the default tessdata provided by tesseract. The "About" dialog in gImageReader reports tesseract 5.3.4, so this version mismatch might have something to do with the problem.
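To confirm the version mismatch, the version reported by the installed CLI can be read programmatically and compared with what the "About" dialog shows. This is only a sketch: it assumes `tesseract --version` prints a first line of the form `tesseract 5.4.1`, and `parse_tesseract_version` and `installed_tesseract_version` are hypothetical helper names.

```python
import re
import subprocess


def parse_tesseract_version(version_output):
    """Extract the version tuple from the text printed by `tesseract --version`."""
    # Some builds print a leading "v" before the number, so allow it optionally.
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("no tesseract version found in output")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_tesseract_version():
    """Ask the tesseract binary on PATH for its version."""
    result = subprocess.run(
        ["tesseract", "--version"], capture_output=True, text=True, check=True
    )
    # Fall back to stderr in case a build prints the version banner there.
    return parse_tesseract_version(result.stdout or result.stderr)


# Usage: compare against the version shown in gImageReader's "About" dialog.
#   installed_tesseract_version() == (5, 3, 4)
```

If the two versions differ, gImageReader was probably built against an older tesseract library than the one the CLI uses.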
