
Vertical writing systems are not handled correctly in gImageReader #683

Open
lhy7889678 opened this issue Sep 9, 2024 · 0 comments

Vertically written text can be OCRed fairly reliably with the tesseract command-line tool, but gImageReader produces garbled characters for the same images with its default settings. Horizontally written text is not affected.

Here are some sample images (in chi_sim, jpn, chi_sim_vert, jpn_vert respectively):

chi_sim
jpn
chi_sim_vert
jpn_vert

Here are the results using tesseract:

tesseract

(縦組み is not OCRed correctly, but that is not a big problem.)
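
For anyone who wants to reproduce the command-line side of the comparison, a small wrapper along these lines should do. It is only a sketch: the report does not say which options were passed, so the function names and the optional `psm` argument are assumptions for illustration, while `-l` and the `stdout` output base are ordinary tesseract CLI usage.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang, psm=None):
    """Build the argument list for a tesseract CLI call that prints to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        # Assumption: a page segmentation mode is optional; the issue does not
        # say whether one was used.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image_path, lang, psm=None):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang, psm),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling `run_ocr("jpn_vert.png", "jpn_vert")` with a hypothetical file name would then return the text that the command-line tool recognizes, which can be compared against what gImageReader shows for the same image.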

Here is the result using gImageReader (taking jpn_vert as an example):

gimagereader

I noticed that after rotating the image 90° counterclockwise, the result is correct:

gimagereader_rot

(and 縦組み is OCRed correctly!)
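
To illustrate what the rotation does to the layout, here is a toy sketch that treats the page as a grid of cells. It only tracks positions (in a real image the glyphs themselves also end up lying on their side), and `rotate_ccw` is a hypothetical helper, not anything taken from gImageReader or tesseract.

```python
def rotate_ccw(rows):
    """Rotate a rectangular grid (a list of rows) 90 degrees counterclockwise."""
    # zip(*rows) turns columns into rows; reversing puts the rightmost
    # original column at the top.
    return [list(col) for col in zip(*rows)][::-1]


# Vertical text is read top to bottom, with columns going right to left.
# Here the reading order is a, b (right column), then c, d (left column).
vertical_page = [
    ["c", "a"],
    ["d", "b"],
]

rotated = rotate_ccw(vertical_page)
# After the rotation, each original column is a row, and the rows are in
# reading order from top to bottom.
reading_order = [cell for row in rotated for cell in row]
```

In this toy model the first column of the vertical page becomes the first row of the rotated page, so a line-by-line horizontal reading visits the cells in the original reading order. Whether this is the reason the rotated image is recognized correctly in gImageReader is only a guess.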

The issue has already been reported in Issue #552, but it was mistakenly regarded as a bug in tessdata. Since the tesseract command-line tool handles these images correctly, the fault must lie in gImageReader.

I'm using gImageReader 3.4.2 and tesseract 5.4.1 under Arch Linux, with the default tessdata provided by tesseract. However, gImageReader's "About" dialog says it is using tesseract 5.3.4, so this version mismatch might have something to do with the problem.
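
A quick way to check for this kind of mismatch is to compare the version the command-line tool prints with the one shown in the "About" dialog. The sketch below assumes the first `major.minor.patch` number in the output of `tesseract --version` is the tesseract version; `parse_version` and `same_version` are hypothetical helper names.

```python
import re


def parse_version(text):
    """Return the first major.minor.patch number found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: " + repr(text))
    return tuple(int(part) for part in match.groups())


def same_version(cli_output, reported):
    """Compare the CLI's version output with a version string reported elsewhere."""
    return parse_version(cli_output) == parse_version(reported)
```

With the versions from this report, `same_version("tesseract 5.4.1", "5.3.4")` is `False`, which matches the mismatch described above.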
