Vertical writing systems can be OCRed (fairly) reliably with the tesseract command-line tool, but gImageReader produces garbled characters for them by default. Horizontal writing systems are not affected.
Here are some sample images (in chi_sim, jpn, chi_sim_vert, jpn_vert respectively):
Here are the results using tesseract:
(縦組み is not OCRed correctly, but that is not a big problem.)
Here is the result using gImageReader (taking jpn_vert as an example):
I noticed that after rotating the image 90° counterclockwise, gImageReader's result is correct:
(and 縦組み is OCRed correctly!)
This problem was already reported in Issue #552, but there it was mistakenly regarded as a bug in tessdata. Since the tesseract command-line tool handles the same images correctly, the fault must lie in gImageReader.
I'm using gImageReader 3.4.2 and tesseract 5.4.1 under Arch Linux, with the default tessdata provided by tesseract. However, gImageReader's "About" dialog says it is using tesseract 5.3.4, so a version mismatch might have something to do with the problem.
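For anyone who wants to reproduce the command-line side of this comparison, here is a minimal Python sketch that wraps the tesseract CLI. It is not the exact command used for the results above; the helper names (`tesseract_cmd`, `run_ocr`, `tesseract_version`) and the sample file names are hypothetical, and it assumes `tesseract` and the four language packs are installed.

```python
import shutil
import subprocess

# Hypothetical sample file names, one per language pack mentioned in this report.
SAMPLES = {
    "chi_sim": "chi_sim.png",
    "jpn": "jpn.png",
    "chi_sim_vert": "chi_sim_vert.png",
    "jpn_vert": "jpn_vert.png",
}


def tesseract_cmd(image_path, lang):
    """Build a tesseract invocation that prints the recognized text to stdout."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        tesseract_cmd(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout


def tesseract_version():
    """Return the first line of `tesseract --version`, for comparison with
    the version shown in gImageReader's "About" dialog."""
    result = subprocess.run(
        ["tesseract", "--version"], capture_output=True, text=True, check=True
    )
    # Some builds print the version banner to stderr rather than stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0]
```

Calling `run_ocr(SAMPLES["jpn_vert"], "jpn_vert")` gives the command-line output to compare against what gImageReader produces for the same image, and `tesseract_version()` shows which version the CLI is using.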