Tesseract consumption error #4057

Closed
ohmantics opened this issue Apr 21, 2023 · 1 comment

Comments

@ohmantics

Current Behavior

Carried over from paperless-ngx/paperless-ngx#3142. Tesseract fails on the linked PDF:

'tesseract -l eng --psm 2 /tmp/ocrmypdf.io.mg5udwao/000003_rasterize.png stdout' returns 2.

Expected Behavior

No response

Suggested Fix

No response

tesseract -v

tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.9 : libjpeg 6b (libjpeg-turbo 2.0.6) : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.4.0
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.8

Operating System

No response

Other Operating System

Docker on Debian Bullseye on Proxmox

uname -a

Linux paperless 5.15.85-1-pve #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) x86_64 GNU/Linux

Compiler

No response

CPU

No response

Virtualization / Containers

Docker version 23.0.1, build a5ee5b1 on top of an LXC container of Debian Bullseye on Proxmox 7.3-6.

Other Information

No response

@stweil
Member

stweil commented Apr 22, 2023

Citing from the initial issue:

ohmantics: Seems like this should be reported to tesseract then?
stumpylog: Not likely. It's a problem with an image in the PDF. 40364 * 15220 is 614 megapixels, even before color depth is taken into account.

So you were already told what the problem is and that reporting an issue for tesseract is not a good idea.
In addition, you are using an old version of tesseract.

There is already an issue for large images, see #3184.
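
For reference, the arithmetic behind the 614 megapixel figure can be checked with a short sketch. The helper name `raster_stats` and the three bit depths are assumptions chosen for illustration; the actual color depth of the image in the PDF is not stated in this thread, and nothing here reflects how tesseract itself allocates memory.

```python
def raster_stats(width, height, bits_per_pixel):
    """Return (megapixels, uncompressed size in MiB) for a raster image.

    Hypothetical helper for illustration only; not part of tesseract.
    """
    pixels = width * height
    raw_bytes = pixels * bits_per_pixel / 8
    return pixels / 1e6, raw_bytes / 2**20


# Dimensions quoted in the comment above.
WIDTH, HEIGHT = 40364, 15220

# Assumed bit depths; the real depth of the rasterized page is unknown here.
for label, bpp in [("1-bit bilevel", 1), ("8-bit grayscale", 8), ("24-bit RGB", 24)]:
    megapixels, mib = raster_stats(WIDTH, HEIGHT, bpp)
    print(f"{label}: {megapixels:.0f} MP, about {mib:.0f} MiB uncompressed")
```

Even at one byte per pixel the raw buffer is several hundred MiB, which is consistent with the point that the image size, not the OCR step as such, is the likely problem.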
