Overly large images cause tesseract on our Heroku infrastructure to crash by using too much RAM.
Many of these images are 600 dpi. What if we downsampled to 300 dpi before OCR? That would probably bring many of them back within limits. Heck, even if the original is 300 dpi, 150 dpi is probably still OCR'able.
Based on experience (see #2820), images start causing problems at around 200 MB or around 70 million pixels (height × width > 70 million).
The pixel boundary is probably more reliable. If an image is more than 70 million pixels, downsample by 50% before OCR'ing?
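A minimal sketch of what that check could look like, assuming a Pillow-based pipeline; the 70-million-pixel threshold comes from this issue, while the function name, the path-based interface, and the single 50% resize step are illustrative assumptions, not the project's actual code:

```python
from PIL import Image

# Threshold from this issue: above ~70 million pixels (width * height),
# tesseract has run out of RAM on our Heroku dynos.
MAX_PIXELS = 70_000_000


def downsample_if_needed(path):
    """Open an image and halve each dimension if it exceeds MAX_PIXELS.

    Sketch only (hypothetical helper); the real pipeline may receive image
    objects or raw bytes instead of a path, and might want to repeat the
    check in a loop in case one 50% pass is not enough.
    """
    img = Image.open(path)
    if img.width * img.height > MAX_PIXELS:
        img = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
    return img
```

The downsampled image could then be handed to OCR as usual, e.g. `pytesseract.image_to_string(downsample_if_needed(path))`, so only oversized inputs pay the resize cost.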