docker/install: build Tesseract from source #197
I wonder whether there are still reasons for building the Using the package from a recent Linux distribution is simpler and would save significant build time. Another possible approach would also work for |
Because most of the time, we cannot use Tesseract from a Linux distribution: our base distro is usually older than the current one, and we have no control over Tesseract features that we actually need. The same goes for PPA. We had good reasons to pin to a specific Tesseract version via source build in subrepo. No reason to give that up now.
Much simpler: conda
@kba: Your changes resolved all my errors with my test workspace. I added a resmgr call to the Docker image to add the eng traineddata; without it, I get an error when trying to process. Edit: Maybe equ.traineddata and osd.traineddata should be added as well, I am not sure.
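To make the question about which models the image needs easier to check, here is a minimal sketch that lists which traineddata files are absent from a tessdata directory. The helper name `missing_models` and the default model list (eng, equ, osd, taken from the comment above) are assumptions for illustration, not part of this PR or of the resmgr tooling.

```python
import os
from pathlib import Path


def missing_models(tessdata_dir, required=("eng", "equ", "osd")):
    """Return the names in `required` that have no <name>.traineddata file.

    Hypothetical helper: the default list mirrors the models mentioned in
    the comment above and is an assumption, not a project requirement.
    """
    root = Path(tessdata_dir)
    return [
        name for name in required
        if not (root / f"{name}.traineddata").is_file()
    ]


if __name__ == "__main__":
    # TESSDATA_PREFIX is the environment variable Tesseract reads to
    # locate its models.
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not set")
    else:
        print("missing models:", missing_models(prefix))
```

Running this inside the container before processing would show whether a resmgr download step is still needed for any of the listed models.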
Great that this is working now.
Some cosmetic change requests below. Adapting CircleCI config should follow.
In fact, since it already seems broken on master – unfortunately CircleCI does not keep the logs long enough, but I guess it's about the TESSDATA_PREFIX / resmgr location – we should fix this here. So I suggest (after rewriting |
Co-authored-by: Robert Sachunsky <[email protected]>
Now the CI config definitely needs |
@joschrew do you want me to make that change (on your fork's writable branch)?
`make deps-ubuntu` no longer fetches Tesseract via PPA, so we need to run `make install-tesseract` as well; also, drop unsupported Python 3.6
(since a normal CircleCI `checkout` creates empty submodule directories)
using VIRTUAL_ENV from PYENV_ROOT
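The commit note about `checkout` leaving submodule directories empty can be verified in a CI step. The following is a rough sketch, assuming the submodule paths are declared in `.gitmodules` at the repository root; the function name `empty_submodule_dirs` is hypothetical and not taken from this repository's CI config.

```python
from pathlib import Path


def empty_submodule_dirs(repo_root):
    """Return submodule paths from .gitmodules that are missing or empty.

    Hypothetical helper for illustration: it only reads `path = ...`
    entries and does not validate the rest of the file.
    """
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []
    empty = []
    for line in gitmodules.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "path":
            sub = root / value.strip()
            # A checkout without submodule init leaves an empty directory.
            if not sub.is_dir() or not any(sub.iterdir()):
                empty.append(value.strip())
    return empty
```

If the returned list is non-empty, the usual remedy is to run `git submodule update --init --recursive` before building.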
At last!
Oh, maybe we should also migrate |
and alias tessdata to /models for easier persistence
This PR is part of a series to offer single ocrd modules as Docker containers (ocrd slim containers) to be used with ocr-d network.
This Dockerfile currently doesn't work in all cases and it still needs updates. I created the PR anyway because I use/need it for my tests. EDIT: now works. (This basically migrates all the install-tesseract rules from ocrd_all's Makefile here, where it actually belongs.)
My idea was to maybe create the tesseract container with ocrd_all: