
Commit

feature: make tesseract version 5.2.0 default for amazonlinux-2 builds
bweigel committed Jul 19, 2022
1 parent 226fb20 commit 0820278
Showing 7 changed files with 17 additions and 16 deletions.
4 changes: 2 additions & 2 deletions Dockerfile.al2
@@ -2,7 +2,7 @@
FROM lambci/lambda-base-2:build

ARG LEPTONICA_VERSION=1.82.0
-ARG TESSERACT_VERSION=4.1.3
+ARG TESSERACT_VERSION=5.2.0
ARG AUTOCONF_ARCHIVE_VERSION=2017.09.28
ARG TMP_BUILD=/tmp
ARG TESSERACT=/opt/tesseract
@@ -40,7 +40,7 @@ RUN curl -L https://github.com/tesseract-ocr/tesseract/archive/${TESSERACT_VERSI
WORKDIR /opt
RUN mkdir -p ${DIST}/lib && mkdir -p ${DIST}/bin && \
cp ${TESSERACT}/bin/tesseract ${DIST}/bin/ && \
-cp ${TESSERACT}/lib/libtesseract.so.4 ${DIST}/lib/ && \
+cp ${TESSERACT}/lib/libtesseract.so.5 ${DIST}/lib/ && \
cp ${LEPTONICA}/lib/liblept.so.5 ${DIST}/lib/liblept.so.5 && \
cp /usr/lib64/libgomp.so.1 ${DIST}/lib/ && \
cp /usr/lib64/libwebp.so.4 ${DIST}/lib/ && \
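In `Dockerfile.al2` the `cp` step names the library by its major-version suffix, which is why this commit changes the `ARG` and the `cp` line together. A minimal sketch of deriving that file name from `TESSERACT_VERSION` with POSIX parameter expansion, assuming (as the rename from `.so.4` to `.so.5` here suggests) that the soname suffix always equals the major version; the function name `tesseract_soname` is hypothetical:

```bash
# Hypothetical helper: map a tesseract version to the shared-library file
# the Dockerfile copies. "${1%%.*}" strips everything from the first dot,
# leaving the major version (assumed to be the soname suffix).
tesseract_soname() {
  printf 'libtesseract.so.%s\n' "${1%%.*}"
}
```

`RUN` in shell form is executed by `/bin/sh`, and `ARG` values declared after `FROM` are visible there as environment variables, so the same expansion could presumably be written directly in the Dockerfile as `cp ${TESSERACT}/lib/libtesseract.so.${TESSERACT_VERSION%%.*} ${DIST}/lib/` (an untested sketch for this repository).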
21 changes: 11 additions & 10 deletions README.md
@@ -1,7 +1,7 @@
Tesseract OCR Lambda Layer
===

-![Tesseract](https://img.shields.io/badge/Tesseract-4.1.3-green?style=flat-square)
+![Tesseract](https://img.shields.io/badge/Tesseract-5.2.0-green?style=flat-square)
![Leptonica](https://img.shields.io/badge/Leptonica-1.82.0-green?style=flat-square)

![Examples available for Runtimes](https://img.shields.io/badge/Examples_(Lambda_runtimes)-Python_3.6(AL1),Python_3.8(AL2)-informational?style=flat-square)
@@ -14,19 +14,20 @@ Tesseract OCR Lambda Layer
<!-- TOC -->

- [Tesseract OCR Lambda Layer](#tesseract-ocr-lambda-layer)
- [Quickstart](#quickstart)
- [Ready-to-use binaries](#ready-to-use-binaries)
-- [Use with Serverless Framework](#use-with-serverless-framework)
-- [Use with AWS CDK](#use-with-aws-cdk)
+- [Use with Serverless Framework](#use-with-serverless-framework)
+- [Use with AWS CDK](#use-with-aws-cdk)
- [Build tesseract layer from source using Docker](#build-tesseract-layer-from-source-using-docker)
-- [available `Dockerfile`s](#available-dockerfiles)
-- [Building a different tesseract version and/or language](#building-a-different-tesseract-version-andor-language)
-- [Deployment size optimization](#deployment-size-optimization)
+- [available `Dockerfile`s](#available-dockerfiles)
+- [Building a different tesseract version and/or language](#building-a-different-tesseract-version-andor-language)
+- [Deployment size optimization](#deployment-size-optimization)
+- [Building the layer binaries directly using CDK](#building-the-layer-binaries-directly-using-cdk)
-- [Layer contents](#layer-contents)
+- [Layer contents](#layer-contents)
- [Known Issues](#known-issues)
-- [Avoiding Pillow library issues](#avoiding-pillow-library-issues)
-- [Unable to import module 'handler': cannot import name '_imaging'](#unable-to-import-module-handler-cannot-import-name-_imaging)
+- [Avoiding Pillow library issues](#avoiding-pillow-library-issues)
+- [Unable to import module 'handler': cannot import name '_imaging'](#unable-to-import-module-handler-cannot-import-name-_imaging)
- [Contributors :heart:](#contributors-heart)

<!-- /TOC -->
@@ -149,7 +150,7 @@ unset CONTAINER

## Building a different tesseract version and/or language

-Per default the build generated the [tesseract 4.1.3](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.3) OCR libraries with the _fast_ german, english and osd (orientation and script detection) [data files](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files) included.
+Per default the build generates the [tesseract 4.1.3](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.3) (amazonlinux-1) or [5.2.0](https://github.com/tesseract-ocr/tesseract/releases/tag/5.2.0) (amazonlinux-2) OCR libraries with the _fast_ german, english and osd (orientation and script detection) [data files](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files) included.

The build process can be modified using different build time arguments (defined as `ARG` in `Dockerfile.al[1|2]`), using the `--build-arg` option of `docker build`.

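The README above says the build can be modified through the `--build-arg` option of `docker build`. A minimal sketch of a wrapper that assembles such a call for `Dockerfile.al1` or `Dockerfile.al2`; the function name `build_layer`, the `DRY_RUN` switch, the image tag `tesseract-lambda-layer:alN` and running from the repository root (build context `.`) are all assumptions, and only `TESSERACT_VERSION` is taken from the Dockerfile in this commit:

```bash
# Hypothetical helper: build the layer image for Amazon Linux 1 or 2,
# optionally overriding TESSERACT_VERSION. The image tag is an assumption.
# With DRY_RUN=1 the command is printed instead of executed.
build_layer() {
  local al="$1" version="${2:-}"
  # Collect the docker arguments in the positional parameters.
  set -- docker build -f "Dockerfile.al${al}"
  if [ -n "$version" ]; then
    set -- "$@" --build-arg "TESSERACT_VERSION=${version}"
  fi
  set -- "$@" -t "tesseract-lambda-layer:al${al}" .
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # arguments joined by single spaces
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 build_layer 1` prints the default AL1 build command without calling Docker. With this commit `Dockerfile.al2` copies `libtesseract.so.5` literally, so overriding it with a 4.x version would presumably fail at that `cp` step unless the line is adjusted as well.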
6 changes: 3 additions & 3 deletions continous-integration/README.md
@@ -14,11 +14,11 @@ Commands to reproduce:

```bash
npm ci
-npx cdk synth
+npx cdk --app 'npx ts-node index-al[1|2].ts' synth
## run integration test using AL1 & Python 3.6
-npm npm run test:integration:al1
+npx npm run test:integration:al1
## run integration test using AL2 & Python 3.8
-npm npm run test:integration:al2
+npx npm run test:integration:al2
```

## Bundling
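`index-al[1|2].ts` in the CI commands reads as a placeholder for `index-al1.ts` or `index-al2.ts`, matching the AL1 / Python 3.6 and AL2 / Python 3.8 integration tests. A minimal sketch of a hypothetical `cdk_synth` helper that expands the placeholder for one Amazon Linux generation and rejects any other value; with the (also hypothetical) `DRY_RUN=1` it prints the arguments one per line instead of calling `npx`:

```bash
# Hypothetical helper: run `cdk synth` for index-al1.ts or index-al2.ts.
cdk_synth() {
  local al="$1"
  case "$al" in
    1|2) ;;
    *) echo "unknown Amazon Linux generation: $al" >&2; return 1 ;;
  esac
  # The --app value must stay a single argument, hence the quotes.
  set -- npx cdk --app "npx ts-node index-al${al}.ts" synth
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$@"   # one argument per line, so the grouping is visible
  else
    "$@"
  fi
}
```

Printing one argument per line keeps it visible that `npx ts-node index-al2.ts` is passed to `--app` as a single value.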
2 changes: 1 addition & 1 deletion ready-to-use/amazonlinux-2/TESSERACT-README.md
@@ -1,4 +1,4 @@
LEPTONICA_VERSION=1.82.0
-TESSERACT_VERSION=4.1.3
+TESSERACT_VERSION=5.2.0
TESSERACT_DATA_FILES=tessdata_fast/4.1.0
TESSERACT_DATA_LANGUAGES=osd,eng,deu
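The `TESSERACT-README.md` shipped with the ready-to-use binaries records the build settings as plain `KEY=VALUE` lines. Assuming those keys correspond to `ARG`s in `Dockerfile.al2` (only `LEPTONICA_VERSION` and `TESSERACT_VERSION` are visible in this commit), a sketch of a hypothetical `readme_to_build_args` helper that turns such a file into `docker build` flags, one argument per line:

```bash
# Hypothetical helper: print "--build-arg" and "KEY=VALUE" on separate lines
# for every KEY=VALUE line in the given file.
readme_to_build_args() {
  local line
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Only lines starting with an upper-case key are used; blank lines
    # and comments fall through the case without output.
    case "$line" in
      [A-Z_]*=*) printf '%s\n' --build-arg "$line" ;;
    esac
  done < "$1"
}
```

In bash the output can be read into an array and passed on unchanged, e.g. `mapfile -t args < <(readme_to_build_args ready-to-use/amazonlinux-2/TESSERACT-README.md)` followed by `docker build -f Dockerfile.al2 "${args[@]}" .`, which would rebuild with the recorded settings if the assumption about the `ARG` names holds.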
Binary file modified ready-to-use/amazonlinux-2/bin/tesseract
Binary file removed ready-to-use/amazonlinux-2/lib/libtesseract.so.4
Binary file added ready-to-use/amazonlinux-2/lib/libtesseract.so.5
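This commit replaces `libtesseract.so.4` with `libtesseract.so.5` in `ready-to-use/amazonlinux-2/lib`. A minimal sketch of a hypothetical consistency check for such a directory, assuming `lib/` should contain exactly one `libtesseract.so.*` file and that its suffix is the major version of the recorded `TESSERACT_VERSION`:

```bash
# Hypothetical check: lib_dir must hold exactly one libtesseract.so.* file,
# named after the major version of the given tesseract version.
check_tesseract_lib() {
  local lib_dir="$1" expected="libtesseract.so.${2%%.*}" found="" f
  for f in "$lib_dir"/libtesseract.so.*; do
    [ -e "$f" ] || continue              # glob matched nothing
    found="${found:+$found }${f##*/}"    # collect base names, space-separated
  done
  if [ "$found" != "$expected" ]; then
    echo "expected only $expected in $lib_dir, found: ${found:-none}" >&2
    return 1
  fi
}
```

`check_tesseract_lib ready-to-use/amazonlinux-2/lib 5.2.0` should pass for the tree after this commit; had the old `.so.4` file been left next to the new one, the check would fail and list both.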
