About - OCR Toolkit

Search scanned PDF files for predefined keywords, using the Levenshtein algorithm for fuzzy matching.

Prerequisites

Python
Tesseract

Install dependencies for Linux

Requires libtesseract (>=3.04) and libleptonica (>=1.71).

On Debian/Ubuntu:

$ sudo apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config

On RedHat/Fedora:

$ sudo dnf install tesseract tesseract-devel leptonica-devel leptonica

Install dependencies for Windows

Tesseract Docs
Tesseract
Leptonica

Setup Project

$ git clone <project_repo>

$ cd <project_directory>/

Install Source dependencies from `requirements`

$ pip install -r requirements/dev.txt

Package Build and Install

$ python -m build

For Windows

$ pip install dist/ocrmatcher-<version>-py3-none-any.whl

For Linux

$ pip install dist/ocrmatcher-<version>.tar.gz

Using

Create a dataset folder in the current directory
Add scanned PDF files to the dataset directory
Add a keywords.txt file to the dataset directory
Add search keywords to keywords.txt (one keyword per line, without numbering)
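
The keywords file format described above (one keyword per line, no numbering) can be read with a few lines of Python. This is only a sketch for checking a file before running a search: `load_keywords` is a hypothetical helper, not part of ocrmatcher, and the UTF-8 encoding is an assumption.

```python
from pathlib import Path


def load_keywords(path):
    """Return the keywords in a file, one per line.

    Hypothetical helper (not part of ocrmatcher). Assumes UTF-8 text.
    Blank lines are skipped and surrounding whitespace is stripped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Assumes the dataset/keywords.txt layout described above.
    for keyword in load_keywords("dataset/keywords.txt"):
        print(keyword)
```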

Commands

List of available commands

$ ocrmatcher --help

Or

$ python -m ocrmatcher --help

Add new keywords with the add-keywords command

$ ocrmatcher add-keywords --k my-search-keyword1 my-search-keyword2 etc.

Search Keywords

$ ocrmatcher search

Run with a specific language

Search Keywords

$ ocrmatcher search --lang Occupant-Pigs

Run with a specific similarity threshold for comparing two strings (default: 95)

Search Keywords

$ ocrmatcher search --threshold 75
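
To illustrate what a similarity threshold means, here is a minimal sketch of Levenshtein-based matching. The scoring formula ocrmatcher actually uses is not documented here, so the normalization below (100 × (1 − distance / longer length)) and the names `levenshtein`, `similarity`, and `is_match` are assumptions for illustration only.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity as a percentage (assumed normalization, see lead-in)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def is_match(keyword, candidate, threshold=95):
    """True if the case-insensitive similarity reaches the threshold."""
    return similarity(keyword.lower(), candidate.lower()) >= threshold
```

Under this assumed scoring, an OCR misread such as "invoce" for "invoice" (one missing letter out of seven) scores about 85.7, so it matches at a threshold of 75 but not at the default of 95. A lower threshold tolerates more OCR noise at the cost of more false positives.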

Convert PDF files to images

$ ocrmatcher pdf2img
