OCR TypeScript Project
This project demonstrates how to perform Optical Character Recognition (OCR) using TypeScript. It integrates Python for image processing with the Tesseract OCR engine and outputs recognized text from image formats such as .jpg and .webp.
├── TheBalanceLetter.jpg # Sample image to process
├── coverletter.webp # Another sample image for OCR
├── index.ts # Main TypeScript file to run the project
├── ocr.py # Python script for OCR processing using Tesseract
├── ocr_output.txt # Output file containing extracted text from images
├── processed_image.png # Processed image after OCR operation
├── node_modules/ # Project dependencies installed via npm
├── package.json # Project configuration and dependencies
├── package-lock.json # Lockfile for npm dependencies
├── tsconfig.json # TypeScript configuration file
└── venv/ # Python virtual environment
Make sure you have the following installed:
- Node.js and npm
- Python 3
Install dependencies:
- Install Node.js dependencies:
  npm install
- Create and activate a Python virtual environment:
  python3 -m venv venv
  source venv/bin/activate # For Linux/Mac
  .\venv\Scripts\activate # For Windows
- Install Python dependencies:
  pip install -r requirements.txt
- Install Tesseract OCR:
  - Follow the installation instructions for your platform.
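If index.ts launches ocr.py itself, it may need the interpreter inside the virtual environment rather than the system one. A minimal sketch, assuming the environment lives in venv/ as created in the steps above; the function name venvPythonPath is hypothetical and not part of the project:

```typescript
// Hypothetical helper (not part of the project): returns the path of the
// Python interpreter inside a virtual environment created with `python3 -m venv`.
// On Windows the interpreter is at <venv>\Scripts\python.exe; on Linux and
// macOS it is at <venv>/bin/python.
function venvPythonPath(platform: string, venvDir: string = "venv"): string {
  return platform === "win32"
    ? `${venvDir}\\Scripts\\python.exe`
    : `${venvDir}/bin/python`;
}
```

In Node.js, passing process.platform as the first argument selects the path for the current machine, so the same code can run on Windows, Linux, and macOS without activating the environment first.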
- Ensure you are in the project root directory.
- Compile and run the TypeScript code:
  npx ts-node index.ts
Use the provided Python script for OCR:
  python ocr.py
The OCR results will be saved in ocr_output.txt, and any processed images will be saved as processed_image.png.
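One way the TypeScript side could drive the Python step and pick up its result is sketched below. This is an assumption about the wiring, not the project's actual code: it assumes ocr.py takes no arguments, that `python` resolves to the virtual environment's interpreter, and that the script writes ocr_output.txt in the working directory. The names summarizeOcrText and runOcrScript are hypothetical.

```typescript
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";

// Hypothetical helper: counts non-empty lines and words in OCR output text.
function summarizeOcrText(text: string): { lines: number; words: number } {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const words = text.split(/\s+/).filter((word) => word !== "");
  return { lines: lines.length, words: words.length };
}

// Hypothetical runner: executes `python ocr.py`, then reads ocr_output.txt.
// Assumes `python` on PATH is the interpreter from the activated venv.
function runOcrScript(pythonBin: string = "python"): string {
  const result = spawnSync(pythonBin, ["ocr.py"], { encoding: "utf8" });
  if (result.error) {
    throw result.error; // for example, the interpreter was not found
  }
  if (result.status !== 0) {
    throw new Error(`ocr.py exited with code ${result.status}: ${result.stderr}`);
  }
  return readFileSync("ocr_output.txt", "utf8");
}
```

Calling summarizeOcrText(runOcrScript()) gives a quick sanity check that the script produced some text. Failing loudly on a non-zero exit code avoids silently reading a stale ocr_output.txt left over from an earlier run.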
The project uses TypeScript and Tesseract for OCR. Modify the index.ts file to change the images being processed. Similarly, the Python script (ocr.py) can be updated for additional image processing logic.
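Changing the images being processed might amount to editing a list like the one below. This is only a sketch: how index.ts actually stores its inputs is not shown in this README, and the names IMAGES_TO_PROCESS, SUPPORTED_EXTENSIONS, and selectSupportedImages are hypothetical. The extension list covers only the two formats named above.

```typescript
// Hypothetical input list; the two files are the sample images in the repo.
const IMAGES_TO_PROCESS: string[] = ["TheBalanceLetter.jpg", "coverletter.webp"];

// Assumption: only the formats named in this README are accepted here.
const SUPPORTED_EXTENSIONS: string[] = [".jpg", ".webp"];

// Returns the paths whose extension (case-insensitive) is in the supported list.
function selectSupportedImages(paths: string[]): string[] {
  return paths.filter((p) => {
    const dot = p.lastIndexOf(".");
    if (dot === -1) return false; // no extension at all
    return SUPPORTED_EXTENSIONS.indexOf(p.slice(dot).toLowerCase()) !== -1;
  });
}
```

Filtering before handing paths to the OCR step keeps unsupported files from reaching Tesseract; adding a format would then be a one-line change to the extension list.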
The project uses a tsconfig.json file for TypeScript settings, ensuring compatibility with modern JavaScript features and Node.js versions.
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to open issues or submit pull requests to improve this project.