IndicPhotoOCR is a PhotoOCR toolkit for detecting text, identifying its script, and recognizing it in 13 Indian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Meitei, Odia, Punjabi, Tamil, Telugu, and Urdu) as well as English. Built to handle the distinct scripts and complex structures of Indian languages, IndicPhotoOCR provides detection and recognition capabilities for processing and analyzing multilingual documents in these scripts.
Updates
Installation
How to use
Bharat Scene Text Dataset
Contributors
Acknowledgement
Contact us
[December 2024]: Detection Module: TextBPN++ added.
[November 2024]: Demo available on Hugging Face Spaces.
[November 2024]: Code available on Google Colab.
[November 2024]: Added support for 10 languages in the recognition module.
[September 2024]: Private repository created.
Currently, the virtual environment must be created manually.
conda create -n indicphotoocr python=3.9 -y
conda activate indicphotoocr
git clone https://github.com/Bhashini-IITJ/IndicPhotoOCR.git
cd IndicPhotoOCR
CPU Installation
python setup.py sdist bdist_wheel
pip install dist/IndicPhotoOCR-1.2.0-py3-none-any.whl[cpu]
CUDA 11.8 Installation
python setup.py sdist bdist_wheel
pip install ./dist/IndicPhotoOCR-1.2.0-py3-none-any.whl[cu118] --extra-index-url https://download.pytorch.org/whl/cu118
CUDA 12.1 Installation
python setup.py sdist bdist_wheel
pip install ./dist/IndicPhotoOCR-1.2.0-py3-none-any.whl[cu121] --extra-index-url https://download.pytorch.org/whl/cu121
If you run into trouble with the installation above, use the setup.sh script instead.
chmod +x setup.sh
./setup.sh
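The three install variants above differ only in the pip extra (`cpu`, `cu118`, `cu121`) and the extra index URL. As a minimal sketch, the mapping can be written down once and the matching command built from it. The helper name `install_command` and the `VARIANTS` table are illustrative and not part of IndicPhotoOCR; the wheel name and URLs are copied from the commands above.

```python
from typing import List, Optional

# Wheel path as produced by `python setup.py sdist bdist_wheel` above.
WHEEL = "./dist/IndicPhotoOCR-1.2.0-py3-none-any.whl"

# CUDA version -> (pip extra, extra index URL), taken from the install steps above.
VARIANTS = {
    None: ("cpu", None),
    "11.8": ("cu118", "https://download.pytorch.org/whl/cu118"),
    "12.1": ("cu121", "https://download.pytorch.org/whl/cu121"),
}


def install_command(cuda_version: Optional[str] = None) -> List[str]:
    """Return the pip command (as an argument list) for a CUDA version, or CPU if None."""
    if cuda_version not in VARIANTS:
        raise ValueError(f"unsupported CUDA version: {cuda_version!r}")
    extra, index_url = VARIANTS[cuda_version]
    cmd = ["pip", "install", f"{WHEEL}[{extra}]"]
    if index_url is not None:
        cmd += ["--extra-index-url", index_url]
    return cmd


print(" ".join(install_command("11.8")))
```

Passing the returned list to `subprocess.run` avoids having to quote the square brackets, which some shells (zsh, for example) otherwise treat as a glob pattern.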
Currently, script identification supports only Hindi vs. English, so the end-to-end pipeline recognizes Hindi and English text.
Detection Model: TextBPN++
Script Identification Model: Hindi vs. English
Recognition Model: Hindi, English, Assamese, Bengali, Gujarati, Marathi, Odia, Punjabi, Tamil, Telugu.
>>> from IndicPhotoOCR.ocr import OCR
# Create an object of OCR
>>> ocr_system = OCR(verbose=True) # for CPU --> OCR(device="cpu")
# Get detections
>>> detections = ocr_system.detect("test_images/image_141.jpg")
# Running text detection...
# 4334 text boxes before nms
# 1.027989387512207
# Save and visualize the detection results
>>> ocr_system.visualize_detection("test_images/image_141.jpg", detections)
# Image saved at: test.png
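The examples in this section note that CPU use requires `OCR(device="cpu")`. A small sketch of choosing the constructor arguments automatically is shown below. It assumes that `OCR()` without a `device` argument runs on the GPU (the README does not state this explicitly), and `ocr_kwargs` is a hypothetical helper, not part of the toolkit.

```python
def ocr_kwargs() -> dict:
    """Build keyword arguments for OCR(), forcing CPU when CUDA is unavailable."""
    try:
        import torch  # PyTorch; treated as optional so the sketch runs without it

        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False

    kwargs = {"verbose": True}
    if not has_cuda:
        # Assumption: omitting `device` selects the GPU, so only the CPU case sets it.
        kwargs["device"] = "cpu"
    return kwargs


# Intended use (requires IndicPhotoOCR to be installed):
#   from IndicPhotoOCR.ocr import OCR
#   ocr_system = OCR(**ocr_kwargs())
print(ocr_kwargs())
```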
>>> from IndicPhotoOCR.ocr import OCR
# Create an object of OCR
>>> ocr_system = OCR(verbose=True) # for CPU --> OCR(device="cpu")
# Get recognitions
>>> ocr_system.recognise("test_images/cropped_image/image_141_0.jpg", "hindi")
# Recognizing text in detected area...
# 'मण्डी'
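When several cropped word images need recognizing, the single call above can be wrapped in a loop. The sketch below assumes that `recognise` returns the recognized string (as the output above suggests) and that language names are passed in lowercase like `"hindi"`; only `"hindi"` is confirmed by this README, and the other names are derived from the recognition model's language list. `recognise_crops` and `SUPPORTED_LANGUAGES` are hypothetical names.

```python
# Languages listed for the recognition model; the lowercase spelling is an assumption.
SUPPORTED_LANGUAGES = {
    "hindi", "english", "assamese", "bengali", "gujarati",
    "marathi", "odia", "punjabi", "tamil", "telugu",
}


def recognise_crops(ocr_system, crops):
    """Recognise each (image_path, language) pair and return {image_path: text}.

    `ocr_system` is any object with a recognise(image_path, language) method,
    such as the OCR instance created above.
    """
    results = {}
    for image_path, language in crops:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")
        results[image_path] = ocr_system.recognise(image_path, language)
    return results


# Intended use (requires IndicPhotoOCR and the test images):
#   texts = recognise_crops(
#       ocr_system, [("test_images/cropped_image/image_141_0.jpg", "hindi")]
#   )
```

Validating the language before calling the model fails fast on a typo instead of partway through a long batch.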
>>> from IndicPhotoOCR.ocr import OCR
# Create an object of OCR
>>> ocr_system = OCR(verbose=True) # for CPU --> OCR(device="cpu")
# Complete pipeline
>>> ocr_system.ocr("test_images/image_141.jpg")
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Recognized word: रोड
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Recognized word: बाराखम्बा
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Recognized word: राजीव
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Recognized word: चौक
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Recognized word: मण्डी
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Recognized word: हाऊस
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: rajiv
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: chowk
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: mandi
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: house
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: barakhamba
# Identifying script for the cropped area...
# Recognizing text in detected area...
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: road
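The log above interleaves Hindi and English words. The README does not show what `ocr()` returns, so the sketch below works only on the verbose log text: it collects the `Recognized word:` lines and groups the words by script using Unicode character names. The helper names are illustrative, and the log format is assumed to match the output shown above; where the return value of `ocr()` is available, using it directly would be preferable.

```python
import unicodedata

PREFIX = "Recognized word: "


def words_from_log(log_text):
    """Collect the words from 'Recognized word: ...' lines, in order of appearance."""
    words = []
    for line in log_text.splitlines():
        line = line.lstrip("# ").strip()  # drop the leading comment marker, if any
        if line.startswith(PREFIX):
            words.append(line[len(PREFIX):])
    return words


def script_of(word):
    """Return 'devanagari', 'latin', or 'other' based on the first letter of the word."""
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("DEVANAGARI"):
                return "devanagari"
            if name.startswith("LATIN"):
                return "latin"
            return "other"
    return "other"


def group_by_script(words):
    """Group words into a dict keyed by script name."""
    groups = {}
    for word in words:
        groups.setdefault(script_of(word), []).append(word)
    return groups


sample = """\
# Recognizing text in detected area...
# Recognized word: रोड
# Using cache found in /root/.cache/torch/hub/baudm_parseq_main
# Recognized word: road
"""
print(group_by_script(words_from_log(sample)))
```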
Bharat Scene Text Dataset - BSTD
Anik De: Tech Lead and Main Contributor
Abhirama: Contributor
Aditya Rathore: Contributor
Harshiv Shah: Contributor
Sagar Agarwal: Contributor
Rajeev Yadav: Contributor
Anand Mishra: Project Investigator
@misc{ipo,
author = {Anik De et al.},
title = {{I}ndic{P}hoto{O}CR: A comprehensive toolkit for {I}ndian language scene text understanding},
howpublished = {\url{https://github.com/Bhashini-IITJ/IndicPhotoOCR/}},
year = 2024,
}
Text Recognition - PARseq
EAST re-implementation repository.
TextBPN++ repository.
National Language Translation Mission Bhashini.
For any queries, please contact us at: