OCR-ensemble

Some results

https://colab.research.google.com/drive/1hKu8q2SH80baCj-0IRBb9rLDSgBaU1w7#scrollTo=C9v0iNYVJO6Y

Installation

Follow these steps to set up the environment and install the required dependencies using conda.

Prerequisites

  • Python 3.9
  • PyTorch (GPU version)
  • PaddleOCR

Installing Dependencies

  1. Clone the repository:
git clone git@github.com:LAION-AI/OCR-ensemble.git
cd OCR-ensemble
  2. Create a conda virtual environment (optional, but recommended):
conda create -n your-env-name python=3.9
conda activate your-env-name
  3. Install PyTorch (GPU version) by following the instructions on the official website, choosing the build that matches your system and CUDA version. The command below uses the pip wheels for CUDA 11.7:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
  4. Install paddlepaddle by following the instructions on the official GitHub repository. To install the GPU version, the following commands may help:

Linux

python -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple

Windows

python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
  5. Install the remaining required packages from the requirements.txt file:
pip install -r requirements.txt
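
After the steps above, a quick check that the packages are visible in the active environment can save debugging time later. The snippet below is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the list of import names assumes that `paddleocr` is installed by `requirements.txt`. Note that the paddlepaddle distribution is imported as `paddle`.

```python
import importlib.util

# Import names of the packages installed in the steps above (assumed list).
# The paddlepaddle distribution is imported under the name "paddle".
REQUIRED = ["torch", "paddle", "paddleocr"]


def find_missing(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that the packages can be located; it does not confirm that the GPU builds were installed or that CUDA is usable.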

Overview

  1. Classify each document by the type of text it contains
  2. Use the matching expert from an ensemble of existing OCR and layout parsing models to get the text and its bounding boxes, then concatenate that to the caption
  3. If there is no original caption, as for screenshots of websites and books, generate a caption and concatenate it with the OCR results
  4. Use this dataset to train CLIP with character-level tokenization

We are currently working on step 2.
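
Steps 2 to 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `OcrBox` type, the `[OCR]` separator, the fallback caption, and the use of Unicode code points as character-level token ids are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OcrBox:
    """One recognized text region (hypothetical structure)."""
    text: str
    bbox: Tuple[int, int, int, int]  # assumed convention: (x_min, y_min, x_max, y_max)


def build_caption(original: Optional[str], boxes: List[OcrBox],
                  fallback: str = "an image containing text") -> str:
    """Concatenate the original caption, or a fallback when none exists, with the OCR text."""
    base = original.strip() if original and original.strip() else fallback
    ocr_text = " ".join(box.text for box in boxes if box.text)
    # The "[OCR]" separator is an assumption made for this sketch.
    return f"{base} [OCR] {ocr_text}" if ocr_text else base


def char_tokenize(caption: str) -> List[int]:
    """Character-level tokenization, here simply one Unicode code point per character."""
    return [ord(ch) for ch in caption]


boxes = [OcrBox("Hello", (0, 0, 40, 12)), OcrBox("world", (44, 0, 90, 12))]
caption = build_caption(None, boxes)
tokens = char_tokenize(caption)
```

A real training setup would map characters to a fixed vocabulary and could also encode the bounding boxes; both are left out here.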

Pipeline: 2 Passes

  1. Classify images to determine text types
  2. Expert models process filtered images
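
The two passes can be sketched as a classify-then-dispatch loop. The function name `run_two_pass`, the text-type labels, and the stand-in classifier and experts below are hypothetical; real models would replace the lambdas.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_two_pass(images: Iterable[str],
                 classify: Callable[[str], str],
                 experts: Dict[str, Callable[[str], str]]) -> List[Tuple[str, str, str]]:
    """Pass 1 labels every image; pass 2 runs the matching expert on images that have one."""
    # Pass 1: classify each image to determine its text type.
    labeled = [(image, classify(image)) for image in images]

    # Pass 2: only images whose label has a registered expert are processed.
    results = []
    for image, label in labeled:
        expert = experts.get(label)
        if expert is None:
            continue  # filtered out: no expert handles this text type
        results.append((image, label, expert(image)))
    return results


# Stand-ins for real models (assumptions for illustration only).
def toy_classifier(path: str) -> str:
    return "printed" if path.endswith(".png") else "none"


toy_experts = {"printed": lambda path: f"text from {path}"}

output = run_two_pass(["page.png", "cat.jpg"], toy_classifier, toy_experts)
```

Keeping classification and expert processing as separate passes lets the cheap classifier filter the dataset before the more expensive expert models run.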

Candidate Expert Models