CRNN-CTC framework for handwriting recognition. Our library was used to win the ICFHR2018 competition on automated text recognition on the READ Dataset.
- Install conda (a local sandbox and package manager), and create a new conda environment
OS X:
$ brew cask install anaconda
$ echo ". /usr/local/anaconda3/etc/profile.d/conda.sh" >> ~/.bash_profile
Then
$ conda create --name nephi python=2.7
$ conda activate nephi
- Install PyTorch.
# This is enough if you don't need CUDA or if you are on a Unix-based OS. On OS X, build PyTorch from source.
conda install pytorch torchvision opencv -c pytorch -y
- Install lmdb, and a few more dependencies:
conda install -c conda-forge python-lmdb lxml python-levenshtein -y
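You can check that everything installed correctly with a quick import (note that the import names differ from the conda package names):
$ python -c "import lmdb, lxml, Levenshtein"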
- Install WarpCTC as explained here:
git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc
mkdir build
cd build
cmake ../
make
cd ../pytorch_binding
python setup.py install
On OS X, substitute cmake with
cmake ../ -DWITH_OMP=OFF
remove -std=c++11 from the setup.py file, and then copy the built library into the conda tree:
cd ../build
cp libwarpctc.dylib /usr/local/anaconda3/lib
You can test that your install worked with
$ python
>>> from warpctc_pytorch import CTCLoss
or with this gist.
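A slightly fuller smoke test, adapted from the example in SeanNaren's warp-ctc README (the tensor shapes and the CTCLoss call signature are that library's; requires_grad_ assumes PyTorch >= 0.4):

```python
# Smoke test for the warp-ctc bindings, adapted from the SeanNaren
# warp-ctc README (assumes PyTorch >= 0.4 for requires_grad_).
import torch
from warpctc_pytorch import CTCLoss

ctc_loss = CTCLoss()
# Activations must be (seq_length, batch_size, alphabet_size) and
# unnormalized -- warp-ctc applies its own softmax internally.
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1],
                            [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
probs.requires_grad_(True)
labels = torch.IntTensor([1, 2])     # concatenated targets for the whole batch
label_sizes = torch.IntTensor([2])   # one label of length 2
probs_sizes = torch.IntTensor([2])   # one activation sequence of length 2
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
cost.backward()
print(cost)
```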
This repository is a fork of the PyTorch version of the Convolutional Recurrent Neural Network (CRNN) repository found here, which implements the original CRNN paper.
- For training with variable-length text, please sort the images according to transcription length.
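For example, assuming the index_transcription.ext filename convention shown in the next step, one could order the files like this (illustrative only, not part of nephi):

```python
# Order image files by the length of the transcription embedded in the
# filename (a sketch; assumes the 'index_transcription.ext' naming
# convention shown in the next step, which nephi does not enforce).
import glob
import os

def transcription_length(path):
    stem = os.path.splitext(os.path.basename(path))[0]
    return len(stem.split('_', 1)[1])  # text after the leading index

paths = sorted(glob.glob('/path/to/your/data/*'), key=transcription_length)
```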
- Create an lmdb database: clone this repository and use create_dataset.py as follows.
First fill one directory with training data, and one with validation data. Example data structure:
/path/to/your/data/25_this is what it says.png
/path/to/your/data/26_this is what the next one says.jpg
Now bootstrap the lmdb index databases:
nephi$ python create_dataset.py /path/to/your/training/data /new/train/lmdb/database
nephi$ python create_dataset.py /path/to/your/val/data /new/val/lmdb/database
If you'd like to input from XML descriptions (e.g., XML that describes line portions within a larger image), add --xml at the end:
python create_dataset.py /path/to/your/val/data /new/val/lmdb/database --xml
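To spot-check a freshly built database, you can read a couple of entries back (a sketch; the 'num-samples', 'image-%09d', and 'label-%09d' keys follow the convention of the upstream crnn.pytorch create_dataset.py, assumed here to carry over to this fork):

```python
# Spot-check a freshly built lmdb database (a sketch; key names follow
# the upstream crnn.pytorch convention, assumed to apply here).
import lmdb

env = lmdb.open('/new/train/lmdb/database', readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get('num-samples'))
    print('samples: %d' % num_samples)
    print('first label: %s' % txn.get('label-%09d' % 1))
```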
- To train a new model, execute crnn_main.py. The argument format is as follows:
nephi$ python crnn_main.py --trainroot /new/train/lmdb/database --valroot /new/val/lmdb/database [--cuda]
It will train on your trainroot data, backpropagating through the network every batch-size images, and print progress updates to the console as it goes.
The --cuda flag enables GPU acceleration. If your machine has CUDA and you do not use this flag, the software will warn you that you could be using GPU acceleration.
Be sure to provide a valid alphabet.txt file for your dataset (either pass one in as a parameter or create a local alphabet.txt file; a sketch for generating one is shown below).
For more help with argument structure, use nephi$ python crnn_main.py -h.
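One way to generate alphabet.txt from your training data (a sketch; it assumes the index_transcription.ext filename convention from above, and writes all unique characters on one line -- check crnn_main.py for the exact format it expects):

```python
# -*- coding: utf-8 -*-
# Build an alphabet.txt of all unique characters in the training
# transcriptions (a sketch; assumes the 'index_transcription.ext'
# filename convention, and the one-line output format is an assumption).
import codecs
import glob
import os

chars = set()
for path in glob.glob('/path/to/your/training/data/*'):
    stem = os.path.splitext(os.path.basename(path))[0]
    transcription = stem.split('_', 1)[1].decode('utf-8')  # Python 2
    chars.update(transcription)

with codecs.open('alphabet.txt', 'w', encoding='utf-8') as f:
    f.write(u''.join(sorted(chars)))
```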
If you encounter an error like
UnicodeEncodeError: 'ascii' codec can't encode character u'\u016b' in position 10: ordinal not in range(128)
you can solve it by setting the console encoding:
export PYTHONIOENCODING=utf8
Big thanks to the people who contributed to our library: lead developer Russell Ault from OSU, who collaborated in developing a significant portion of the code and added key features that substantially improved our original baseline. Thanks also to Roger Pack from FamilySearch, who presented our work at the annual Family History Technology Workshop and gave us valuable feedback during the development of the library. Others who deserve mention for their feedback and input are Dr. William Barrett from BYU, Dr. Doug Kennard, and Seth Stewart.