Dataset | Train | Validation | Test | Character-Level Annotation | Word-Level Annotation |
---|---|---|---|---|---|
ICDAR 2013 | 229 | No | 233 | Yes (Pixel-Level) | Yes (Rectangle) |
ICDAR 2015 | 1000 | No | 500 | No | Yes (Quadrangle) |
ICDAR 2017 COCO-Text | 43,686 | 10,000 | 10,000 | No | Yes (Rectangle) |
ICDAR 2017 MLT | 7200 | 1800 | email to [email protected] | No | Yes (Quadrangle) |
Demo images of ICDAR 2013.
The ICDAR 2013 dataset comes from the ICDAR 2013 Robust Reading Competition and contains 229 natural images for training and 233 for testing. Text is annotated at the character level as well as with word-level rectangles, and it is mostly horizontal and well focused.
Demo images of ICDAR 2015.
The ICDAR 2015 dataset comes from the ICDAR 2015 Robust Reading Competition and contains 1000 natural images for training and 500 for testing. The images were captured with Google Glass, and the text appears in the scene incidentally, without the user deliberately framing it. All text is annotated with word-level quadrangles.
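A word-level quadrangle annotation can be read with a few lines of Python. The sketch below assumes each annotation line stores eight comma-separated integer coordinates (four corner points) followed by the transcription. This layout, along with the helper names `parse_quadrangle_line` and `bounding_rectangle`, is an assumption made for illustration and should be checked against the dataset's own documentation.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def parse_quadrangle_line(line: str) -> Tuple[List[Point], str]:
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' into (corner points, text).

    The field layout is an assumption, not something the dataset description states.
    """
    # Drop a possible UTF-8 byte-order mark and the trailing newline.
    cleaned = line.lstrip("\ufeff").rstrip("\r\n")
    # Split at most 8 times so commas inside the transcription are preserved.
    fields = cleaned.split(",", 8)
    if len(fields) < 9:
        raise ValueError(f"expected 9 comma-separated fields, got {len(fields)}")
    coords = [int(value) for value in fields[:8]]
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return points, fields[8]


def bounding_rectangle(points: List[Point]) -> Tuple[int, int, int, int]:
    """Return the axis-aligned rectangle (x_min, y_min, x_max, y_max) enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)
```

The second helper reduces a quadrangle to its enclosing rectangle, which can be convenient when mixing quadrangle-annotated data with rectangle-annotated data such as ICDAR 2013.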
Demo images of ICDAR 2017 COCO-Text.
COCO-Text is a large-scale dataset with 43,686 images for training, 10,000 for validation, and 10,000 for testing. The original images come from the Microsoft COCO dataset.
Demo images of ICDAR 2017 MLT.
The ICDAR 2017 MLT dataset, built for multi-lingual scene text detection and script identification, has 7200 images for training and 1800 images for validation. It is composed of complete scene images covering 9 languages (Chinese, Japanese, Korean, English, French, Arabic, Italian, German and Bangla) that represent 6 different scripts (Arabic, Latin, Chinese, Japanese, Korean and Bangla). It combines text detection with script identification and contains many more images than related datasets.
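The relationship between the 9 languages and the 6 scripts can be written down as a small lookup table. The pairing below is inferred from the two lists above rather than stated explicitly, so treat it as an assumption. The same applies to the key `"Bangla"` used for the ninth language and to the names `LANGUAGE_TO_SCRIPT` and `languages_by_script`, which are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed language -> script pairing, inferred from the dataset description.
LANGUAGE_TO_SCRIPT: Dict[str, str] = {
    "Chinese": "Chinese",
    "Japanese": "Japanese",
    "Korean": "Korean",
    "English": "Latin",
    "French": "Latin",
    "Arabic": "Arabic",
    "Italian": "Latin",
    "German": "Latin",
    "Bangla": "Bangla",
}


def languages_by_script(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Group languages under the script they are written in."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for language, script in mapping.items():
        grouped[script].append(language)
    return dict(grouped)
```

Grouping this way shows why the script count is lower than the language count: under the assumed pairing, four of the nine languages share the Latin script.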