diff --git a/configs/rec/abinet/README.md b/configs/rec/abinet/README.md index 19ae0a706..6beaa3e48 100644 --- a/configs/rec/abinet/README.md +++ b/configs/rec/abinet/README.md @@ -5,7 +5,7 @@ > [Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495) -## 1. Abstract +## Abstract Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition. [1] @@ -18,51 +18,19 @@ Linguistic knowledge is of great benefit to scene text recognition. However, how Figure 1. Architecture of ABINet [1]

-## 2. Results - - -### Accuracy - -According to our experiments, the evaluation results on public benchmark datasets ( IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow: +## Requirements -
- Performance tested on ascend 910 with graph mode +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -
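+
+A quick way to check that the installed MindSpore build matches the table above and can reach the Ascend backend (a minimal sketch; the driver, firmware and CANN versions still have to be verified with the system tools):
+
+```python
+# Sanity-check the MindSpore installation against the versions listed above.
+import mindspore as ms
+
+print("MindSpore version:", ms.__version__)  # expect 2.3.1 per the table
+ms.set_context(device_target="Ascend")       # select the Ascend backend
+ms.run_check()                               # runs a tiny op to verify the install
+```
+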
+## Quick Start +### Preparation - | **Model** | **Device** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** | - | :-----: |:----------:| :--------------: | :----------: | :--------: | :--------: |:----------: | - | ABINet | 8p | 91.35% | 14,867 s/epoch | 628.11 | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) -
- - Detailed accuracy results for each benchmark dataset -
- - | **Model** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | - | ABINet | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36%| 87.33% | 89.58% | 91.35% | -
-
- - -**Notes:** -- The input Shapes of MindIR of ABINet is (1, 3, 32, 128). - - -## 3. Quick Start -### 3.1 Preparation - -#### 3.1.1 Installation +#### Installation Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR. -#### 3.1.2 Dataset Download +#### Dataset Download Please download LMDB dataset for traininig and evaluation from - `training` contains two datasets: [MJSynth (MJ)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) and [SynthText (ST)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) - `evaluation` contains several benchmarking datasets, which are [IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html), [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset), [IC13](http://rrc.cvc.uab.es/?ch=2), [IC15](http://rrc.cvc.uab.es/?ch=4), [SVTP](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf), and [CUTE](http://cs-chan.com/downloads_CUTE80_dataset.html). @@ -99,7 +67,7 @@ data_lmdb_release/ │ └── lock.mdb ``` -#### 3.1.3 Dataset Usage +#### Dataset Usage Here we used the datasets under `train/` folders for **train**. After training, we used the datasets under `evaluation/` to evluation model accuracy. @@ -200,7 +168,7 @@ data_lmdb_release/ then you can evaluate on each dataset by modifying the config yaml as follows, and execute the script `tools/benchmarking/multi_dataset_eval.py`. -#### 3.1.4 Check YAML Config Files +#### Check YAML Config Files Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args: @@ -244,7 +212,7 @@ eval: - Dataset: The MJSynth and SynthText datasets come from [ABINet_repo](https://github.com/FangShancheng/ABINet). -### 3.2 Model Training +### Model Training * Distributed Training @@ -256,7 +224,7 @@ It is easy to reproduce the reported results with the pre-defined training recip mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml ``` The pre-trained model needs to be loaded during ABINet model training, and the weight of the pre-trained model is -from https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt. It is needed to add the path of the pretrained weight to the model pretrained in "configs/rec/abinet/abinet_resnet45_en.yaml". +from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt). It is needed to add the path of the pretrained weight to the model pretrained in "configs/rec/abinet/abinet_resnet45_en.yaml". * Standalone Training @@ -269,7 +237,7 @@ python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`. -### 3.3 Model Evaluation +### Model Evaluation To evaluate the accuracy of the trained model, you can use `eval.py`. 
Please set the checkpoint path to the arg `ckpt_load_path` in the `eval` section of yaml config file, set `distribute` to be False, and then run: @@ -300,6 +268,42 @@ if not is_train: ) drop_remainder = True ``` +## Results + + +### Accuracy + +According to our experiments, the evaluation results on public benchmark datasets (IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows: + +Performance tested on ascend 910* with graph mode + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:| +| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) | +
+ + Detailed accuracy results for each benchmark dataset +
+ +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| ABINet | Resnet45 | 1 | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36% | 87.33% | 89.58% | 91.35% | +
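+
+For this table, the **Average** column matches the plain (unweighted) mean of the ten per-benchmark accuracies rather than a sample-weighted average; a quick sketch reproducing the reported 91.35%:
+
+```python
+# Reproduce the Average column from the per-dataset accuracies above.
+accs = [96.22, 95.83, 96.48, 94.90, 84.38, 80.56, 95.83, 92.36, 87.33, 89.58]
+print(f"{sum(accs) / len(accs):.2f}%")  # -> 91.35%
+```
+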
+ + +**Notes:** +- The input Shapes of MindIR of ABINet is (1, 3, 32, 128). + + ## References diff --git a/configs/rec/abinet/README_CN.md b/configs/rec/abinet/README_CN.md index 527ca0bba..360c963ec 100644 --- a/configs/rec/abinet/README_CN.md +++ b/configs/rec/abinet/README_CN.md @@ -5,7 +5,7 @@ > [Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495) -## 1. 模型描述 +## 模型描述 语义知识对场景文本识别有很大的帮助。然而,如何在端到端深度网络中有效地建模语义规则仍然是一个研究挑战。在本文中,我们认为语言模型的能力有限来自于:1)隐式语言建模;2)单向特征表示;3)带噪声输入的语言模型。相应地,我们提出了一种自主、双向、迭代的场景文本识别ABINet。首先,自主阻塞视觉和语言模型之间的梯度流,以强制显式语言建模。其次,提出了一种基于双向特征表示的新型双向完形填空式网络作为语言模型。第三,提出了一种语言模型迭代修正的执行方式,可以有效缓解噪声输入的影响。此外,我们提出了一种基于迭代预测集合的自训练方法,可以有效地从未标记的图像中学习。大量的实验表明,ABINet在低质量图像上具有优势,并在几个主流基准上取得了最先进的结果。此外,集成自训练训练的ABINet在实现人类水平的识别方面也有很大的进步 [1] @@ -18,48 +18,21 @@ Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495) 图1. ABINet结构图 [1]

-## 2. 评估结果 - +## 配套版本 -### 精确度 -根据我们的实验,在公共基准数据集(IC13、IC15、IIIT、SVT、SVTP、CUTE)上的评估结果如下: +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -
-| **Model** | **Context** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: | -| ABINet | D910x8-MS2.1-G | 91.35% | 14,867 s/epoch | 628.11 | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) -
+## 快速开始 +### 环境及数据准备 -
-
- 每个基准数据集的详细精度结果 - - | **Model** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | - | ABINet | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36%| 87.33% | 89.58% | 91.35% | -
-
- - - - -## 3. 快速开始 -### 3.1 环境及数据准备 - -#### 3.1.1 安装 +#### 安装 环境安装教程请参考MindOCR的 [installation instruction](https://github.com/mindspore-lab/mindocr#installation). -#### 3.1.2 Dataset Download +#### Dataset Download 请下载LMDB数据集用于训练和评估 - `training` 包含两个数据集: [MJSynth (MJ)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) 和 [SynthText (ST)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) - `evaluation` 包含几个基准数据集,它们是[IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html), [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset), [IC13](http://rrc.cvc.uab.es/?ch=2), [IC15](http://rrc.cvc.uab.es/?ch=4), [SVTP](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf), 和 [CUTE](http://cs-chan.com/downloads_CUTE80_dataset.html). @@ -96,7 +69,7 @@ data_lmdb_release/ │ └── lock.mdb ``` -#### 3.1.3 数据集使用 +#### 数据集使用 在这里,我们使用 `train/` 文件夹下的数据集进行训练,我们使用 `evaluation/` 下的数据集来评估模型的准确性。 @@ -213,7 +186,7 @@ eval: # label_file: # 验证或评估数据集的标签文件路径,将与`dataset_root`拼接形成完整的验证或评估数据的标签文件路径。当数据集为LMDB格式时无需配置 ... ``` -#### 3.1.4 检查配置文件 +#### 检查配置文件 除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下: @@ -257,7 +230,7 @@ eval: - 数据集:MJSynth和SynthText数据集来自作者公布的代码仓[ABINet_repo](https://github.com/FangShancheng/ABINet). -### 3.2 模型训练 +### 模型训练 * 分布式训练 @@ -268,7 +241,7 @@ eval: # 在多个 GPU/Ascend 设备上进行分布式训练 mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml ``` -ABINet模型训练时需要加载预训练模型,预训练模型的权重来自https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt,需要在“configs/rec/abinet/abinet_resnet45_en.yaml”中model的pretrained添加预训练权重的路径。 +ABINet模型训练时需要加载预训练模型,预训练模型的权重来自[abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt),需要在“configs/rec/abinet/abinet_resnet45_en.yaml”中model的pretrained添加预训练权重的路径。 * 单卡训练 @@ -283,7 +256,7 @@ python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml 训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。 -### 3.3 模型评估 +### 模型评估 若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行: @@ -314,6 +287,42 @@ if not is_train: ) drop_remainder = True ``` + +## 评估结果 + + +### 精确度 +根据我们的实验,在公共基准数据集(IC13、IC15、IIIT、SVT、SVTP、CUTE)上的评估结果如下: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:| +| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) | + +
+ + +
+
+ 每个基准数据集的详细精度结果 + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| ABINet | Resnet45 | 1 | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36% | 87.33% | 89.58% | 91.35% | + +
+
+ + ## 参考文献 diff --git a/configs/rec/crnn/README.md b/configs/rec/crnn/README.md index c98301c46..5e19ba3c9 100644 --- a/configs/rec/crnn/README.md +++ b/configs/rec/crnn/README.md @@ -318,31 +318,15 @@ We use a public Chinese text benchmark dataset [Benchmarking-Chinese-Text-Recogn For detailed instruction of data preparation and yaml configuration, please refer to [ch_dataeset](../../../docs/en/datasets/chinese_text_recognition.md). -### Training - -To train with the prepared datsets and config file, please run: - -```shell -mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/crnn_resnet34_ch.yaml -``` - -### Training with Custom Datasets -You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md). - - ## Performance ### General Purpose Chinese Models Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode. -*coming soon* - -Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode. - -| **model name** | **backbone** | **cards** | **batch size** | **language** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** | -| :------------: | :----------: | :-------: | :------------: | :----------: | :-----------: | :---------------: | :---------: | :-------: | :-------: | :-----: | :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| CRNN | ResNet34_vd | 4 | 256 | Chinese | O2 | 203.48 s | 38.01 | 1180 | 60.45% | 65.95% | 97.68% | [https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c-105bccb2.mindir) | +| **model name** | **backbone** | **cards** | **batch size** | **language** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** | +| :------------: | :----------: | :-------: | :------------: | :----------: | :-----------: | :---------------: | :---------: | :-------: |:---------:|:-------:|:------------:|:-------------------------------------------------------------------------------------------------:| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| CRNN | ResNet34_vd | 4 | 256 | Chinese | O2 | 203.48 s | 38.01 | 1180 | 60.71% | 65.94% | 97.67% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| 
[mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c-105bccb2.mindir) | > The input shape for exported MindIR file in the download link is (1, 3, 32, 320). @@ -353,26 +337,18 @@ Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode. Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode. -| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | -| :------------: | :----------: | :---------------: | :-----------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :----------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------: | -| CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 94.36 s | 14.76 | 8672.09 | 81.31% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/crnn/crnn_vgg7-6faf1b2d-910v2.ckpt) | - - -Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode. - - -| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | -| :------------: | :----------: | :---------------: | :-----------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 67.18 s | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) | -| CRNN | ResNet34_vd | MJ+ST | 24.48 | 8 | 64 | O2 | 201.54 s | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) | +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:| :----------: | :---------------: | :-----------: | :-------: | :------------: | :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:---------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 59 s | 15.47 | 8274.08 | 81.31% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | 
[ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/crnn/crnn_vgg7-6faf1b2d-910v2.ckpt) | +| CRNN | ResNet34_vd | MJ+ST | 24.48 | 8 | 64 | O2 | 120.41 s | 60.86 | 8412.75 | 84.73% |[yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) | Detailed accuracy results for each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE): | **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | -| :------------: | :----------: | :-------: | :----------: | :----------: | :----------: | :-----------: | :-----------: | :-----------: | :-------------: | :-----: | :------: | :--------: | :---------: | -| CRNN | VGG7 | 1 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% | -| CRNN | ResNet34_vd | 1 | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% | +| :------------: | :----------: | :-------: |:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| CRNN | VGG7 | 1 | 93.72% | 93.43% | 91.83% | 90.84% | 70.84% | 64.95% | 84.40% | 82.84% | 72.87% | 67.36% | 81.31% | +| CRNN | ResNet34_vd | 1 | 95.35% | 95.27% | 93.70% | 92.71% | 75.65% | 69.72% | 87.30% | 86.09% | 78.60% | 72.92% | 84.73% | #### Inference Performance @@ -388,8 +364,8 @@ Experiments are tested on ascend 310P with mindspore lite 2.3.1 graph mode. ### Notes - To reproduce the result on other contexts, please ensure the global batch size is the same. -- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary). -- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. +- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [Character Dictionary](#character-dictionary). +- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-download) section. - The input Shapes of MindIR of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100). 
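+
+For the exported MindIR files above, input images have to be packed into the fixed (1, 3, 32, 100) NCHW shape mentioned in the last note. A minimal preprocessing sketch, assuming a plain resize and 0.5/0.5 normalization; the authoritative steps are the `eval` transform pipeline in the corresponding yaml config:
+
+```python
+# Pack a cropped word image into a (1, 3, 32, 100) float tensor for inference.
+# The mean/std of 0.5 below is an assumption; mirror the transform pipeline in
+# the yaml config for exact parity with training.
+import cv2
+import numpy as np
+
+img = cv2.imread("word.png")              # BGR, H x W x 3, uint8
+img = cv2.resize(img, (100, 32))          # width 100, height 32
+img = img.astype(np.float32) / 255.0
+img = (img - 0.5) / 0.5                   # assumed normalization
+img = img.transpose(2, 0, 1)[None, ...]   # NCHW -> (1, 3, 32, 100)
+print(img.shape)
+```
+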
diff --git a/configs/rec/crnn/README_CN.md b/configs/rec/crnn/README_CN.md index f565f3eab..d31194187 100644 --- a/configs/rec/crnn/README_CN.md +++ b/configs/rec/crnn/README_CN.md @@ -319,30 +319,16 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置, 详细的数据准备和config文件配置方式, 请参考 [中文识别数据集准备](../../../docs/zh/datasets/chinese_text_recognition.md) -### 模型训练验证 - -准备好数据集和配置文件后,执行以下命令开启多卡训练 - -```shell -mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/crnn_resnet34_ch.yaml -``` - -### 使用自定义数据集进行训练 -您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/zh/tutorials/training_recognition_custom_dataset_CN.md)。 - ## 性能表现 ### 通用泛化中文模型 在采用图模式的ascend 910*上实验结果,mindspore版本为2.3.1 -*即将到来* - -在采用图模式的ascend 910上实验结果,mindspore版本为2.3.1 +| **model name** | **backbone** | **cards** | **batch size** | **language** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** | +| :------------: | :----------: | :-------: | :------------: | :----------: | :-----------: | :---------------: | :---------: | :-------: |:---------:|:-------:|:------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------:| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| CRNN | ResNet34_vd | 4 | 256 | Chinese | O2 | 203.48 s | 38.01 | 1180 | 60.71% | 65.94% | 97.67% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c-105bccb2.mindir) | -| **model name** | **backbone** | **cards** | **batch size** | **language** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** | -|:--------------:|:------------:|:--------------:|:-----------------:|:------------:|:---------:|:-----------------:|:---------:|:-------:|:------------:|:-----------:|:---------:|:------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| CRNN | ResNet34_vd | 4| 256| Chinese | O2 | 203.48 s | 38.01 | 1180 | 60.45% | 65.95% | 97.68% | [https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c-105bccb2.mindir) | > 链接中模型的MindIR导出时的输入Shape为`(1, 3, 32, 320)`. 
@@ -352,27 +338,17 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/ 在采用图模式的ascend 910*上实验结果,mindspore版本为2.3.1 - -| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | -| :------------: | :----------: | :---------------: | :-----------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :----------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------: | -| CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 94.36 s | 14.76 | 8672.09 | 81.31% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/crnn/crnn_vgg7-6faf1b2d-910v2.ckpt) | - - -在采用图模式的ascend 910上实验结果,mindspore版本为2.3.1 - - -| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | -| :------------: | :----------: | :---------------: | :-----------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 67.18 s | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) | -| CRNN | ResNet34_vd | MJ+ST | 24.48 | 8 | 64 | O2 | 201.54 s | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) | - +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:| :----------: | :---------------: | :-----------: | :-------: | :------------: | :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:---------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 59 s | 15.47 | 8274.08 | 81.31% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/crnn/crnn_vgg7-6faf1b2d-910v2.ckpt) | +| CRNN | ResNet34_vd | MJ+ST | 24.48 | 8 | 64 | O2 | 120.41 s | 60.86 | 8412.75 | 84.73% 
|[yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) | 在各个基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的准确率: | **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | -| :------------: | :----------: | :-------: | :----------: | :----------: | :----------: | :-----------: | :-----------: | :-----------: | :-------------: | :-----: | :------: | :--------: | :---------: | -| CRNN | VGG7 | 1 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% | -| CRNN | ResNet34_vd | 1 | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% | +| :------------: | :----------: | :-------: |:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| CRNN | VGG7 | 1 | 93.72% | 93.43% | 91.83% | 90.84% | 70.84% | 64.95% | 84.40% | 82.84% | 72.87% | 67.36% | 81.31% | +| CRNN | ResNet34_vd | 1 | 95.35% | 95.27% | 93.70% | 92.71% | 75.65% | 69.72% | 87.30% | 86.09% | 78.60% | 72.92% | 84.73% | @@ -391,8 +367,8 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/ ### 注意 - 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 -- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[4. 字符词典](#4-字符词典) -- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。 +- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[字符词典](#字符词典) +- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集下载)章节。 - CRNN_VGG7和CRNN_ResNet34_vd的MindIR导出时的输入Shape均为(1, 3, 32, 100)。 diff --git a/configs/rec/master/README.md b/configs/rec/master/README.md index 1e367cf3f..ebbec8356 100644 --- a/configs/rec/master/README.md +++ b/configs/rec/master/README.md @@ -5,7 +5,7 @@ English | [中文](https://github.com/mindspore-lab/mindocr/blob/main/configs/re > [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/abs/1910.02562) -## 1. Introduction +## Introduction Attention-based scene text recognizers have gained huge success, which leverages a more compact intermediate representation to learn 1d- or 2d- attention by a RNN-based encoder-decoder architecture. However, such methods suffer from attention-drift problem because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, this paper proposes the MASTER, a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention which encodes feature-feature and target-target relationships inside the encoder and decoder and (2) learns a more powerful and robust intermediate representation to spatial distortion, and (3) owns a great training efficiency because of high training parallelization and a high-speed inference because of an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of MASTER on both regular and irregular scene text. 
[1] @@ -18,54 +18,22 @@ Attention-based scene text recognizers have gained huge success, which leverages Figure 1. Architecture of MASTER [1]

-## 2. Results - +## Requirements -### Accuracy +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow: -
+## Quick Start +### Preparation -| **Model** | **Context** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: | -| Master-Resnet31 | D910x4-MS1.10-G | 90.37% | 6356 s/epoch | 2741 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) | -
- -
-
- Detailed accuracy results for each benchmark dataset - - | **Model** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | - | Master-ResNet31| 95.58% | 95.15% | 96.85% | 95.17% | 81.94% | 78.48% | 95.56% | 90.88% | 84.19% | 89.93% | 90.37% | -
-
- -**Notes:** -- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G-graph mode or F-pynative mode with ms function. For example, D910x4-MS1.10-G is for training on 4 pieces of Ascend 910 NPU using graph mode based on Minspore version 1.10. -- To reproduce the result on other contexts, please ensure the global batch size is the same. -- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. -- The input Shapes of MindIR of MASTER is (1, 3, 48, 160). - - -## 3. Quick Start -### 3.1 Preparation - -#### 3.1.1 Installation +#### Installation Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR. -#### 3.1.2 Dataset Preparation +#### Dataset Preparation -##### 3.1.2.1 MJSynth, validation and evaluation dataset +##### MJSynth, validation and evaluation dataset Part of the lmdb dataset for training and evaluation can be downloaded from [here](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (ref: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)). There're several zip files: - `data_lmdb_release.zip` contains the datasets including training data, validation data and evaluation data. - `training/` contains two datasets: [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) and [SynthText (ST)](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c). *Here we use **MJSynth only**.* @@ -74,7 +42,7 @@ Part of the lmdb dataset for training and evaluation can be downloaded from [her - `validation.zip`: same as the validation/ within data_lmdb_release.zip - `evaluation.zip`: same as the evaluation/ within data_lmdb_release.zip -##### 3.1.2.2 SynthText dataset +##### SynthText dataset For `SynthText`, we do not use the given LMDB dataset in `data_lmdb_release.zip`, since it only contains part of the cropped images. Please download the raw dataset from [here](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c) and prepare the LMDB dataset using the following command @@ -88,7 +56,7 @@ python tools/dataset_converters/convert.py \ ``` the `ST_full` contained the full cropped images of SynthText in LMDB data format. Please replace the `ST` folder with the `ST_full` folder. -##### 3.1.2.3 SynthAdd dataset +##### SynthAdd dataset Please download the **SynthAdd** Dataset from [here](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code: 627x). This dataset is proposed in . Please prepare the corresponding LMDB dataset using the following command @@ -102,7 +70,7 @@ python tools/dataset_converters/convert.py \ Please put the `SynthAdd` folder in `/training` directory. -#### 3.1.3 Dataset Usage +#### Dataset Usage Finally, the data structure should like this. @@ -219,7 +187,7 @@ eval: ... ``` -By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80. +By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80. 2. Evaluate on multiple datasets under the same folder @@ -259,7 +227,7 @@ eval: ... 
``` -#### 3.1.4 Check YAML Config Files +#### Check YAML Config Files Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args: @@ -270,7 +238,7 @@ system: amp_level_infer: "O2" seed: 42 val_while_train: True # Validate while training - drop_overflow_update: False + drop_overflow_update: True common: ... batch_size: &batch_size 512 # Batch size for training @@ -301,7 +269,7 @@ eval: - As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size. -### 3.2 Model Training +### Model Training * Distributed Training @@ -325,7 +293,7 @@ python tools/train.py --config configs/rec/master/master_resnet31.yaml The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`. -### 3.3 Model Evaluation +### Model Evaluation To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path to the arg `ckpt_load_path` in the `eval` section of yaml config file, set `distribute` to be False, and then run: @@ -333,7 +301,47 @@ To evaluate the accuracy of the trained model, you can use `eval.py`. Please set python tools/eval.py --config configs/rec/master/master_resnet31.yaml ``` -## 4. Character Dictionary +## Results + + +### Accuracy + +According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:-----------------:|:-------------:|:---------:|:--------------:|:-------------:|:-----------------:|:-----------:|:---------:|:------------:|:----------:|:----------:| +| Master-ResNet31 | ResNet31 | MJ+ST+SynthAdd | 68.23 | 4 | 16 | O2 | 194.99 s | 642.164 | 3189.22 | 90.34% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) | +
+ +
+
+ Detailed accuracy results for each benchmark dataset + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:| :----------: | :-------: |:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +|Master-ResNet31 | ResNet31 | 1 | 93.72% | 95.16% | 96.85% | 95.17% | 81.94% | 78.48% | 95.57% | 90.88% | 84.19% | 89.58% | 90.34% | + +
+
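+
+Before launching a long training run it is worth sanity-checking the LMDB folders produced by `tools/dataset_converters/convert.py` in the Dataset Preparation section above. A small sketch, assuming the deep-text-recognition-benchmark key layout (`num-samples`, `label-%09d`) and an example path:
+
+```python
+# Count samples and peek at the first label of a converted LMDB folder.
+# The folder path is an example; point it at ST_full or SynthAdd as needed.
+import lmdb
+
+env = lmdb.open("data_lmdb_release/training/SynthAdd", readonly=True, lock=False)
+with env.begin(write=False) as txn:
+    num = int(txn.get(b"num-samples"))
+    label = txn.get(b"label-000000001").decode()
+print(num, "samples; first label:", label)
+```
+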
+ +**Notes:** +- To reproduce the result on other contexts, please ensure the global batch size is the same. +- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-usage) section. +- The input Shapes of MindIR of MASTER is (1, 3, 48, 160). + + +## Character Dictionary ### Default Setting @@ -360,11 +368,11 @@ To use a specific dictionary, set the parameter `character_dict_path` to the pat - Remember to check the value of `dataset->transform_pipeline->RecMasterLabelEncode->lower` in the configuration yaml. Set it to False if you prefer case-sensitive encoding. -## 5. MindSpore Lite Inference +## MindSpore Lite Inference To inference with MindSpot Lite on Ascend 310, please refer to the tutorial [MindOCR Inference](../../../docs/en/inference/inference_tutorial.md). In short, the whole process consists of the following steps: -**1. Model Export** +**Model Export** Please [download](#2-results) the exported MindIR file first, or refer to the [Model Export](../../README.md) tutorial and use the following command to export the trained ckpt model to MindIR file: @@ -377,16 +385,16 @@ python tools/export.py --model_name_or_config configs/rec/master/master_resnet31 The `data_shape` is the model input shape of height and width for MindIR file. The shape value of MindIR in the download link can be found in [Notes](#2-results) under results table. -**2. Environment Installation** +**Environment Installation** Please refer to [Environment Installation](../../../docs/en/inference/environment.md#2-mindspore-lite-inference) tutorial to configure the MindSpore Lite inference environment. -**3. Model Conversion** +**Model Conversion** Please refer to [Model Conversion](../../../docs/en/inference/convert_tutorial.md#1-mindocr-models), and use the `converter_lite` tool for offline conversion of the MindIR file. -**4. Inference** +**Inference** Assuming that you obtain output.mindir after model conversion, go to the `deploy/py_infer` directory, and use the following command for inference: diff --git a/configs/rec/master/README_CN.md b/configs/rec/master/README_CN.md index 6bc54222d..ccaadd4ed 100644 --- a/configs/rec/master/README_CN.md +++ b/configs/rec/master/README_CN.md @@ -5,7 +5,7 @@ > [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/abs/1910.02562) -## 1. 模型描述 +## 模型描述 基于注意力机制的场景文本识别器已经取得了巨大的成功,它利用仅占用更小中间表示的RNN编码器-解码器架构,来学习1维或2维的注意力。然而,这样的方法由于编码特征之间的相似度高,导致在基于RNN的局部注意力机制下出现了注意力失调问题。此外,基于RNN的方法由于并行化效率低而效率差。为了克服这些问题,本文提出了MASTER,一种基于自注意力机制的场景文本识别器,它(1)不仅编码了输入输出的注意力,还学习了Encoder和Decoder中的特征-特征和目标-目标关系,(2)学习了更强大和鲁棒的中间表示,以应对空间失真,(3)由于高度并行训练和高效的内存缓存机制,具有较高的训练效率和较快的推理速度。在各种基准测试中的广泛实验证明,MASTER在正常和不规则场景文本上表现出优异的性能。[1] @@ -19,53 +19,22 @@ 图1. MASTER结构 [1]

-## 2. 评估结果 - - -### 精度结果 - -根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下: - -
+## 配套版本 -| **模型** | **环境配置** | **平均准确率** | **训练时间** | **FPS** | **配置文件** | **模型权重下载** | -| :-----: | :-----: | :-----: | :-----: | :-----: |:--------: | :-----: | -| Master-Resnet31 | D910x4-MS1.10-G | 90.37% | 6356 s/epoch | 2741 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) | -
+| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -
-
- 在各个基准数据集上的准确率 - | **模型** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **平均准确率** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | - | Master-ResNet31| 95.58% | 95.15% | 96.85% | 95.17% | 81.94% | 78.48% | 95.56% | 90.88% | 84.19% | 89.93% | 90.37% | -
-
+## 快速开始 +### 环境及数据准备 -**注意:** -- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x8-MS1.10-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore1.10版本进行训练。 -- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 -- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。 -- Master的MindIR导出时的输入Shape均为(1, 3, 48, 160)。 - -## 3. 快速开始 -### 3.1 环境及数据准备 - -#### 3.1.1 安装 +#### 安装 环境安装教程请参考MindOCR的 [installation instruction](https://github.com/mindspore-lab/mindocr#installation). -#### 3.1.2 数据集准备 +#### 数据集准备 -##### 3.1.2.1 MJSynth, 验证集和测试集 +##### MJSynth, 验证集和测试集 部分LMDB格式的训练及验证数据集可以从[这里](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (出处: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here))下载。连接中的文件包含多个压缩文件,其中: - `data_lmdb_release.zip` 包含了了部分数据集,有训练集(training/),验证集(validation/)以及测试集(evaluation)。 - `training.zip` 包括两个数据集,分别是 [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) 和 [SynthText (ST)](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c)。 这里我们只使用**MJSynth**。 @@ -74,7 +43,7 @@ Table Format: - `validation.zip`: 与 data_lmdb_release.zip 中的validation/ 一样。 - `evaluation.zip`: 与 data_lmdb_release.zip 中的evaluation/ 一样。 -##### 3.1.2.2 SynthText dataset +##### SynthText dataset 我们不使用`data_lmdb_release.zip`提供的`SynthText`数据, 因为它只包含部分切割下来的图片。请从[此处](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c)下载原始数据, 并使用以下命令转换成LMDB格式 @@ -88,7 +57,7 @@ python tools/dataset_converters/convert.py \ ``` `ST_full` 包含了所有已切割的图片,以LMDB格式储存。 请将 `ST` 文件夹换成 `ST_full` 文件夹。 -##### 3.1.2.3 SynthAdd dataset +##### SynthAdd dataset 另外请从[此处](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg)(密码:627x)下载**SynthAdd**训练集. 这个训练集是由提出。请使用以下命令转换成LMDB格式 @@ -102,7 +71,7 @@ python tools/dataset_converters/convert.py \ 并将转换完成的`SynthAdd`文件夹摆在`/training`里面. -#### 3.1.3 数据集使用 +#### 数据集使用 最终数据文件夹结构如下: @@ -218,7 +187,7 @@ eval: ... ``` -通过使用上述配置 yaml 运行 [模型评估](#33-model-evaluation) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 +通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 2. 对同一文件夹下的多个数据集进行评估 @@ -258,7 +227,7 @@ eval: ... ``` -#### 3.1.4 检查配置文件 +#### 检查配置文件 除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下: @@ -269,7 +238,7 @@ system: amp_level_infer: "O2" seed: 42 val_while_train: True # 边训练边验证 - drop_overflow_update: False + drop_overflow_update: True common: ... batch_size: &batch_size 512 # 训练批大小 @@ -300,7 +269,7 @@ eval: - 由于全局批大小 (batch_size x num_devices) 是对结果复现很重要,因此当GPU/NPU卡数发生变化时,调整`batch_size`以保持全局批大小不变,或根据新的全局批大小线性调整学习率。 -### 3.2 模型训练 +### 模型训练 * 分布式训练 @@ -324,7 +293,7 @@ python tools/train.py --config configs/rec/master/master_resnet31.yaml 训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。 -### 3.3 模型评估 +### 模型评估 若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行: @@ -332,7 +301,48 @@ python tools/train.py --config configs/rec/master/master_resnet31.yaml python tools/eval.py --config configs/rec/master/master_resnet31.yaml ``` -## 4. 
字符词典 + +## 评估结果 + + +### 精度结果 + +根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:-----------------:|:-------------:|:---------:|:--------------:|:-------------:|:-----------------:|:-----------:|:---------:|:------------:|:----------:|:----------:| +| Master-ResNet31 | ResNet31 | MJ+ST+SynthAdd | 68.23 | 4 | 16 | O2 | 194.99 s | 642.164 | 3189.22 | 90.34% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) | +
+ +
+
+ 在各个基准数据集上的准确率 + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:| :----------: | :-------: |:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +|Master-ResNet31 | ResNet31 | 1 | 93.72% | 95.16% | 96.85% | 95.17% | 81.94% | 78.48% | 95.57% | 90.88% | 84.19% | 89.58% | 90.34% | + +
+
+ +**注意:** +- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 +- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#环境及数据准备)章节。 +- Master的MindIR导出时的输入Shape均为(1, 3, 48, 160)。 + + +## 字符词典 ### 默认设置 @@ -360,11 +370,11 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置, - 请记住检查配置文件中的 `dataset->transform_pipeline->RecMasterLabelEncode->lower` 参数的值。如果词典中有大小写字母而且想区分大小写的话,请将其设置为 False。 -## 5. MindSpore Lite 推理 +## MindSpore Lite 推理 请参考[MindOCR 推理](../../../docs/cn/inference/inference_tutorial.md)教程,基于MindSpore Lite在Ascend 310上进行模型的推理,包括以下步骤: -**1. 模型导出** +**模型导出** 请先[下载](#2-评估结果)已导出的MindIR文件,或者参考[模型导出](../../README.md)教程,使用以下命令将训练完成的ckpt导出为MindIR文件: @@ -376,15 +386,15 @@ python tools/export.py --model_name_or_config configs/rec/master/master_resnet31 其中,`data_shape`是导出MindIR时的模型输入Shape的height和width,下载链接中MindIR对应的shape值见[注释](#2-评估结果)。 -**2. 环境搭建** +**环境搭建** 请参考[环境安装](../../../docs/cn/inference/environment.md#2-mindspore-lite推理)教程,配置MindSpore Lite推理运行环境。 -**3. 模型转换** +**模型转换** 请参考[模型转换](../../../docs/cn/inference/convert_tutorial.md#1-mindocr模型)教程,使用`converter_lite`工具对MindIR模型进行离线转换。 -**4. 执行推理** +**执行推理** 假设在模型转换后得到output.mindir文件,在`deploy/py_infer`目录下使用以下命令进行推理: diff --git a/configs/rec/master/master_resnet31.yaml b/configs/rec/master/master_resnet31.yaml index df0c9bb2c..2618e2081 100644 --- a/configs/rec/master/master_resnet31.yaml +++ b/configs/rec/master/master_resnet31.yaml @@ -6,7 +6,7 @@ system: seed: 42 log_interval: 100 val_while_train: True - drop_overflow_update: False + drop_overflow_update: True ckpt_max_keep: 3 common: diff --git a/configs/rec/rare/README.md b/configs/rec/rare/README.md index 5ca8f206f..c5148aaff 100644 --- a/configs/rec/rare/README.md +++ b/configs/rec/rare/README.md @@ -5,7 +5,7 @@ English | [中文](https://github.com/mindspore-lab/mindocr/blob/main/configs/re > [Robust Scene Text Recognition with Automatic Rectification](https://arxiv.org/abs/1603.03915) -## 1. Introduction +## Introduction Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. The paper proposes RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. It shows that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model. [1] @@ -18,52 +18,20 @@ Recognizing text in natural images is a challenging task with many unsolved prob Figure 1. Architecture of SRN in RARE [1]

-## 2. Results - - -### Accuracy - -According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow: - -
+## Requirements -| **Model** | **Context** | **Backbone** | **Transform Module** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :--------: |:-----: | -| RARE | D910x4-MS1.10-G | ResNet34_vd | None | 85.19% | 3166 s/epoch | 4561 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir) | -
+| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -
-
- Detailed accuracy results for each benchmark dataset - | **Model** | **Backbone** | **Transform Module** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | - | RARE | ResNet34_vd | None | 95.12% | 94.58% | 94.28% | 92.71% | 75.31% | 69.52% | 88.17% | 87.33% | 78.91% | 76.04% | 85.19% | -
-
+## Quick Start +### Preparation -**Notes:** -- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G-graph mode or F-pynative mode with ms function. For example, D910x4-MS1.10-G is for training on 4 pieces of Ascend 910 NPU using graph mode based on Minspore version 1.10. -- To reproduce the result on other contexts, please ensure the global batch size is the same. -- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary). -- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. -- The input Shapes of MindIR of RARE is (1, 3, 32, 100) and it is for Ascend only. - -## 3. Quick Start -### 3.1 Preparation - -#### 3.1.1 Installation +#### Installation Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR. -#### 3.1.2 Dataset Download +#### Dataset Download Please download lmdb dataset for traininig and evaluation from [here](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (ref: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)). There're several zip files: - `data_lmdb_release.zip` contains the **entire** datasets including training data, validation data and evaluation data. - `training/` contains two datasets: [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) and [SynthText (ST)](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c) @@ -110,7 +78,7 @@ data_lmdb_release/ └── lock.mdb ``` -#### 3.1.3 Dataset Usage +#### Dataset Usage Here we used the datasets under `training/` folders for training, and the union dataset `validation/` for validation. After training, we used the datasets under `evaluation/` to evaluate model accuracy. @@ -225,7 +193,7 @@ eval: ... ``` -#### 3.1.4 Check YAML Config Files +#### Check YAML Config Files Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args: @@ -235,7 +203,7 @@ system: amp_level: 'O2' seed: 42 val_while_train: True # Validate while training - drop_overflow_update: False + drop_overflow_update: True common: ... batch_size: &batch_size 512 # Batch size for training @@ -266,7 +234,7 @@ eval: - As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size. -### 3.2 Model Training +### Model Training * Distributed Training @@ -290,15 +258,54 @@ python tools/train.py --config configs/rec/rare/rare_resnet34.yaml The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`. 
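To make the global-batch-size note above concrete, here is a hedged sketch of reproducing the 4-card recipe on 8 NPUs (the 512 x 4 figure comes from the recipe itself; editing the yaml is an assumption about how you choose to keep the global batch size fixed):

```shell
# Recipe global batch size: 512 (per card) x 4 (cards) = 2048.
# On 8 cards, keep 2048 by setting common.batch_size to 2048 / 8 = 256 in
# configs/rec/rare/rare_resnet34.yaml (or keep 512 and scale the learning rate linearly),
# then launch as in the distributed example above:
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
```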
-### 3.3 Model Evaluation +### Model Evaluation To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path to the arg `ckpt_load_path` in the `eval` section of yaml config file, set `distribute` to be False, and then run: ```shell python tools/eval.py --config configs/rec/rare/rare_resnet34.yaml ``` +## Results +

-## 4. Character Dictionary +### Accuracy + +According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:| :---------------: |:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RARE | ResNet34_vd | MJ+ST | 25.31 | 4 | 512 | O2 | 252.62 s | 180.26 | 11361.43 | 85.24% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir)| + +
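As a rough consistency check on the new numbers, the throughput column follows from the other columns: img/s ≈ cards × batch size ÷ step time = 4 × 512 ÷ 0.18026 s ≈ 11361, in line with the reported 11361.43.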
+ +
+
+ Detailed accuracy results for each benchmark dataset + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:--------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| RARE | ResNet34_vd | 1 | 95.12% | 94.57% | 94.40% | 92.81% | 75.43% | 69.62% | 88.17% | 87.33% | 78.91% | 76.04% | 85.24% | + +
+
+ +**Notes:** +- To reproduce the result on other contexts, please ensure the global batch size is the same. +- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [Character Dictionary](#character-dictionary). +- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-usage) section. +- The input Shapes of MindIR of RARE is (1, 3, 32, 100) and it is for Ascend only. + +## Character Dictionary ### Default Setting @@ -325,7 +332,7 @@ To use a specific dictionary, set the parameter `character_dict_path` to the pat - Remember to check the value of `dataset->transform_pipeline->RecAttnLabelEncode->lower` in the configuration yaml. Set it to False if you prefer case-sensitive encoding. -## 5. Chinese Text Recognition Model Training +## Chinese Text Recognition Model Training Currently, this model supports multilingual recognition and provides pre-trained models for different languages. Details are as follows: @@ -335,35 +342,24 @@ We use a public Chinese text benchmark dataset [Benchmarking-Chinese-Text-Recogn For detailed instruction of data preparation and yaml configuration, please refer to [ch_dataset](../../../docs/en/datasets/chinese_text_recognition.md). -### Training - -To train with the prepared datsets and config file, please run: - -```shell -mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/rare_resnet34_ch.yaml -``` - ### Results and Pretrained Weights After training, evaluation results on the benchmark test set are as follows, where we also provide the model config and pretrained weights.
-| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----: | :--------: | :------------: | :--------: | :--------: | :--------: | :--------: | :--------: |:---------: | :-----------: | -| RARE | Chinese | ResNet34_vd | None | 62.15% | 67.05% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) | +| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** | +|:---------:|:------------:|:------------:|:--------------------:|:---------:|:-------:|:------------:|:------------:|:-------:|:------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RARE | Chinese | ResNet34_vd | None | 62.39% | 67.02% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) |
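For readers who would rather start from the released Chinese checkpoint than train from scratch, a hedged sketch (the `model.pretrained` field is assumed from the pretrained-weight handling used by other recipes in this repo; paths are placeholders):

```shell
# Hedged sketch: fine-tune from the released Chinese RARE checkpoint listed above.
# 1) Download rare_resnet34_ch-5f3023e2.ckpt from the table.
# 2) Point model.pretrained in configs/rec/rare/rare_resnet34_ch.yaml to the downloaded file
#    (field name is an assumption; adjust to the actual config schema if it differs).
# 3) Launch training as usual:
python tools/train.py --config configs/rec/rare/rare_resnet34_ch.yaml
```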
- The input Shapes of MindIR of RARE is (1, 3, 32, 320) and it is for Ascend only. -### Training with Custom Datasets -You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md). - -## 6. MindSpore Lite Inference +## MindSpore Lite Inference To inference with MindSpot Lite on Ascend 310, please refer to the tutorial [MindOCR Inference](../../../docs/en/inference/inference_tutorial.md). In short, the whole process consists of the following steps: -**1. Model Export** +**Model Export** Please [download](#2-results) the exported MindIR file first, or refer to the [Model Export](../../README.md) tutorial and use the following command to export the trained ckpt model to MindIR file: @@ -376,16 +372,16 @@ python tools/export.py --model_name_or_config configs/rec/rare/rare_resnet34 --d The `data_shape` is the model input shape of height and width for MindIR file. The shape value of MindIR in the download link can be found in [Notes](#2-results) under results table. -**2. Environment Installation** +**Environment Installation** Please refer to [Environment Installation](../../../docs/en/inference/environment.md#2-mindspore-lite-inference) tutorial to configure the MindSpore Lite inference environment. -**3. Model Conversion** +**Model Conversion** Please refer to [Model Conversion](../../../docs/en/inference/convert_tutorial.md#1-mindocr-models), and use the `converter_lite` tool for offline conversion of the MindIR file. -**4. Inference** +**Inference** Assuming that you obtain output.mindir after model conversion, go to the `deploy/py_infer` directory, and use the following command for inference: diff --git a/configs/rec/rare/README_CN.md b/configs/rec/rare/README_CN.md index 8e7abc006..f97d2d9c4 100644 --- a/configs/rec/rare/README_CN.md +++ b/configs/rec/rare/README_CN.md @@ -5,7 +5,7 @@ > [Robust Scene Text Recognition with Automatic Rectification](https://arxiv.org/abs/1603.03915) -## 1. 模型描述 +## 模型描述 识别自然图像中的文本是一个包含许多未解决问题的挑战性任务。与文档中的文字不同,自然图像中的文字通常具有不规则的形状,这是由透视畸变、曲线字符等因素引起的。该论文提出了RARE(Robust Scene Text Recognition with Automatic Rectification),这是一种对不规则文本具有鲁棒性的识别模型。RARE是一种特别设计的深度神经网络,由空间变换网络(STN)和序列识别网络(SRN)组成。在测试中,图像首先通过预测的Thin-Plate-Spline(TPS)变换进行矫正,成为接下来的SRN可以识别的更加“可读”的图像,SRN通过序列识别方法识别文本。研究表明,该模型能够识别多种类型的不规则文本,包括透视文本和曲线文本。RARE是端到端可训练的,只需要图像和相关的文本标签,这使得训练和部署模型在实际系统中变得更加方便。在几个基准数据集上,该模型达到了SOTA性能,充分证明了所提出模型的有效性。 [1] @@ -19,52 +19,20 @@ 图1. RARE中的SRN结构 [1]

-## 2. 评估结果 - - -### 精度结果 - -根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下: - -
+## 配套版本 -| **模型** | **环境配置** | **骨干网络** | **空间变换网络** | **平均准确率** | **训练时间** | **FPS** | **配置文件** | **模型权重下载** | -| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :--------: |:-----: | -| RARE | D910x4-MS1.10-G | ResNet34_vd | 无 | 85.19% | 3166 s/epoch | 4561 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir) | -
+| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -
-
- 在各个基准数据集上的准确率 - | **模型** | **骨干网络** | **空间变换网络** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **平均准确率** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | - | RARE | ResNet34_vd | None | 95.12% | 94.58% | 94.28% | 92.71% | 75.31% | 69.52% | 88.17% | 87.33% | 78.91% | 76.04% | 85.19% | -
-
+## 快速开始 +### 环境及数据准备 -**注意:** -- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x4-MS1.10-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore1.10版本进行训练。 -- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 -- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[4. 字符词典](#4-字符词典) -- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。 -- RARE的MindIR导出时的输入Shape均为(1, 3, 32, 100),只能在昇腾卡上使用。 - -## 3. 快速开始 -### 3.1 环境及数据准备 - -#### 3.1.1 安装 +#### 安装 环境安装教程请参考MindOCR的 [installation instruction](https://github.com/mindspore-lab/mindocr#installation). -#### 3.1.2 数据集下载 +#### 数据集下载 LMDB格式的训练及验证数据集可以从[这里](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (出处: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here))下载。连接中的文件包含多个压缩文件,其中: - `data_lmdb_release.zip` 包含了**完整**的一套数据集,有训练集(training/),验证集(validation/)以及测试集(evaluation)。 - `training.zip` 包括两个数据集,分别是 [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) 和 [SynthText (ST)](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c) @@ -73,7 +41,7 @@ LMDB格式的训练及验证数据集可以从[这里](https://www.dropbox.com/s - `validation.zip`: 与 data_lmdb_release.zip 中的validation/ 一样。 - `evaluation.zip`: 与 data_lmdb_release.zip 中的evaluation/ 一样。 -#### 3.1.3 数据集使用 +#### 数据集使用 解压文件后,数据文件夹结构如下: @@ -184,7 +152,7 @@ eval: ... ``` -通过使用上述配置 yaml 运行 [模型评估](#33-model-evaluation) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 +通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 2. 对同一文件夹下的多个数据集进行评估 @@ -224,7 +192,7 @@ eval: ... ``` -#### 3.1.4 检查配置文件 +#### 检查配置文件 除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下: @@ -234,7 +202,7 @@ system: amp_level: 'O2' seed: 42 val_while_train: True # 边训练边验证 - drop_overflow_update: False + drop_overflow_update: True common: ... batch_size: &batch_size 512 # 训练批大小 @@ -265,7 +233,7 @@ eval: - 由于全局批大小 (batch_size x num_devices) 是对结果复现很重要,因此当GPU/NPU卡数发生变化时,调整`batch_size`以保持全局批大小不变,或根据新的全局批大小线性调整学习率。 -### 3.2 模型训练 +### 模型训练 * 分布式训练 @@ -289,15 +257,54 @@ python tools/train.py --config configs/rec/rare/rare_resnet34.yaml 训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。 -### 3.3 模型评估 +### 模型评估 若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行: ```shell python tools/eval.py --config configs/rec/rare/rare_resnet34.yaml ``` +## 评估结果 + -## 4. 字符词典 +### 精度结果 + +根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:| :---------------: |:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RARE | ResNet34_vd | MJ+ST | 25.31 | 4 | 512 | O2 | 252.62 s | 180.26 | 11361.43 | 85.24% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir)| + +
+ +
+
+ 在各个基准数据集上的准确率 + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:--------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| RARE | ResNet34_vd | 1 | 95.12% | 94.57% | 94.40% | 92.81% | 75.43% | 69.62% | 88.17% | 87.33% | 78.91% | 76.04% | 85.24% | + +
+
+ +**注意:** +- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 +- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[字符词典](#字符词典) +- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集下载)章节。 +- RARE的MindIR导出时的输入Shape均为(1, 3, 32, 100),只能在昇腾卡上使用。 + +## 字符词典 ### 默认设置 @@ -324,7 +331,7 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置, - 您可以通过将配置文件中的参数 `use_space_char` 设置为 True 来包含空格字符。 - 请记住检查配置文件中的 `dataset->transform_pipeline->RecAttnLabelEncode->lower` 参数的值。如果词典中有大小写字母而且想区分大小写的话,请将其设置为 False。 -## 5. 中文识别模型训练 +## 中文识别模型训练 目前,RARE模型支持多语种识别和提供中英预训练模型。详细内容如下 @@ -334,12 +341,6 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置, 详细的数据准备和config文件配置方式, 请参考 [中文识别数据集准备](../../../docs/cn/datasets/chinese_text_recognition.md) -### 模型训练验证 - -准备好数据集和配置文件后,执行以下命令开启多卡训练 -```shell -mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/rare_resnet34_ch.yaml -``` ### 预训练模型数据集介绍 不同语种的预训练模型采用不同数据集作为预训练,数据来源、训练方式和评估方式可参考 **数据说明**。 @@ -353,9 +354,11 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/
-| **模型** | **语种** | **骨干网络** | **空间变换网络** | **街景类** | **网页类** | **文档类** | **训练时间** | **FPS** | **配置文件** | **模型权重下载** | -| :-----: | :-----: | :--------: | :------------: | :--------: | :--------: | :--------: |:--------: | :--------: |:--------: | :--------: | -| RARE | 中文 | ResNet34_vd | 无 | 62.15% | 67.05% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) | +| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** | +|:---------:|:------------:|:------------:|:--------------------:|:---------:|:-------:|:------------:|:------------:|:-------:|:------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RARE | Chinese | ResNet34_vd | None | 62.39% | 67.02% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) | + +
- RARE的MindIR导出时的输入Shape均为(1, 3, 32, 320),只能在昇腾卡上使用。 @@ -364,11 +367,11 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/ 您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/cn/tutorials/training_recognition_custom_dataset_CN.md)。 -## 6. MindSpore Lite 推理 +## MindSpore Lite 推理 请参考[MindOCR 推理](../../../docs/cn/inference/inference_tutorial.md)教程,基于MindSpore Lite在Ascend 310上进行模型的推理,包括以下步骤: -**1. 模型导出** +**模型导出** 请先[下载](#2-评估结果)已导出的MindIR文件,或者参考[模型导出](../../README.md)教程,使用以下命令将训练完成的ckpt导出为MindIR文件: @@ -380,15 +383,15 @@ python tools/export.py --model_name_or_config configs/rec/rare/rare_resnet34.yam 其中,`data_shape`是导出MindIR时的模型输入Shape的height和width,下载链接中MindIR对应的shape值见[注释](#2-评估结果)。 -**2. 环境搭建** +**环境搭建** 请参考[环境安装](../../../docs/cn/inference/environment.md#2-mindspore-lite推理)教程,配置MindSpore Lite推理运行环境。 -**3. 模型转换** +**模型转换** 请参考[模型转换](../../../docs/cn/inference/convert_tutorial.md#1-mindocr模型)教程,使用`converter_lite`工具对MindIR模型进行离线转换。 -**4. 执行推理** +**执行推理** 假设在模型转换后得到output.mindir文件,在`deploy/py_infer`目录下使用以下命令进行推理: diff --git a/configs/rec/rare/rare_resnet34.yaml b/configs/rec/rare/rare_resnet34.yaml index d910b7c21..4f4a8320a 100644 --- a/configs/rec/rare/rare_resnet34.yaml +++ b/configs/rec/rare/rare_resnet34.yaml @@ -5,7 +5,7 @@ system: seed: 42 log_interval: 100 val_while_train: True - drop_overflow_update: False + drop_overflow_update: True common: character_dict_path: &character_dict_path diff --git a/configs/rec/robustscanner/README.md b/configs/rec/robustscanner/README.md index 2bbb8e3d7..79079c0c7 100644 --- a/configs/rec/robustscanner/README.md +++ b/configs/rec/robustscanner/README.md @@ -5,7 +5,7 @@ English | [中文](https://github.com/mindspore-lab/mindocr/blob/main/configs/re > [RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/pdf/2007.07542.pdf) -## 1. Introduction +## Introduction RobustScanner is an encoder-decoder text recognition algorithm with attention mechanism. The authors of this paper conducted research on the mainstream encoder-decoder recognition frameworks and found that during the decoding process, text not only relies on contextual information but also utilizes positional information. However, most methods rely too much on context information during the decoding process, leading to serious attention shifting problems and thus result in poor performance for text recognition with weak context information or contextless information. @@ -21,54 +21,20 @@ Overall, the RobustScanner model consists of an encoder and a decoder. The encod Figure 1. Overall RobustScanner architecture [1]

-## 2. Results - +## Requirements -### Accuracy - -According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow: +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -
-| **Model** | **Context** | **Backbone** | **Avg Accuracy** | **Train T.** | **FPS** | **ms/step** | **Recipe** | **Download** | -|:-------------:|:--------------:|:---------:|:---------:|:---------------------:|:-------:|:-----------:|:----------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| RobustScanner | D910x4-MS2.0-G | ResNet-31 | 87.86% | 12702 s/epoch | 550 | 465 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir) | -
- -Note: In addition to using the MJSynth (partial) and SynthText (partial) text recognition datasets, RobustScanner is also trained with the SynthAdd dataset and some real datasets. The specific details of the data can be found in the paper or [here](#312-Dataset-Download). +## Quick Start +### Preparation -
-
- Detailed accuracy results for each benchmark dataset - -| **Model** | **Backbone** | **IIIT5k** | **SVT** | **IC13** | **IC15** | **SVTP** | **CUTE80** | **Average** | -| :------: | :------: |:----------:|:-------:|:--------:|:--------:|:--------:|:----------:|:-----------:| -| RobustScanner | ResNet-31 | 95.50% | 92.12% | 94.29% | 73.33% | 82.33% | 89.58% | 87.86% | -
-
- -**Notes:** -- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G-graph mode or F-pynative mode with ms function. For example, D910x4-MS1.10-G is for training on 4 pieces of Ascend 910 NPU using graph mode based on Minspore version 1.10. -- To reproduce the result on other contexts, please ensure the global batch size is the same. -- The model uses an English character dictionary, en_dict90.txt, consisting of 90 characters including digits, common symbols, and upper and lower case English letters. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary). -- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. -- The input Shapes of MindIR of RobustScanner is (1, 3, 48, 160) and it is for Ascend only. - -## 3. Quick Start -### 3.1 Preparation - -#### 3.1.1 Installation +#### Installation Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR. -#### 3.1.2 Dataset Download +#### Dataset Download The dataset used for training and validation in this work, was referenced from the datasets used by mmocr and PaddleOCR for reproducing the RobustScanner algorithms. We are very grateful to mmocr and PaddleOCR for improving the reproducibility efficiency of this repository. The details of the dataset are as follows: @@ -99,7 +65,7 @@ The downloaded file contains several compressed files, including: - `testing_lmdb.zip`: contains six datasets used for evaluating the model, including CUTE80, icdar2013, icdar2015, IIIT5k, SVT, and SVTP. -#### 3.1.3 Dataset Usage +#### Dataset Usage The data folder should be unzipped following the directory structure below: @@ -257,7 +223,7 @@ eval: ... ``` -#### 3.1.4 Check YAML Config Files +#### Check YAML Config Files Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.loader.batch_size`. Explanations of these important args: @@ -297,7 +263,7 @@ eval: - As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size. -### 3.2 Model Training +### Model Training * Distributed Training @@ -321,7 +287,7 @@ python tools/train.py --config configs/rec/robustscanner/robustscanner_resnet31. The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`. -### 3.3 Model Evaluation +### Model Evaluation To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path to the arg `ckpt_load_path` in the `eval` section of yaml config file, set `distribute` to be False, and then run: @@ -329,7 +295,49 @@ To evaluate the accuracy of the trained model, you can use `eval.py`. Please set python tools/eval.py --config configs/rec/robustscanner/robustscanner_resnet31.yaml ``` -## 4. 
Character Dictionary +## Results + + +### Accuracy + +According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RobustScanner | ResNet31 | Real_data+Synth_data | 48.00 | 4 | 64 | O2 | 325.35 s | 142.95 | 1790.87 | 89.37% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir)| + +
+ +Note: In addition to using the MJSynth (partial) and SynthText (partial) text recognition datasets, RobustScanner is also trained with the SynthAdd dataset and some real datasets. The specific details of the data can be found in the paper or [here](#dataset-download). + +
+
+ Detailed accuracy results for each benchmark dataset + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| RobustScanner | ResNet31 | 1 | 94.77% | 94.35% | 95.22% | 94.29% | 82.16% | 73.38% | 95.53% | 92.12% | 82.33% | 89.58% | 89.37% | + +
+
+ +**Notes:** +- To reproduce the result on other contexts, please ensure the global batch size is the same. +- The model uses an English character dictionary, en_dict90.txt, consisting of 90 characters including digits, common symbols, and upper and lower case English letters. More explanation on dictionary, please refer to [Character Dictionary](#character-dictionary). +- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-usage) section. +- The input Shapes of MindIR of RobustScanner is (1, 3, 48, 160) and it is for Ascend only. + + +## Character Dictionary ### Default Setting diff --git a/configs/rec/robustscanner/README_CN.md b/configs/rec/robustscanner/README_CN.md index 43f367bfd..a2c72ef02 100644 --- a/configs/rec/robustscanner/README_CN.md +++ b/configs/rec/robustscanner/README_CN.md @@ -5,7 +5,7 @@ > [RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/pdf/2007.07542.pdf) -## 1. 模型描述 +## 模型描述 RobustScanner是具有注意力机制的编码器-解码器文字识别算法,本作作者通过对当时主流方法编解码器识别框架的研究,发现文字在解码过程中,不仅依赖上下文信息,还会利用位置信息。而大多数方法在解码过程中都过度依赖语义信息,导致存在较为严重的注意力偏移问题,对于没有语义信息或者弱语义信息的文本识别效果不佳。 @@ -21,55 +21,20 @@ RobustScanner是具有注意力机制的编码器-解码器文字识别算法, 图1. RobustScanner整体架构图 [1]

+## 配套版本 -## 2. 评估结果 - +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -### 训练端 -根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下: - - -
- -| **模型** | **环境配置** | **骨干网络** | **平均准确率** | **训练时间** | **FPS** | **ms/step** | **配置文件** | **模型权重下载** | -|:-------------:|:--------------:|:---------:|:---------:|:---------------------:|:-------:|:-----------:|:----------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| RobustScanner | D910x4-MS2.0-G | ResNet-31 | 87.86% | 12702 s/epoch | 550 | 465 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir) | -
-注:除了使用MJSynth(部分)和SynthText(部分)两个文字识别数据集外,还加入了SynthAdd数据,和部分真实数据,具体数据细节可以参考论文或[这里](#312-数据集下载)。 +## 快速开始 +### 环境及数据准备 -
-
- 在各个基准数据集上的准确率 - - | **模型** | **骨干网络** | **IIIT5k** | **SVT** | **IC13** | **IC15** | **SVTP** | **CUTE80** | **平均准确率** | - | :------: | :------: |:----------:|:-------:|:--------:|:--------:|:--------:|:----------:|:---------:| - | RobustScanner | ResNet-31 | 95.50% | 92.12% | 94.29% | 73.33% | 82.33% | 89.58% | 87.86% | -
-
- -**注意:** -- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x4-MS2.0-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore2.0版本进行训练。 -- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 -- 模型使用90个字符的英文字典en_dict90.txt,其中有数字,常用符号以及大小写的英文字母,详细请看[4. 字符词典](#4-字符词典) -- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。 -- RobustScanner的MindIR导出时的输入Shape均为(1, 3, 48, 160)。 - -## 3. 快速开始 -### 3.1 环境及数据准备 - -#### 3.1.1 安装 +#### 安装 环境安装教程请参考MindOCR的 [installation instruction](https://github.com/mindspore-lab/mindocr#installation). -#### 3.1.2 数据集下载 +#### 数据集下载 本RobustScanner训练、验证使用的数据集参考了mmocr和PaddleOCR所使用的数据集对文献算法进行复现,在此非常感谢mmocr和PaddleOCR,提高了本repo的复现效率。 数据集细节如下: @@ -98,7 +63,7 @@ Table Format: - `SynthText800K_shuffle_xxx_xxx.zip`: 1_200共5个zip文件,包含SynthText数据集中随机挑选的240万个样本。 - 验证集 - `testing_lmdb.zip`: 包含了评估模型使用的CUTE80, icdar2013, icdar2015, IIIT5k, SVT, SVTP六个数据集。 -#### 3.1.3 数据集使用 +#### 数据集使用 数据文件夹按照如下结构进行解压: @@ -216,7 +181,7 @@ eval: ... ``` -通过使用上述配置 yaml 运行 [模型评估](#33-模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 +通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 2. 对同一文件夹下的多个数据集进行评估 @@ -256,7 +221,7 @@ eval: ... ``` -#### 3.1.4 检查配置文件 +#### 检查配置文件 除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.loader.batch_size`。说明如下: @@ -300,7 +265,7 @@ eval: - 您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/cn/tutorials/training_recognition_custom_dataset.md)。 -### 3.2 模型训练 +### 模型训练 * 分布式训练 @@ -324,7 +289,7 @@ python tools/train.py --config configs/rec/robustscanner/robustscanner_resnet31. 训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。 -### 3.3 模型评估 +### 模型评估 若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行: @@ -332,7 +297,49 @@ python tools/train.py --config configs/rec/robustscanner/robustscanner_resnet31. python tools/eval.py --config configs/rec/robustscanner/robustscanner_resnet31.yaml ``` -## 4. 字符词典 +## 评估结果 + + +### 训练端 +根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下: + + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RobustScanner | ResNet31 | Real_data+Synth_data | 48.00 | 4 | 64 | O2 | 325.35 s | 142.95 | 1790.87 | 89.37% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir)| + +
+ +注:除了使用MJSynth(部分)和SynthText(部分)两个文字识别数据集外,还加入了SynthAdd数据,和部分真实数据,具体数据细节可以参考论文或[这里](#数据集下载)。 + +
+
+ 在各个基准数据集上的准确率 + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| RobustScanner | ResNet31 | 1 | 94.77% | 94.35% | 95.22% | 94.29% | 82.16% | 73.38% | 95.53% | 92.12% | 82.33% | 89.58% | 89.37% | + +
+
+ +**注意:** +- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 +- 模型使用90个字符的英文字典en_dict90.txt,其中有数字,常用符号以及大小写的英文字母,详细请看[字符词典](#字符词典) +- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集下载)章节。 +- RobustScanner的MindIR导出时的输入Shape均为(1, 3, 48, 160)。 + + +## 字符词典 ### 默认设置 diff --git a/configs/rec/svtr/README.md b/configs/rec/svtr/README.md index 3dd8142f3..d7044c2c3 100644 --- a/configs/rec/svtr/README.md +++ b/configs/rec/svtr/README.md @@ -27,8 +27,6 @@ Dominant scene text recognition models commonly contain two building blocks, a v | 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | - - ## Quick Start ### Preparation @@ -172,7 +170,7 @@ eval: ... ``` -By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80. +By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80. 2. Evaluate on multiple datasets under the same folder @@ -323,19 +321,6 @@ We use a public Chinese text benchmark dataset [Benchmarking-Chinese-Text-Recogn For detailed instruction of data preparation and yaml configuration, please refer to [ch_dataeset](../../../docs/en/datasets/chinese_text_recognition.md). -### Training - -To train with the prepared datsets and config file, please run: - -```shell -mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/svtr/svtr_tiny_ch.yaml -``` - - - -### Training with Custom Datasets -You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md). - ## Performance @@ -343,41 +328,29 @@ You can train models for different languages with your own custom datasets. Load Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode. -*coming soon* - -Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode. 
- | **model name** | **cards** | **batch size** | **languages** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** | -| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: | :-------: | :-----: | :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 65.93% | 69.64% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) | +| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: |:---------:|:-------:| :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 66.19% | 69.66% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) | ### Specific Purpose Models Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode. -*coming soon* - -Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode. 
- -| **model name** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | -| :------------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :-------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| SVTR-Tiny | 4 | 512 | O2 | 226.86 s | 49.38 | 4560 | 90.23% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3-86ece8c8.mindir) | -| SVTR-Tiny-8P | 8 | 512 | O2 | 230.74 s | 55.16 | 9840 | 90.32% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) | - +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:----------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| SVTR-Tiny-8P | Tiny | MJ+ST | 60.24 | 8 | 512 | O2 | 230.39 s | 685.68 | 5973.61 | 90.29% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) | Detailed accuracy results for each benchmark dataset: - -| **model name** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | -| :------------: | :----------: | :----------: | :----------: | :-----------: | :-----------: | :-----------: | :-------------: | :-----: | :------: | :--------: | :---------: | -| SVTR-Tiny | 95.70% | 95.50% | 95.33% | 93.99% | 83.60% | 79.83% | 94.70% | 91.96% | 85.58% | 86.11% | 90.23% | -| SVTR-Tiny-8P | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.33% | 90.57% | 86.20% | 86.46% | 90.32% | +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| SVTR-Tiny-8P | Tiny | 1 | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.30% | 90.42% | 86.05% 
| 86.46% | 90.29% | ### Notes - To reproduce the result on other contexts, please ensure the global batch size is the same. -- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary). -- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. +- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [Character Dictionary](#character-dictionary). +- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-usage) section. - The input Shapes of MindIR of RARE is (1, 3, 64, 256). diff --git a/configs/rec/svtr/README_CN.md b/configs/rec/svtr/README_CN.md index cea591b23..73f9fe57e 100644 --- a/configs/rec/svtr/README_CN.md +++ b/configs/rec/svtr/README_CN.md @@ -169,7 +169,7 @@ eval: ... ``` -通过使用上述配置 yaml 运行 [模型评估](#33-model-evaluation) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 +通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。 2. 对同一文件夹下的多个数据集进行评估 @@ -209,7 +209,7 @@ eval: ... ``` -#### 3.1.4 检查配置文件 +#### 检查配置文件 除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下: @@ -320,62 +320,36 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置, 详细的数据准备和config文件配置方式, 请参考 [中文识别数据集准备](../../../docs/zh/datasets/chinese_text_recognition.md) -### 模型训练验证 - -准备好数据集和配置文件后,执行以下命令开启多卡训练 - -```shell -mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/svtr/svtr_tiny_ch.yaml -``` - - - -### 使用自定义数据集进行训练 -您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/zh/tutorials/training_recognition_custom_dataset_CN.md)。 - ## 性能表现 ### 通用泛化中文模型 在采用图模式的ascend 910*上实验结果,mindspore版本为2.3.1 -*即将到来* - -在采用图模式的ascend 910上实验结果,mindspore版本为2.3.1 - - | **model name** | **cards** | **batch size** | **languages** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** | -| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: | :-------: | :-----: | :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 65.93% | 69.64% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) | +| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | 
:---------: | :-------: |:---------:|:-------:| :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 66.19% | 69.66% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) | + ### 细分领域模型 在采用图模式的ascend 910*上实验结果,mindspore版本为2.3.1 -*即将到来* - -在采用图模式的ascend 910上实验结果,mindspore版本为2.3.1 - -| **model name** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | -| :------------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :-------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| SVTR-Tiny | 4 | 512 | O2 | 226.86 s | 49.38 | 4560 | 90.23% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3-86ece8c8.mindir) | -| SVTR-Tiny-8P | 8 | 512 | O2 | 230.74 s | 55.16 | 9840 | 90.32% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) | - +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:----------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| SVTR-Tiny-8P | Tiny | MJ+ST | 60.24 | 8 | 512 | O2 | 230.39 s | 685.68 | 5973.61 | 90.29% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) | 在各个基准数据集上的准确率 -| **model name** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | -| :------------: | :----------: | :----------: | :----------: | :-----------: | :-----------: | 
:-----------: | :-------------: | :-----: | :------: | :--------: | :---------: | -| SVTR-Tiny | 95.70% | 95.50% | 95.33% | 93.99% | 83.60% | 79.83% | 94.70% | 91.96% | 85.58% | 86.11% | 90.23% | -| SVTR-Tiny-8P | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.33% | 90.57% | 86.20% | 86.46% | 90.32% | - +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| SVTR-Tiny-8P | Tiny | 1 | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.30% | 90.42% | 86.05% | 86.46% | 90.29% | **注意:** -- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x4-MS1.10-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore1.10版本进行训练。 - 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。 -- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[4. 字符词典](#4-字符词典) -- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。 +- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[字符词典](#字符词典) +- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集准备)章节。 - SVTR的MindIR导出时的输入Shape均为(1, 3, 64, 256)。 ## 参考文献 diff --git a/configs/rec/visionlan/README.md b/configs/rec/visionlan/README.md index f49728fa6..8ad076587 100644 --- a/configs/rec/visionlan/README.md +++ b/configs/rec/visionlan/README.md @@ -6,9 +6,9 @@ English | [中文](README_CN.md) > VisionLAN: [From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network](https://arxiv.org/abs/2108.09661) -## 1. Introduction +## Introduction -### 1.1 VisionLAN +### VisionLAN Visual Language Modeling Network (VisionLAN) [1] is a text recognion model that learns the visual and linguistic information simultaneously via **character-wise occluded feature maps** in the training stage. This model does not require an extra language model to extract linguistic information, since the visual and linguistic information can be learned as a union. @@ -32,57 +32,20 @@ As shown above, the training pipeline of VisionLAN consists of three modules: While in the test stage, MLM is not used. Only the backbone and VRM are used for prediction. -## 2. Results - - -### 2.1 Accuracy - -According to our experiments, the evaluation results on ten public benchmark datasets is as follow: - -
- -| **Model** | **Context** | **Backbone**| **Train Dataset** | **Model Params**|**Avg Accuracy** | **Train Time** | **Per Step Time** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: |:--------: | :--------: |:--------: |:----------: | -| visionlan | D910x4-MS2.0-G | resnet45 | MJ+ST| 42.2M | 90.61% | 7718 s/epoch | 417 ms/step | 1,840 img/s | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml)| [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)| - -
- -
-
- Detailed accuracy results for ten benchmark datasets - - | **Model** | **Context** | **IC03_860**| **IC03_867**| **IC13_857**|**IC13_1015** | **IC15_1811** |**IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |:------: |:------: | :------: |:------: | - | visionlan | D910x4-MS2.0-G | 96.16% | 95.16% | 95.92%| 94.19% | 84.04% | 77.46% | 95.53% | 92.27% | 85.74% |89.58% | 90.61% | - -
+## Requirements -
- -**Notes:** - -- Context: Training context denoted as `{device}x{pieces}-{MS version}-{MS mode}`. Mindspore mode can be either `G` (graph mode) or `F` (pynative mode). For example, `D910x4-MS2.0-G` denotes training on 4 pieces of 910 NPUs using graph mode based on MindSpore version 2.0.0. -- Train datasets: MJ+ST stands for the combination of two synthetic datasets, SynthText(800k) and MJSynth. -- To reproduce the result on other contexts, please ensure the global batch size is the same. -- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [3.2 Dataset preparation](#32-dataset-preparation) section. -- The input Shape of MindIR of VisionLAN is (1, 3, 64, 256). +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -## 3. Quick Start +## Quick Start -### 3.1 Installation +### Installation Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR. -### 3.2 Dataset preparation +### Dataset preparation **Training sets** @@ -156,7 +119,7 @@ datasets └── SynText ``` -### 3.3 Update yaml config file +### Update yaml config file If the datasets are placed under `./datasets`, there is no need to change the `train.dataset.dataset_root` in the yaml configuration file `configs/rec/visionlan/visionlan_L*.yaml`. @@ -205,7 +168,7 @@ common: - As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size. -### 3.4 Training +### Training The training stages include Language-free (LF) and Language-aware (LA) process, and in total three steps for training: @@ -226,7 +189,7 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/visio The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_visionlan`. -### 3.5 Test +### Test After all three steps training, change the `system.distribute` to `False` in `configs/rec/visionlan/visionlan_resnet45_LA.yaml` before testing. @@ -254,10 +217,52 @@ training_step="LA" python tools/benchmarking/multi_dataset_eval.py --config $yaml_file --opt eval.dataset.data_dir="test" eval.ckpt_load_path="./tmp_visionlan/${training_step}/${model_name}.ckpt" ``` +## Results + + +### Accuracy + +According to our experiments, the evaluation results on ten public benchmark datasets is as follow: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:-----------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| visionlan | Resnet45 | MJ+ST | 42.22 | 4 | 128 | O2 | 191.52 s | 280.29 | 1826.63 | 90.62% | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml) | [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)| + +
+ +
+
+ Detailed accuracy results for ten benchmark datasets + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| visionlan | Resnet45 | 1 | 96.16% | 95.16% | 95.92% | 94.19% | 84.04% | 77.47% | 95.53% | 92.27% | 85.89% | 89.58% | 90.62% | + +
+ +
+ +**Notes:** + +- Train datasets: MJ+ST stands for the combination of two synthetic datasets, SynthText(800k) and MJSynth. +- To reproduce the result on other contexts, please ensure the global batch size is the same. +- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset preparation](#dataset-preparation) section. +- The input Shape of MindIR of VisionLAN is (1, 3, 64, 256). + -## 4. Inference +## Inference -### 4.1 Prepare MINDIR file +### Prepare MINDIR file Please download the [MINDIR](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir) file from the table above, or you can use `tools/export.py` to manually convert any checkpoint file into a MINDIR file: ```bash @@ -269,7 +274,7 @@ This command will save a `visionlan_resnet45.mindir` under the current working d > Learn more about [Model Export](https://github.com/mindspore-lab/mindocr/blob/main/docs/en/inference/convert_tutorial.md#11-model-export). -### 4.2 Mindspore Lite Converter Tool +### Mindspore Lite Converter Tool If you haven't downloaded MindSpore Lite, please download it via this [link](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html). More details on how to use MindSpore Lite in Linux Environment refer to [this document](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/converter_tool.html#linux-environment-usage-instructions). @@ -290,7 +295,7 @@ Running this command will save a `visionlan_resnet45_lite.mindir` under the curr > Learn more about [Model Conversion](https://github.com/mindspore-lab/mindocr/blob/main/docs/en/inference/convert_tutorial.md#12-model-conversion). -### 4.3 Inference on A Folder of Images +### Inference on A Folder of Images Taking `SVT` test set as an example, the data structure under the dataset folder is: @@ -335,7 +340,7 @@ The evaluation results are shown below: ``` -## 5. References +## References [1] Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang: From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. ICCV 2021: 14174-14183 diff --git a/configs/rec/visionlan/README_CN.md b/configs/rec/visionlan/README_CN.md index 588568826..77b740d43 100644 --- a/configs/rec/visionlan/README_CN.md +++ b/configs/rec/visionlan/README_CN.md @@ -6,9 +6,9 @@ > VisionLAN: [From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network](https://arxiv.org/abs/2108.09661) -## 1. 简介 +## 简介 -### 1.1 VisionLAN +### VisionLAN 视觉语言建模网络(VisionLAN)[1]是一种文本识别模型,它通过在训练阶段使用逐字符遮挡的特征图来同时学习视觉和语言信息。这种模型不需要额外的语言模型来提取语言信息,因为视觉和语言信息可以作为一个整体来学习。

@@ -26,45 +26,21 @@ 但在测试阶段,MLM不被使用。只有骨干网络和VRM被用于预测。 -## 2.精度结果 +## 配套版本 -根据我们实验结果,在10个公开数据集上的评估结果如下: +| mindspore | ascend driver | firmware | cann toolkit/kernel | +|:----------:|:--------------:|:-------------:|:-------------------:| +| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 | -

-| **Model** | **Context** | **Backbone**| **Train Dataset** | **Model Params**|**Avg Accuracy** | **Train Time** | **Per Step Time** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: |:--------: | :--------: |:--------: |:----------: | -| visionlan | D910x4-MS2.0-G | resnet45 | MJ+ST| 42.2M | 90.61% | 7718s/epoch | 417 ms/step | 1,840 img/s | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml)| [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)| -
+## 快速入门 -
-
- Detailed accuracy results for ten benchmark datasets - - | **Model** | **Context** | **IC03_860**| **IC03_867**| **IC13_857**|**IC13_1015** | **IC15_1811** |**IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | - | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |:------: |:------: | :------: |:------: | - | visionlan | D910x4-MS2.0-G | 96.16% | 95.16% | 95.92%| 94.19% | 84.04% | 77.46% | 95.53% | 92.27% | 85.74% |89.58% | 90.61% | - -
- -
- -**注** - -- 训练环境表示为`{device}x{pieces}-{MS版本}-{MS模式}`。MindSpore模式可以是`G`(Graph模式)或`F`(Pynative模式)。例如,`D910x4-MS2.0-G`表示使用MindSpore版本2.0.0在4块910 NPUs上使用图模式进行训练。 -- 训练数据集:`MJ+ST`代表两个合成数据集SynthText(800k)和MJSynth的组合。 -- 要在其他训练环境中重现结果,请确保全局批量大小相同。 -- 这些模型是从头开始训练的,没有任何预训练。有关训练和评估的更多数据集详细信息,请参阅[3.2数据集准备](#32数据集准备)部分。 -- VisionLAN的MindIR导出时的输入Shape均为(1, 3, 64, 256)。 - -## 3.快速入门 - -### 3.1安装 +### 安装 请参考[MindOCR中的安装说明](https://github.com/mindspore-lab/mindocr#installation)。 -### 3.2数据集准备 +### 数据集准备 **训练集** @@ -137,7 +113,7 @@ datasets └── SynText ``` -### 3.3 更新yaml配置文件 +### 更新yaml配置文件 如果数据集放置在`./datasets`目录下,则无需更改yaml配置文件`configs/rec/visionlan/visionlan_L*.yaml`中的`train.dataset.dataset_root`。 否则,请相应地更改以下字段: @@ -185,7 +161,7 @@ common: **注意:** - 由于全局批大小 (batch_size x num_devices) 是对结果复现很重要,因此当GPU/NPU卡数发生变化时,调整batch_size以保持全局批大小不变,或将学习率线性调整为新的全局批大小。 -### 3.4 训练 +### 训练 训练阶段包括无语言(LF)和有语言(LA)过程,总共有三个训练步骤: @@ -206,7 +182,7 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/visio 训练结果(包括checkpoints、每个阶段的性能和loss曲线)将保存在yaml配置文件中由参数`ckpt_save_dir`解析的目录中。默认目录为`./tmp_visionlan`。 -### 3.5 测试 +### 测试 在完成上述三个训练步骤以后, 用户需要在测试前,将 `configs/rec/visionlan/visionlan_resnet45_LA.yaml` 文件中的`system.distribute`改为 `False`。 @@ -235,10 +211,40 @@ training_step="LA" python tools/benchmarking/multi_dataset_eval.py --config $yaml_file --opt eval.dataset.data_dir="test" eval.ckpt_load_path="./tmp_visionlan/${training_step}/${model_name}.ckpt" ``` +## 精度结果 + +根据我们实验结果,在10个公开数据集上的评估结果如下: + +
+ +| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** | +|:--------------:|:------------:|:-----------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| visionlan | Resnet45 | MJ+ST | 42.22 | 4 | 128 | O2 | 191.52 s | 280.29 | 1826.63 | 90.62% | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml) | [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)| + +
+ +
+
+ Detailed accuracy results for ten benchmark datasets + +| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** | +|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:| +| visionlan | Resnet45 | 1 | 96.16% | 95.16% | 95.92% | 94.19% | 84.04% | 77.47% | 95.53% | 92.27% | 85.89% | 89.58% | 90.62% | + +
+ +
+ +**注** + +- 训练数据集:`MJ+ST`代表两个合成数据集SynthText(800k)和MJSynth的组合。 +- 要在其他训练环境中重现结果,请确保全局批量大小相同。 +- 这些模型是从头开始训练的,没有任何预训练。有关训练和评估的更多数据集详细信息,请参阅[数据集准备](#数据集准备)部分。 +- VisionLAN的MindIR导出时的输入Shape均为(1, 3, 64, 256)。 -## 4. 推理 +## 推理 -### 4.1 准备 MINDIR 文件 +### 准备 MINDIR 文件 请从上面的表格中中下载[MINDIR](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)文件,或者您可以使用`tools/export.py`将任何检查点文件手动转换为 MINDIR 文件: ```bash @@ -248,7 +254,7 @@ python tools/export.py --model_name_or_config visionlan_resnet45 --data_shape 64 此命令将在当前工作目录下保存一个`visionlan_resnet45.mindir`文件。 -### 4.2 Mindspore Lite Converter Tool +### Mindspore Lite Converter Tool 如果您尚未下载 MindSpore Lite,请通过此[链接](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html)进行下载。有关如何在 Linux 环境中使用 MindSpore Lite 的更多详细信息,请参阅[此文档](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/converter_tool.html#linux-environment-usage-instructions)。 @@ -267,7 +273,7 @@ converter_lite \ 运行此命令将在当前工作目录下保存一个`visionlan_resnet45_lite.mindir`文件。这是我们可以在`Ascend310`或`310P`平台上进行推理的`MindSpore Lite MindIR`文件。您还可以通过更改`--outputFile`参数来定义不同的文件名。 -### 4.3 对图像文件夹进行推理 +### 对图像文件夹进行推理 以`SVT`测试集为例,数据集文件夹下的数据结构如下: ```text @@ -308,7 +314,7 @@ python deploy/eval_utils/eval_rec.py \ {'acc': 0.9227202534675598, 'norm_edit_distance': 0.9720136523246765} ``` -## 5. 引用文献 +## 引用文献 [1] Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang: From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. ICCV 2021: 14174-14183 diff --git a/configs/rec/visionlan/visionlan_resnet45_LA.yaml b/configs/rec/visionlan/visionlan_resnet45_LA.yaml index e2958d6cf..dfdb4bb3f 100644 --- a/configs/rec/visionlan/visionlan_resnet45_LA.yaml +++ b/configs/rec/visionlan/visionlan_resnet45_LA.yaml @@ -1,11 +1,11 @@ system: mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore distribute: True - amp_level: 'O2' + amp_level: 'O0' seed: 42 log_interval: 200 val_while_train: True - drop_overflow_update: False + drop_overflow_update: True common: character_dict_path: &character_dict_path diff --git a/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml b/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml index da8c2aab4..d6ffb944d 100644 --- a/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml +++ b/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml @@ -1,11 +1,11 @@ system: mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore distribute: True - amp_level: 'O2' + amp_level: 'O0' seed: 42 log_interval: 200 val_while_train: True - drop_overflow_update: False + drop_overflow_update: True common: character_dict_path: &character_dict_path diff --git a/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml b/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml index c3acecb73..9435b1bc8 100644 --- a/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml +++ b/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml @@ -1,11 +1,11 @@ system: mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore distribute: True - amp_level: 'O2' + amp_level: 'O0' seed: 42 log_interval: 200 val_while_train: True - drop_overflow_update: False + drop_overflow_update: True common: character_dict_path: &character_dict_path diff --git a/mindocr/models/utils/attention_cells.py b/mindocr/models/utils/attention_cells.py index f1aa28637..f285cbaf1 100644 --- a/mindocr/models/utils/attention_cells.py +++ b/mindocr/models/utils/attention_cells.py @@ -27,6 +27,23 @@ def __init__( self.matmul = ops.BatchMatMul() + self.min_fp16 = 
ms.tensor(np.finfo(np.float16).min, dtype=ms.float16)
+        self.min_fp32 = ms.tensor(np.finfo(np.float32).min, dtype=ms.float32)
+        self.min_fp64 = ms.tensor(np.finfo(np.float64).min, dtype=ms.float64)
+        self.min_bf16 = ms.tensor(float.fromhex("-0x1.fe00000000000p+127"), dtype=ms.bfloat16)  # most negative finite bfloat16 value
+
+    def dtype_to_min(self, dtype):
+        if dtype == ms.float16:
+            return self.min_fp16
+        if dtype == ms.float32:
+            return self.min_fp32
+        if dtype == ms.float64:
+            return self.min_fp64
+        if dtype == ms.bfloat16:
+            return self.min_bf16
+        else:
+            raise ValueError(f"Only float16, float32, float64 and bfloat16 are supported, but got {dtype}")
+
     def dot_product_attention(
         self, query: Tensor, key: Tensor, value: Tensor, mask: Optional[Tensor] = None
     ) -> Tuple[Tensor, Tensor]:
@@ -37,7 +54,7 @@ def dot_product_attention(
 
         if mask is not None:
             score = ops.masked_fill(
-                score, mask == 0, ms.Tensor(-np.inf, score.dtype)
+                score, mask == 0, self.dtype_to_min(score.dtype)
             )  # score (N, h, seq_len, seq_len)
         p_attn = ops.softmax(score, axis=-1)
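The `masked_fill` change above swaps `-np.inf` for the most negative finite value of the score's dtype. A minimal sketch of the motivation, assuming a toy shape and a fully masked attention row (this snippet is an editor's illustration, not part of the patch or of MindOCR):

```python
# Illustration only: the shapes, the all-zero mask and the expected outcomes are assumptions.
import numpy as np
import mindspore as ms
from mindspore import ops

score = ms.Tensor(np.random.randn(1, 2, 4, 4).astype(np.float16))  # (N, h, seq_len, seq_len) attention logits
mask = ms.Tensor(np.zeros((1, 2, 4, 4), dtype=np.int32))            # 0 = masked position; here every position is masked

# Old behaviour: a fully masked row is filled with -inf, and softmax over it yields NaN (0/0).
inf_fill = ops.masked_fill(score, mask == 0, ms.Tensor(-np.inf, ms.float16))
print(ops.isnan(ops.softmax(inf_fill, axis=-1)).any())  # NaNs present

# Patched behaviour: the finite dtype minimum keeps softmax well defined;
# a fully masked row degenerates to a uniform distribution instead of NaN.
min_fill = ops.masked_fill(score, mask == 0, ms.Tensor(np.finfo(np.float16).min, ms.float16))
print(ops.isnan(ops.softmax(min_fill, axis=-1)).any())  # no NaNs
```

Caching one minimum tensor per dtype in `__init__`, as the patch does, presumably avoids rebuilding the constant on every forward pass in graph mode; the diff itself does not state the rationale.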