+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:---------------:|:------------:|:-------------------:|:-------------:|:---------:| :------------: | :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:---------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| Master-Resnet31 | Resnet31 | MJ+ST+SynthAdd | 68.23 | 4 | 16 | O2 | 194.99 s | 642.164 | 3189.22 | 90.34% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) |
+
+
+
+
+## 配套版本
-| **模型** | **环境配置** | **平均准确率** | **训练时间** | **FPS** | **配置文件** | **模型权重下载** |
-| :-----: | :-----: | :-----: | :-----: | :-----: |:--------: | :-----: |
-| Master-Resnet31 | D910x4-MS1.10-G | 90.37% | 6356 s/epoch | 2741 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) |
-
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-提出。请使用以下命令转换成LMDB格式
@@ -102,7 +71,7 @@ python tools/dataset_converters/convert.py \
并将转换完成的`SynthAdd`文件夹摆在`/training`里面.
-#### 3.1.3 数据集使用
+#### 数据集使用
最终数据文件夹结构如下:
@@ -218,7 +187,7 @@ eval:
...
```
-通过使用上述配置 yaml 运行 [模型评估](#33-model-evaluation) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
+通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
2. 对同一文件夹下的多个数据集进行评估
@@ -258,7 +227,7 @@ eval:
...
```
-#### 3.1.4 检查配置文件
+#### 检查配置文件
除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`,
`eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下:
@@ -269,7 +238,7 @@ system:
amp_level_infer: "O2"
seed: 42
val_while_train: True # 边训练边验证
- drop_overflow_update: False
+ drop_overflow_update: True
common:
...
batch_size: &batch_size 512 # 训练批大小
@@ -300,7 +269,7 @@ eval:
- 由于全局批大小 (batch_size x num_devices) 是对结果复现很重要,因此当GPU/NPU卡数发生变化时,调整`batch_size`以保持全局批大小不变,或根据新的全局批大小线性调整学习率。
-### 3.2 模型训练
+### 模型训练
* 分布式训练
@@ -324,7 +293,7 @@ python tools/train.py --config configs/rec/master/master_resnet31.yaml
训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。
-### 3.3 模型评估
+### 模型评估
若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行:
@@ -332,7 +301,48 @@ python tools/train.py --config configs/rec/master/master_resnet31.yaml
python tools/eval.py --config configs/rec/master/master_resnet31.yaml
```
-## 4. 字符词典
+
+## 评估结果
+
+
+### 精度结果
+
+根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下:
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:---------------:|:------------:|:-------------------:|:-------------:|:---------:| :------------: | :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:---------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| Master-Resnet31 | Resnet31 | MJ+ST+SynthAdd | 68.23 | 4 | 16 | O2 | 194.99 s | 642.164 | 3189.22 | 90.34% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/master/master_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31-e7bfbc97.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/master/master_resnet31_ascend-e7bfbc97-b724ed55.mindir) |
+
+
+
+
+
+ 在各个基准数据集上的准确率
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:| :----------: | :-------: |:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+|Master-ResNet31 | ResNet31 | 1 | 93.72% | 95.16% | 96.85% | 95.17% | 81.94% | 78.48% | 95.57% | 90.88% | 84.19% | 89.58% | 90.34% |
+
+
+
+
+**注意:**
+- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。
+- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#环境及数据准备)章节。
+- Master的MindIR导出时的输入Shape均为(1, 3, 48, 160)。
+
+
+## 字符词典
### 默认设置
@@ -360,11 +370,11 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置,
- 请记住检查配置文件中的 `dataset->transform_pipeline->RecMasterLabelEncode->lower` 参数的值。如果词典中有大小写字母而且想区分大小写的话,请将其设置为 False。
-## 5. MindSpore Lite 推理
+## MindSpore Lite 推理
请参考[MindOCR 推理](../../../docs/cn/inference/inference_tutorial.md)教程,基于MindSpore Lite在Ascend 310上进行模型的推理,包括以下步骤:
-**1. 模型导出**
+**模型导出**
请先[下载](#2-评估结果)已导出的MindIR文件,或者参考[模型导出](../../README.md)教程,使用以下命令将训练完成的ckpt导出为MindIR文件:
@@ -376,15 +386,15 @@ python tools/export.py --model_name_or_config configs/rec/master/master_resnet31
其中,`data_shape`是导出MindIR时的模型输入Shape的height和width,下载链接中MindIR对应的shape值见[注释](#2-评估结果)。
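作为参考,完整的导出命令大致如下(仅为示意;`--local_ckpt_path` 为本地训练权重的占位路径,具体参数请以[模型导出](../../README.md)教程为准):

```shell
# 示意:将训练得到的 Master 权重导出为 MindIR,data_shape 对应输入 Shape (1, 3, 48, 160) 中的高和宽
python tools/export.py --model_name_or_config configs/rec/master/master_resnet31.yaml \
    --data_shape 48 160 \
    --local_ckpt_path /path/to/master_resnet31.ckpt
```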
-**2. 环境搭建**
+**环境搭建**
请参考[环境安装](../../../docs/cn/inference/environment.md#2-mindspore-lite推理)教程,配置MindSpore Lite推理运行环境。
-**3. 模型转换**
+**模型转换**
请参考[模型转换](../../../docs/cn/inference/convert_tutorial.md#1-mindocr模型)教程,使用`converter_lite`工具对MindIR模型进行离线转换。
-**4. 执行推理**
+**执行推理**
假设在模型转换后得到output.mindir文件,在`deploy/py_infer`目录下使用以下命令进行推理:
diff --git a/configs/rec/master/master_resnet31.yaml b/configs/rec/master/master_resnet31.yaml
index df0c9bb2c..2618e2081 100644
--- a/configs/rec/master/master_resnet31.yaml
+++ b/configs/rec/master/master_resnet31.yaml
@@ -6,7 +6,7 @@ system:
seed: 42
log_interval: 100
val_while_train: True
- drop_overflow_update: False
+ drop_overflow_update: True
ckpt_max_keep: 3
common:
diff --git a/configs/rec/rare/README.md b/configs/rec/rare/README.md
index 5ca8f206f..c5148aaff 100644
--- a/configs/rec/rare/README.md
+++ b/configs/rec/rare/README.md
@@ -5,7 +5,7 @@ English | [中文](https://github.com/mindspore-lab/mindocr/blob/main/configs/re
> [Robust Scene Text Recognition with Automatic Rectification](https://arxiv.org/abs/1603.03915)
-## 1. Introduction
+## Introduction
Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. The paper proposes RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. It shows that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model. [1]
@@ -18,52 +18,20 @@ Recognizing text in natural images is a challenging task with many unsolved prob
Figure 1. Architecture of SRN in RARE [1]
-## 2. Results
-
-
-### Accuracy
-
-According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow:
-
-
+## Requirements
-| **Model** | **Context** | **Backbone** | **Transform Module** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** |
-| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :--------: |:-----: |
-| RARE | D910x4-MS1.10-G | ResNet34_vd | None | 85.19% | 3166 s/epoch | 4561 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir) |
-
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-
-
- Detailed accuracy results for each benchmark dataset
- | **Model** | **Backbone** | **Transform Module** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
- | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
- | RARE | ResNet34_vd | None | 95.12% | 94.58% | 94.28% | 92.71% | 75.31% | 69.52% | 88.17% | 87.33% | 78.91% | 76.04% | 85.19% |
-
-
+## Quick Start
+### Preparation
-**Notes:**
-- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G-graph mode or F-pynative mode with ms function. For example, D910x4-MS1.10-G is for training on 4 pieces of Ascend 910 NPU using graph mode based on Minspore version 1.10.
-- To reproduce the result on other contexts, please ensure the global batch size is the same.
-- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
-- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
-- The input Shapes of MindIR of RARE is (1, 3, 32, 100) and it is for Ascend only.
-
-## 3. Quick Start
-### 3.1 Preparation
-
-#### 3.1.1 Installation
+#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.
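For quick reference, a from-source installation typically looks like the sketch below; treat it as a sketch and follow the linked instruction for the exact MindSpore/CANN versions listed under Requirements.

```shell
# Sketch: install MindOCR from source (MindSpore itself must match the versions
# in the Requirements table and is installed separately)
git clone https://github.com/mindspore-lab/mindocr.git
cd mindocr
pip install -e .
```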
-#### 3.1.2 Dataset Download
+#### Dataset Download
Please download the lmdb dataset for training and evaluation from [here](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (ref: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)). There are several zip files:
- `data_lmdb_release.zip` contains the **entire** datasets including training data, validation data and evaluation data.
- `training/` contains two datasets: [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) and [SynthText (ST)](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c)
@@ -110,7 +78,7 @@ data_lmdb_release/
└── lock.mdb
```
-#### 3.1.3 Dataset Usage
+#### Dataset Usage
Here we used the datasets under `training/` folders for training, and the union dataset `validation/` for validation. After training, we used the datasets under `evaluation/` to evaluate model accuracy.
@@ -225,7 +193,7 @@ eval:
...
```
-#### 3.1.4 Check YAML Config Files
+#### Check YAML Config Files
Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`,
`eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args:
@@ -235,7 +203,7 @@ system:
amp_level: 'O2'
seed: 42
val_while_train: True # Validate while training
- drop_overflow_update: False
+ drop_overflow_update: True
common:
...
batch_size: &batch_size 512 # Batch size for training
@@ -266,7 +234,7 @@ eval:
- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size.
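As an illustration, this recipe trains with batch_size 512 on 4 NPUs, i.e. a global batch size of 512 x 4 = 2048; moving to 8 NPUs could be handled in either of the two ways sketched below (values are illustrative only):

```yaml
# Option 1: keep the global batch size unchanged on 8 NPUs
common:
  batch_size: &batch_size 256     # 256 x 8 = 2048, same as 512 x 4

# Option 2: keep batch_size 512 on 8 NPUs (global batch size 4096) and scale the
# learning rate linearly, i.e. multiply the configured lr by 4096 / 2048 = 2.
```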
-### 3.2 Model Training
+### Model Training
* Distributed Training
@@ -290,15 +258,54 @@ python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`.
-### 3.3 Model Evaluation
+### Model Evaluation
To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path to the arg `ckpt_load_path` in the `eval` section of yaml config file, set `distribute` to be False, and then run:
```shell
python tools/eval.py --config configs/rec/rare/rare_resnet34.yaml
```
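Concretely, the relevant fields in the yaml would look roughly like the following (the checkpoint path is a placeholder):

```yaml
system:
  distribute: False                      # single-device evaluation
  ...
eval:
  ckpt_load_path: ./tmp_rec/best.ckpt    # placeholder: path to your trained checkpoint
  ...
```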
+## Results
+
-## 4. Character Dictionary
+### Accuracy
+
+According to our experiments, the evaluation results on the public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:| :---------------: |:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| RARE | ResNet34_vd | MJ+ST | 25.31 | 4 | 512 | O2 | 252.62 s | 180.26 | 11361.43 | 85.24% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir)|
+
+
+
+
+
+ Detailed accuracy results for each benchmark dataset
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:--------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| RARE | ResNet34_vd | 1 | 95.12% | 94.57% | 94.40% | 92.81% | 75.43% | 69.62% | 88.17% | 87.33% | 78.91% | 76.04% | 85.24% |
+
+
+
+
+**Notes:**
+- To reproduce the result in other environments, please ensure the global batch size is the same.
+- The characters supported by the model are the lowercase English letters a to z and the digits 0 to 9. For more explanation on the dictionary, please refer to [Character Dictionary](#character-dictionary).
+- The models are trained from scratch without any pre-training. For more details on the training and evaluation datasets, please refer to the [Dataset Download & Dataset Usage](#dataset-usage) section.
+- The input shape of the exported MindIR of RARE is (1, 3, 32, 100), and it is for Ascend only.
+
+## Character Dictionary
### Default Setting
@@ -325,7 +332,7 @@ To use a specific dictionary, set the parameter `character_dict_path` to the pat
- Remember to check the value of `dataset->transform_pipeline->RecAttnLabelEncode->lower` in the configuration yaml. Set it to False if you prefer case-sensitive encoding.
-## 5. Chinese Text Recognition Model Training
+## Chinese Text Recognition Model Training
Currently, this model supports multilingual recognition and provides pre-trained models for different languages. Details are as follows:
@@ -335,35 +342,24 @@ We use a public Chinese text benchmark dataset [Benchmarking-Chinese-Text-Recogn
For detailed instruction of data preparation and yaml configuration, please refer to [ch_dataset](../../../docs/en/datasets/chinese_text_recognition.md).
-### Training
-
-To train with the prepared datsets and config file, please run:
-
-```shell
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/rare_resnet34_ch.yaml
-```
-
### Results and Pretrained Weights
After training, evaluation results on the benchmark test set are as follows, where we also provide the model config and pretrained weights.
-| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** |
-| :-----: | :-----: | :--------: | :------------: | :--------: | :--------: | :--------: | :--------: | :--------: |:---------: | :-----------: |
-| RARE | Chinese | ResNet34_vd | None | 62.15% | 67.05% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) |
+| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** |
+|:---------:|:------------:|:------------:|:--------------------:|:---------:|:-------:|:------------:|:------------:|:-------:|:------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| RARE | Chinese | ResNet34_vd | None | 62.39% | 67.02% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) |
- The input Shapes of MindIR of RARE is (1, 3, 32, 320) and it is for Ascend only.
-### Training with Custom Datasets
-You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md).
-
-## 6. MindSpore Lite Inference
+## MindSpore Lite Inference
To run inference with MindSpore Lite on Ascend 310, please refer to the tutorial [MindOCR Inference](../../../docs/en/inference/inference_tutorial.md). In short, the whole process consists of the following steps:
-**1. Model Export**
+**Model Export**
Please [download](#2-results) the exported MindIR file first, or refer to the [Model Export](../../README.md) tutorial and use the following command to export the trained ckpt model to MindIR file:
@@ -376,16 +372,16 @@ python tools/export.py --model_name_or_config configs/rec/rare/rare_resnet34 --d
The `data_shape` is the height and width of the model input shape for the MindIR file. The shape value of the MindIR in the download link can be found in the [Notes](#2-results) under the results table.
-**2. Environment Installation**
+**Environment Installation**
Please refer to [Environment Installation](../../../docs/en/inference/environment.md#2-mindspore-lite-inference) tutorial to configure the MindSpore Lite inference environment.
-**3. Model Conversion**
+**Model Conversion**
Please refer to [Model Conversion](../../../docs/en/inference/convert_tutorial.md#1-mindocr-models),
and use the `converter_lite` tool for offline conversion of the MindIR file.
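A typical offline conversion looks roughly like the sketch below; the exact flags and the optional `configFile` (which carries the Ascend input shape) depend on your MindSpore Lite version, so please follow the linked tutorial:

```shell
# Sketch: convert the exported MindIR into a MindSpore Lite MindIR for Ascend
converter_lite \
    --saveType=MINDIR \
    --fmk=MINDIR \
    --optimize=ascend_oriented \
    --modelFile=rare_resnet34_ascend-309dc63e-b96c2a4b.mindir \
    --outputFile=output
```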
-**4. Inference**
+**Inference**
Assuming that you obtain output.mindir after model conversion, go to the `deploy/py_infer` directory, and use the following command for inference:
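For reference, a recognition-only run looks roughly like the sketch below; the argument names are assumptions based on `deploy/py_infer` and should be checked against `python infer.py --help`:

```shell
# Sketch: run text recognition with the converted model on a folder of images
python infer.py \
    --input_images_dir=/path/to/images \
    --rec_model_path=/path/to/output.mindir \
    --rec_model_name_or_config=../../configs/rec/rare/rare_resnet34.yaml \
    --res_save_dir=results_dir
```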
diff --git a/configs/rec/rare/README_CN.md b/configs/rec/rare/README_CN.md
index 8e7abc006..f97d2d9c4 100644
--- a/configs/rec/rare/README_CN.md
+++ b/configs/rec/rare/README_CN.md
@@ -5,7 +5,7 @@
> [Robust Scene Text Recognition with Automatic Rectification](https://arxiv.org/abs/1603.03915)
-## 1. 模型描述
+## 模型描述
识别自然图像中的文本是一个包含许多未解决问题的挑战性任务。与文档中的文字不同,自然图像中的文字通常具有不规则的形状,这是由透视畸变、曲线字符等因素引起的。该论文提出了RARE(Robust Scene Text Recognition with Automatic Rectification),这是一种对不规则文本具有鲁棒性的识别模型。RARE是一种特别设计的深度神经网络,由空间变换网络(STN)和序列识别网络(SRN)组成。在测试中,图像首先通过预测的Thin-Plate-Spline(TPS)变换进行矫正,成为接下来的SRN可以识别的更加“可读”的图像,SRN通过序列识别方法识别文本。研究表明,该模型能够识别多种类型的不规则文本,包括透视文本和曲线文本。RARE是端到端可训练的,只需要图像和相关的文本标签,这使得训练和部署模型在实际系统中变得更加方便。在几个基准数据集上,该模型达到了SOTA性能,充分证明了所提出模型的有效性。 [1]
@@ -19,52 +19,20 @@
图1. RARE中的SRN结构 [1]
-## 2. 评估结果
-
-
-### 精度结果
-
-根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下:
-
-
+## 配套版本
-| **模型** | **环境配置** | **骨干网络** | **空间变换网络** | **平均准确率** | **训练时间** | **FPS** | **配置文件** | **模型权重下载** |
-| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :--------: |:-----: |
-| RARE | D910x4-MS1.10-G | ResNet34_vd | 无 | 85.19% | 3166 s/epoch | 4561 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir) |
-
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-
-
- 在各个基准数据集上的准确率
- | **模型** | **骨干网络** | **空间变换网络** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **平均准确率** |
- | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
- | RARE | ResNet34_vd | None | 95.12% | 94.58% | 94.28% | 92.71% | 75.31% | 69.52% | 88.17% | 87.33% | 78.91% | 76.04% | 85.19% |
-
-
+## 快速开始
+### 环境及数据准备
-**注意:**
-- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x4-MS1.10-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore1.10版本进行训练。
-- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。
-- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[4. 字符词典](#4-字符词典)
-- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。
-- RARE的MindIR导出时的输入Shape均为(1, 3, 32, 100),只能在昇腾卡上使用。
-
-## 3. 快速开始
-### 3.1 环境及数据准备
-
-#### 3.1.1 安装
+#### 安装
环境安装教程请参考MindOCR的 [installation instruction](https://github.com/mindspore-lab/mindocr#installation).
-#### 3.1.2 数据集下载
+#### 数据集下载
LMDB格式的训练及验证数据集可以从[这里](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (出处: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here))下载。链接中的文件包含多个压缩文件,其中:
- `data_lmdb_release.zip` 包含了**完整**的一套数据集,有训练集(training/),验证集(validation/)以及测试集(evaluation)。
- `training.zip` 包括两个数据集,分别是 [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) 和 [SynthText (ST)](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c)
@@ -73,7 +41,7 @@ LMDB格式的训练及验证数据集可以从[这里](https://www.dropbox.com/s
- `validation.zip`: 与 data_lmdb_release.zip 中的validation/ 一样。
- `evaluation.zip`: 与 data_lmdb_release.zip 中的evaluation/ 一样。
-#### 3.1.3 数据集使用
+#### 数据集使用
解压文件后,数据文件夹结构如下:
@@ -184,7 +152,7 @@ eval:
...
```
-通过使用上述配置 yaml 运行 [模型评估](#33-model-evaluation) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
+通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
2. 对同一文件夹下的多个数据集进行评估
@@ -224,7 +192,7 @@ eval:
...
```
-#### 3.1.4 检查配置文件
+#### 检查配置文件
除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`,
`eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下:
@@ -234,7 +202,7 @@ system:
amp_level: 'O2'
seed: 42
val_while_train: True # 边训练边验证
- drop_overflow_update: False
+ drop_overflow_update: True
common:
...
batch_size: &batch_size 512 # 训练批大小
@@ -265,7 +233,7 @@ eval:
- 由于全局批大小 (batch_size x num_devices) 是对结果复现很重要,因此当GPU/NPU卡数发生变化时,调整`batch_size`以保持全局批大小不变,或根据新的全局批大小线性调整学习率。
-### 3.2 模型训练
+### 模型训练
* 分布式训练
@@ -289,15 +257,54 @@ python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。
-### 3.3 模型评估
+### 模型评估
若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行:
```shell
python tools/eval.py --config configs/rec/rare/rare_resnet34.yaml
```
+## 评估结果
+
-## 4. 字符词典
+### 精度结果
+
+根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下:
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:| :---------------: |:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| RARE | ResNet34_vd | MJ+ST | 25.31 | 4 | 512 | O2 | 252.62 s | 180.26 | 11361.43 | 85.24% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34-309dc63e.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ascend-309dc63e-b96c2a4b.mindir)|
+
+
+
+
+
+ 在各个基准数据集上的准确率
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:--------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| RARE | ResNet34_vd | 1 | 95.12% | 94.57% | 94.40% | 92.81% | 75.43% | 69.62% | 88.17% | 87.33% | 78.91% | 76.04% | 85.24% |
+
+
+
+
+**注意:**
+- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。
+- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[字符词典](#字符词典)
+- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集下载)章节。
+- RARE的MindIR导出时的输入Shape均为(1, 3, 32, 100),只能在昇腾卡上使用。
+
+## 字符词典
### 默认设置
@@ -324,7 +331,7 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置,
- 您可以通过将配置文件中的参数 `use_space_char` 设置为 True 来包含空格字符。
- 请记住检查配置文件中的 `dataset->transform_pipeline->RecAttnLabelEncode->lower` 参数的值。如果词典中有大小写字母而且想区分大小写的话,请将其设置为 False。
-## 5. 中文识别模型训练
+## 中文识别模型训练
目前,RARE模型支持多语种识别和提供中英预训练模型。详细内容如下
@@ -334,12 +341,6 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置,
详细的数据准备和config文件配置方式, 请参考 [中文识别数据集准备](../../../docs/cn/datasets/chinese_text_recognition.md)
-### 模型训练验证
-
-准备好数据集和配置文件后,执行以下命令开启多卡训练
-```shell
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/rare_resnet34_ch.yaml
-```
### 预训练模型数据集介绍
不同语种的预训练模型采用不同数据集作为预训练,数据来源、训练方式和评估方式可参考 **数据说明**。
@@ -353,9 +354,11 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/
-| **模型** | **语种** | **骨干网络** | **空间变换网络** | **街景类** | **网页类** | **文档类** | **训练时间** | **FPS** | **配置文件** | **模型权重下载** |
-| :-----: | :-----: | :--------: | :------------: | :--------: | :--------: | :--------: |:--------: | :--------: |:--------: | :--------: |
-| RARE | 中文 | ResNet34_vd | 无 | 62.15% | 67.05% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) |
+| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** |
+|:---------:|:------------:|:------------:|:--------------------:|:---------:|:-------:|:------------:|:------------:|:-------:|:------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| RARE | Chinese | ResNet34_vd | None | 62.39% | 67.02% | 97.60% | 414 s/epoch | 2160 | [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-5f3023e2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch_ascend-5f3023e2-11f0d554.mindir) |
+
+
- RARE的MindIR导出时的输入Shape均为(1, 3, 32, 320),只能在昇腾卡上使用。
@@ -364,11 +367,11 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/
您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/cn/tutorials/training_recognition_custom_dataset_CN.md)。
-## 6. MindSpore Lite 推理
+## MindSpore Lite 推理
请参考[MindOCR 推理](../../../docs/cn/inference/inference_tutorial.md)教程,基于MindSpore Lite在Ascend 310上进行模型的推理,包括以下步骤:
-**1. 模型导出**
+**模型导出**
请先[下载](#2-评估结果)已导出的MindIR文件,或者参考[模型导出](../../README.md)教程,使用以下命令将训练完成的ckpt导出为MindIR文件:
@@ -380,15 +383,15 @@ python tools/export.py --model_name_or_config configs/rec/rare/rare_resnet34.yam
其中,`data_shape`是导出MindIR时的模型输入Shape的height和width,下载链接中MindIR对应的shape值见[注释](#2-评估结果)。
-**2. 环境搭建**
+**环境搭建**
请参考[环境安装](../../../docs/cn/inference/environment.md#2-mindspore-lite推理)教程,配置MindSpore Lite推理运行环境。
-**3. 模型转换**
+**模型转换**
请参考[模型转换](../../../docs/cn/inference/convert_tutorial.md#1-mindocr模型)教程,使用`converter_lite`工具对MindIR模型进行离线转换。
-**4. 执行推理**
+**执行推理**
假设在模型转换后得到output.mindir文件,在`deploy/py_infer`目录下使用以下命令进行推理:
diff --git a/configs/rec/rare/rare_resnet34.yaml b/configs/rec/rare/rare_resnet34.yaml
index d910b7c21..4f4a8320a 100644
--- a/configs/rec/rare/rare_resnet34.yaml
+++ b/configs/rec/rare/rare_resnet34.yaml
@@ -5,7 +5,7 @@ system:
seed: 42
log_interval: 100
val_while_train: True
- drop_overflow_update: False
+ drop_overflow_update: True
common:
character_dict_path: &character_dict_path
diff --git a/configs/rec/robustscanner/README.md b/configs/rec/robustscanner/README.md
index 2bbb8e3d7..79079c0c7 100644
--- a/configs/rec/robustscanner/README.md
+++ b/configs/rec/robustscanner/README.md
@@ -5,7 +5,7 @@ English | [中文](https://github.com/mindspore-lab/mindocr/blob/main/configs/re
> [RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/pdf/2007.07542.pdf)
-## 1. Introduction
+## Introduction
RobustScanner is an encoder-decoder text recognition algorithm with an attention mechanism. The authors of this paper conducted research on the mainstream encoder-decoder recognition frameworks and found that during the decoding process, text not only relies on contextual information but also utilizes positional information. However, most methods rely too much on context information during the decoding process, leading to serious attention shifting problems and thus resulting in poor performance for text recognition with weak context information or contextless information.
@@ -21,54 +21,20 @@ Overall, the RobustScanner model consists of an encoder and a decoder. The encod
Figure 1. Overall RobustScanner architecture [1]
-## 2. Results
-
+## Requirements
-### Accuracy
-
-According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) is as follow:
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-
-| **Model** | **Context** | **Backbone** | **Avg Accuracy** | **Train T.** | **FPS** | **ms/step** | **Recipe** | **Download** |
-|:-------------:|:--------------:|:---------:|:---------:|:---------------------:|:-------:|:-----------:|:----------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
-| RobustScanner | D910x4-MS2.0-G | ResNet-31 | 87.86% | 12702 s/epoch | 550 | 465 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir) |
-
-
-Note: In addition to using the MJSynth (partial) and SynthText (partial) text recognition datasets, RobustScanner is also trained with the SynthAdd dataset and some real datasets. The specific details of the data can be found in the paper or [here](#312-Dataset-Download).
+## Quick Start
+### Preparation
-
-
- Detailed accuracy results for each benchmark dataset
-
-| **Model** | **Backbone** | **IIIT5k** | **SVT** | **IC13** | **IC15** | **SVTP** | **CUTE80** | **Average** |
-| :------: | :------: |:----------:|:-------:|:--------:|:--------:|:--------:|:----------:|:-----------:|
-| RobustScanner | ResNet-31 | 95.50% | 92.12% | 94.29% | 73.33% | 82.33% | 89.58% | 87.86% |
-
-
-
-**Notes:**
-- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G-graph mode or F-pynative mode with ms function. For example, D910x4-MS1.10-G is for training on 4 pieces of Ascend 910 NPU using graph mode based on Minspore version 1.10.
-- To reproduce the result on other contexts, please ensure the global batch size is the same.
-- The model uses an English character dictionary, en_dict90.txt, consisting of 90 characters including digits, common symbols, and upper and lower case English letters. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
-- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
-- The input Shapes of MindIR of RobustScanner is (1, 3, 48, 160) and it is for Ascend only.
-
-## 3. Quick Start
-### 3.1 Preparation
-
-#### 3.1.1 Installation
+#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.
-#### 3.1.2 Dataset Download
+#### Dataset Download
The dataset used for training and validation in this work was referenced from the datasets used by mmocr and PaddleOCR for reproducing the RobustScanner algorithm. We are very grateful to mmocr and PaddleOCR for improving the reproducibility of this repository.
The details of the dataset are as follows:
@@ -99,7 +65,7 @@ The downloaded file contains several compressed files, including:
- `testing_lmdb.zip`: contains six datasets used for evaluating the model, including CUTE80, icdar2013, icdar2015, IIIT5k, SVT, and SVTP.
-#### 3.1.3 Dataset Usage
+#### Dataset Usage
The data folder should be unzipped following the directory structure below:
@@ -257,7 +223,7 @@ eval:
...
```
-#### 3.1.4 Check YAML Config Files
+#### Check YAML Config Files
Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`,
`eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.loader.batch_size`. Explanations of these important args:
@@ -297,7 +263,7 @@ eval:
- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size.
-### 3.2 Model Training
+### Model Training
* Distributed Training
@@ -321,7 +287,7 @@ python tools/train.py --config configs/rec/robustscanner/robustscanner_resnet31.
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`.
-### 3.3 Model Evaluation
+### Model Evaluation
To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path to the arg `ckpt_load_path` in the `eval` section of yaml config file, set `distribute` to be False, and then run:
@@ -329,7 +295,49 @@ To evaluate the accuracy of the trained model, you can use `eval.py`. Please set
python tools/eval.py --config configs/rec/robustscanner/robustscanner_resnet31.yaml
```
-## 4. Character Dictionary
+## Results
+
+
+### Accuracy
+
+According to our experiments, the evaluation results on the public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| RobustScanner | ResNet31 | Real_data+Synth_data | 48.00 | 4 | 64 | O2 | 325.35 s | 142.95 | 1790.87 | 89.37% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir)|
+
+
+
+Note: In addition to the MJSynth (partial) and SynthText (partial) text recognition datasets, RobustScanner is also trained with the SynthAdd dataset and some real datasets. The specific details of the data can be found in the paper or [here](#dataset-download).
+
+
+
+ Detailed accuracy results for each benchmark dataset
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| RobustScanner | ResNet31 | 1 | 94.77% | 94.35% | 95.22% | 94.29% | 82.16% | 73.38% | 95.53% | 92.12% | 82.33% | 89.58% | 89.37% |
+
+
+
+
+**Notes:**
+- To reproduce the result in other environments, please ensure the global batch size is the same.
+- The model uses a 90-character English dictionary, en_dict90.txt, covering digits, common symbols, and upper- and lower-case English letters; a config sketch follows these notes. For more explanation on the dictionary, please refer to [Character Dictionary](#character-dictionary).
+- The models are trained from scratch without any pre-training. For more details on the training and evaluation datasets, please refer to the [Dataset Download & Dataset Usage](#dataset-usage) section.
+- The input shape of the exported MindIR of RobustScanner is (1, 3, 48, 160), and it is for Ascend only.
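For instance, pointing a config at this dictionary would look roughly like the following sketch (path relative to the repository root):

```yaml
common:
  character_dict_path: &character_dict_path mindocr/utils/dict/en_dict90.txt
```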
+
+
+## Character Dictionary
### Default Setting
diff --git a/configs/rec/robustscanner/README_CN.md b/configs/rec/robustscanner/README_CN.md
index 43f367bfd..a2c72ef02 100644
--- a/configs/rec/robustscanner/README_CN.md
+++ b/configs/rec/robustscanner/README_CN.md
@@ -5,7 +5,7 @@
> [RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/pdf/2007.07542.pdf)
-## 1. 模型描述
+## 模型描述
RobustScanner是具有注意力机制的编码器-解码器文字识别算法,本作作者通过对当时主流方法编解码器识别框架的研究,发现文字在解码过程中,不仅依赖上下文信息,还会利用位置信息。而大多数方法在解码过程中都过度依赖语义信息,导致存在较为严重的注意力偏移问题,对于没有语义信息或者弱语义信息的文本识别效果不佳。
@@ -21,55 +21,20 @@ RobustScanner是具有注意力机制的编码器-解码器文字识别算法,
图1. RobustScanner整体架构图 [1]
+## 配套版本
-## 2. 评估结果
-
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-### 训练端
-根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下:
-
-
-
-
-| **模型** | **环境配置** | **骨干网络** | **平均准确率** | **训练时间** | **FPS** | **ms/step** | **配置文件** | **模型权重下载** |
-|:-------------:|:--------------:|:---------:|:---------:|:---------------------:|:-------:|:-----------:|:----------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
-| RobustScanner | D910x4-MS2.0-G | ResNet-31 | 87.86% | 12702 s/epoch | 550 | 465 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir) |
-
-注:除了使用MJSynth(部分)和SynthText(部分)两个文字识别数据集外,还加入了SynthAdd数据,和部分真实数据,具体数据细节可以参考论文或[这里](#312-数据集下载)。
+## 快速开始
+### 环境及数据准备
-
-
- 在各个基准数据集上的准确率
-
- | **模型** | **骨干网络** | **IIIT5k** | **SVT** | **IC13** | **IC15** | **SVTP** | **CUTE80** | **平均准确率** |
- | :------: | :------: |:----------:|:-------:|:--------:|:--------:|:--------:|:----------:|:---------:|
- | RobustScanner | ResNet-31 | 95.50% | 92.12% | 94.29% | 73.33% | 82.33% | 89.58% | 87.86% |
-
-
-
-**注意:**
-- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x4-MS2.0-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore2.0版本进行训练。
-- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。
-- 模型使用90个字符的英文字典en_dict90.txt,其中有数字,常用符号以及大小写的英文字母,详细请看[4. 字符词典](#4-字符词典)
-- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。
-- RobustScanner的MindIR导出时的输入Shape均为(1, 3, 48, 160)。
-
-## 3. 快速开始
-### 3.1 环境及数据准备
-
-#### 3.1.1 安装
+#### 安装
环境安装教程请参考MindOCR的 [installation instruction](https://github.com/mindspore-lab/mindocr#installation).
-#### 3.1.2 数据集下载
+#### 数据集下载
本RobustScanner训练、验证使用的数据集参考了mmocr和PaddleOCR所使用的数据集对文献算法进行复现,在此非常感谢mmocr和PaddleOCR,提高了本repo的复现效率。
数据集细节如下:
@@ -98,7 +63,7 @@ Table Format:
- `SynthText800K_shuffle_xxx_xxx.zip`: 1_200共5个zip文件,包含SynthText数据集中随机挑选的240万个样本。
- 验证集
- `testing_lmdb.zip`: 包含了评估模型使用的CUTE80, icdar2013, icdar2015, IIIT5k, SVT, SVTP六个数据集。
-#### 3.1.3 数据集使用
+#### 数据集使用
数据文件夹按照如下结构进行解压:
@@ -216,7 +181,7 @@ eval:
...
```
-通过使用上述配置 yaml 运行 [模型评估](#33-模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
+通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
2. 对同一文件夹下的多个数据集进行评估
@@ -256,7 +221,7 @@ eval:
...
```
-#### 3.1.4 检查配置文件
+#### 检查配置文件
除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`,
`eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.loader.batch_size`。说明如下:
@@ -300,7 +265,7 @@ eval:
- 您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/cn/tutorials/training_recognition_custom_dataset.md)。
-### 3.2 模型训练
+### 模型训练
* 分布式训练
@@ -324,7 +289,7 @@ python tools/train.py --config configs/rec/robustscanner/robustscanner_resnet31.
训练结果(包括checkpoint、每个epoch的性能和曲线图)将被保存在yaml配置文件的`ckpt_save_dir`参数配置的目录下,默认为`./tmp_rec`。
-### 3.3 模型评估
+### 模型评估
若要评估已训练模型的准确性,可以使用`eval.py`。请在yaml配置文件的`eval`部分将参数`ckpt_load_path`设置为模型checkpoint的文件路径,设置`distribute`为False,然后运行:
@@ -332,7 +297,49 @@ python tools/train.py --config configs/rec/robustscanner/robustscanner_resnet31.
python tools/eval.py --config configs/rec/robustscanner/robustscanner_resnet31.yaml
```
-## 4. 字符词典
+## 评估结果
+
+
+### 训练端
+根据我们的实验,在公开基准数据集(IC03,IC13,IC15,IIIT,SVT,SVTP,CUTE)上的评估结果如下:
+
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:-----------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| RobustScanner | ResNet31 | Real_data+Synth_data | 48.00 | 4 | 64 | O2 | 325.35 s | 142.95 | 1790.87 | 89.37% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/robustscanner/robustscanner_resnet31.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/robustscanner/robustscanner_resnet31-f27eab37-158bde10.mindir)|
+
+
+
+注:除了使用MJSynth(部分)和SynthText(部分)两个文字识别数据集外,还加入了SynthAdd数据,和部分真实数据,具体数据细节可以参考论文或[这里](#数据集下载)。
+
+
+
+ 在各个基准数据集上的准确率
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| RobustScanner | ResNet31 | 1 | 94.77% | 94.35% | 95.22% | 94.29% | 82.16% | 73.38% | 95.53% | 92.12% | 82.33% | 89.58% | 89.37% |
+
+
+
+
+**注意:**
+- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。
+- 模型使用90个字符的英文字典en_dict90.txt,其中有数字,常用符号以及大小写的英文字母,详细请看[字符词典](#字符词典)
+- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集下载)章节。
+- RobustScanner的MindIR导出时的输入Shape均为(1, 3, 48, 160)。
+
+
+## 字符词典
### 默认设置
diff --git a/configs/rec/svtr/README.md b/configs/rec/svtr/README.md
index 3dd8142f3..d7044c2c3 100644
--- a/configs/rec/svtr/README.md
+++ b/configs/rec/svtr/README.md
@@ -27,8 +27,6 @@ Dominant scene text recognition models commonly contain two building blocks, a v
| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-
-
## Quick Start
### Preparation
@@ -172,7 +170,7 @@ eval:
...
```
-By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
+By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
2. Evaluate on multiple datasets under the same folder
@@ -323,19 +321,6 @@ We use a public Chinese text benchmark dataset [Benchmarking-Chinese-Text-Recogn
For detailed instruction of data preparation and yaml configuration, please refer to [ch_dataeset](../../../docs/en/datasets/chinese_text_recognition.md).
-### Training
-
-To train with the prepared datsets and config file, please run:
-
-```shell
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/svtr/svtr_tiny_ch.yaml
-```
-
-
-
-### Training with Custom Datasets
-You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md).
-
## Performance
@@ -343,41 +328,29 @@ You can train models for different languages with your own custom datasets. Load
Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode.
-*coming soon*
-
-Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode.
-
| **model name** | **cards** | **batch size** | **languages** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** |
-| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: | :-------: | :-----: | :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 65.93% | 69.64% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) |
+| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: |:---------:|:-------:| :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 66.19% | 69.66% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) |
### Specific Purpose Models
Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode.
-*coming soon*
-
-Experiments are tested on ascend 910 with mindspore 2.3.1 graph mode.
-
-| **model name** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
-| :------------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :-------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-| SVTR-Tiny | 4 | 512 | O2 | 226.86 s | 49.38 | 4560 | 90.23% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3-86ece8c8.mindir) |
-| SVTR-Tiny-8P | 8 | 512 | O2 | 230.74 s | 55.16 | 9840 | 90.32% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) |
-
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:----------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| SVTR-Tiny-8P | Tiny | MJ+ST | 60.24 | 8 | 512 | O2 | 230.39 s | 685.68 | 5973.61 | 90.29% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) |
Detailed accuracy results for each benchmark dataset:
-
-| **model name** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
-| :------------: | :----------: | :----------: | :----------: | :-----------: | :-----------: | :-----------: | :-------------: | :-----: | :------: | :--------: | :---------: |
-| SVTR-Tiny | 95.70% | 95.50% | 95.33% | 93.99% | 83.60% | 79.83% | 94.70% | 91.96% | 85.58% | 86.11% | 90.23% |
-| SVTR-Tiny-8P | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.33% | 90.57% | 86.20% | 86.46% | 90.32% |
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| SVTR-Tiny-8P | Tiny | 1 | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.30% | 90.42% | 86.05% | 86.46% | 90.29% |
### Notes
- To reproduce the result on other contexts, please ensure the global batch size is the same.
-- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
-- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
+- The characters supported by the model are lowercase English characters from a to z and numbers from 0 to 9. For more explanation on the dictionary, please refer to [Character Dictionary](#character-dictionary).
+- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to the [Dataset Download & Dataset Usage](#dataset-usage) section.
- The input shape of the SVTR MindIR is (1, 3, 64, 256).
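
As a rough illustration of the batch-size note above (a sketch only: it assumes the `batch size` column is the per-card value, consistent with the global batch size being defined as `batch_size x num_devices`, and the learning rate below is a placeholder rather than the value from the recipe):

```python
# Keeping the global batch size constant when the number of cards changes.
ref_cards, ref_batch_per_card = 8, 512            # reference 8-card run from the table above
global_batch = ref_cards * ref_batch_per_card     # 8 * 512 = 4096

new_cards = 4
new_batch_per_card = global_batch // new_cards    # 1024 per card keeps the global batch size at 4096

# Alternative: keep batch_size unchanged and scale the learning rate linearly instead.
placeholder_lr = 1e-3                             # hypothetical value; check the yaml recipe
new_lr = placeholder_lr * (new_cards * ref_batch_per_card) / global_batch   # 0.5 x placeholder_lr
print(new_batch_per_card, new_lr)
```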
diff --git a/configs/rec/svtr/README_CN.md b/configs/rec/svtr/README_CN.md
index cea591b23..73f9fe57e 100644
--- a/configs/rec/svtr/README_CN.md
+++ b/configs/rec/svtr/README_CN.md
@@ -169,7 +169,7 @@ eval:
...
```
-通过使用上述配置 yaml 运行 [模型评估](#33-model-evaluation) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
+通过使用上述配置 yaml 运行 [模型评估](#模型评估) 部分中所述的`tools/eval.py`,您可以获得数据集 CUTE80 的准确度性能。
2. 对同一文件夹下的多个数据集进行评估
@@ -209,7 +209,7 @@ eval:
...
```
-#### 3.1.4 检查配置文件
+#### 检查配置文件
除了数据集的设置,请同时重点关注以下变量的配置:`system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`,
`eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`。说明如下:
@@ -320,62 +320,36 @@ Mindocr内置了一部分字典,均放在了 `mindocr/utils/dict/` 位置,
详细的数据准备和config文件配置方式, 请参考 [中文识别数据集准备](../../../docs/zh/datasets/chinese_text_recognition.md)
-### 模型训练验证
-
-准备好数据集和配置文件后,执行以下命令开启多卡训练
-
-```shell
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/svtr/svtr_tiny_ch.yaml
-```
-
-
-
-### 使用自定义数据集进行训练
-您可以在自定义的数据集基于提供的预训练权重进行微调训练, 以在特定场景获得更高的识别准确率,具体步骤请参考文档 [使用自定义数据集训练识别网络](../../../docs/zh/tutorials/training_recognition_custom_dataset_CN.md)。
-
## 性能表现
### 通用泛化中文模型
在采用图模式的ascend 910*上实验结果,mindspore版本为2.3.1
-*即将到来*
-
-在采用图模式的ascend 910上实验结果,mindspore版本为2.3.1
-
-
| **model name** | **cards** | **batch size** | **languages** | **jit level** | **graph compile** | **ms/step** | **img/s** | **scene** | **web** | **document** | **recipe** | **weight** |
-| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: | :-------: | :-----: | :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 65.93% | 69.64% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) |
+| :------------: | :-------: | :------------: | :-----------: | :-----------: | :---------------: | :---------: | :-------: |:---------:|:-------:| :----------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| SVTR-Tiny | 4 | 256 | Chinese | O2 | 235.1 s | 37.75 | 1580 | 66.19% | 69.66% | 98.01% | [svtr_tiny_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny_ch-2ee6ade4-3e495768.mindir) |
+
### 细分领域模型
在采用图模式的ascend 910*上实验结果,mindspore版本为2.3.1
-*即将到来*
-
-在采用图模式的ascend 910上实验结果,mindspore版本为2.3.1
-
-| **model name** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
-| :------------: | :-------: | :------------: | :-----------: | :---------------: | :---------: | :-------: | :----------: | :-------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-| SVTR-Tiny | 4 | 512 | O2 | 226.86 s | 49.38 | 4560 | 90.23% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/svtr/svtr_tiny-950be1c3-86ece8c8.mindir) |
-| SVTR-Tiny-8P | 8 | 512 | O2 | 230.74 s | 55.16 | 9840 | 90.32% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) |
-
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:|:--------------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:----------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| SVTR-Tiny-8P | Tiny | MJ+ST | 60.24 | 8 | 512 | O2 | 230.39 s | 685.68 | 5973.61 | 90.29% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/svtr_tiny_8p.yaml) | [ckpt](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6.ckpt) \| [mindir](https://download-mindspore.osinfra.cn/toolkits/mindocr/svtr/svtr_tiny_8p-0afc75d6-255191ef.mindir) |
在各个基准数据集上的准确率
-| **model name** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
-| :------------: | :----------: | :----------: | :----------: | :-----------: | :-----------: | :-----------: | :-------------: | :-----: | :------: | :--------: | :---------: |
-| SVTR-Tiny | 95.70% | 95.50% | 95.33% | 93.99% | 83.60% | 79.83% | 94.70% | 91.96% | 85.58% | 86.11% | 90.23% |
-| SVTR-Tiny-8P | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.33% | 90.57% | 86.20% | 86.46% | 90.32% |
-
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| SVTR-Tiny-8P | Tiny | 1 | 95.93% | 95.62% | 95.33% | 93.89% | 84.32% | 80.55% | 94.30% | 90.42% | 86.05% | 86.46% | 90.29% |
**注意:**
-- 环境配置:训练的环境配置表示为 {处理器}x{处理器数量}-{MS模式},其中 Mindspore 模式可以是 G-graph 模式或 F-pynative 模式。例如,D910x4-MS1.10-G 用于使用图形模式在4张昇腾910 NPU上依赖Mindspore1.10版本进行训练。
- 如需在其他环境配置重现训练结果,请确保全局批量大小与原配置文件保持一致。
-- 模型所能识别的字符都是默认的设置,即所有英文小写字母a至z及数字0至9,详细请看[4. 字符词典](#4-字符词典)
-- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#312-数据集下载)章节。
+- 模型所能识别的字符为默认设置,即所有英文小写字母a至z及数字0至9,详细请看[字符词典](#字符词典)。
+- 模型都是从头开始训练的,无需任何预训练。关于训练和测试数据集的详细介绍,请参考[数据集下载及使用](#数据集准备)章节。
- SVTR的MindIR导出时的输入Shape均为(1, 3, 64, 256)。
## 参考文献
diff --git a/configs/rec/visionlan/README.md b/configs/rec/visionlan/README.md
index f49728fa6..8ad076587 100644
--- a/configs/rec/visionlan/README.md
+++ b/configs/rec/visionlan/README.md
@@ -6,9 +6,9 @@ English | [中文](README_CN.md)
> VisionLAN: [From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network](https://arxiv.org/abs/2108.09661)
-## 1. Introduction
+## Introduction
-### 1.1 VisionLAN
+### VisionLAN
Visual Language Modeling Network (VisionLAN) [1] is a text recognition model that learns the visual and linguistic information simultaneously via **character-wise occluded feature maps** in the training stage. This model does not require an extra language model to extract linguistic information, since the visual and linguistic information can be learned as a union.
@@ -32,57 +32,20 @@ As shown above, the training pipeline of VisionLAN consists of three modules:
While in the test stage, MLM is not used. Only the backbone and VRM are used for prediction.
-## 2. Results
-
-
-### 2.1 Accuracy
-
-According to our experiments, the evaluation results on ten public benchmark datasets is as follow:
-
-
-
-| **Model** | **Context** | **Backbone**| **Train Dataset** | **Model Params**|**Avg Accuracy** | **Train Time** | **Per Step Time** | **FPS** | **Recipe** | **Download** |
-| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: |:--------: | :--------: |:--------: |:----------: |
-| visionlan | D910x4-MS2.0-G | resnet45 | MJ+ST| 42.2M | 90.61% | 7718 s/epoch | 417 ms/step | 1,840 img/s | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml)| [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)|
-
-
-
-
-
- Detailed accuracy results for ten benchmark datasets
-
- | **Model** | **Context** | **IC03_860**| **IC03_867**| **IC13_857**|**IC13_1015** | **IC15_1811** |**IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
- | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |:------: |:------: | :------: |:------: |
- | visionlan | D910x4-MS2.0-G | 96.16% | 95.16% | 95.92%| 94.19% | 84.04% | 77.46% | 95.53% | 92.27% | 85.74% |89.58% | 90.61% |
-
-
+## Requirements
-
-
-**Notes:**
-
-- Context: Training context denoted as `{device}x{pieces}-{MS version}-{MS mode}`. Mindspore mode can be either `G` (graph mode) or `F` (pynative mode). For example, `D910x4-MS2.0-G` denotes training on 4 pieces of 910 NPUs using graph mode based on MindSpore version 2.0.0.
-- Train datasets: MJ+ST stands for the combination of two synthetic datasets, SynthText(800k) and MJSynth.
-- To reproduce the result on other contexts, please ensure the global batch size is the same.
-- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [3.2 Dataset preparation](#32-dataset-preparation) section.
-- The input Shape of MindIR of VisionLAN is (1, 3, 64, 256).
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-## 3. Quick Start
+## Quick Start
-### 3.1 Installation
+### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.
-### 3.2 Dataset preparation
+### Dataset preparation
**Training sets**
@@ -156,7 +119,7 @@ datasets
└── SynText
```
-### 3.3 Update yaml config file
+### Update yaml config file
If the datasets are placed under `./datasets`, there is no need to change the `train.dataset.dataset_root` in the yaml configuration file `configs/rec/visionlan/visionlan_L*.yaml`.
@@ -205,7 +168,7 @@ common:
- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size.
-### 3.4 Training
+### Training
The training pipeline includes the Language-free (LF) and Language-aware (LA) processes, with three training steps in total:
@@ -226,7 +189,7 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/visio
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_visionlan`.
-### 3.5 Test
+### Test
After all three steps training, change the `system.distribute` to `False` in `configs/rec/visionlan/visionlan_resnet45_LA.yaml` before testing.
@@ -254,10 +217,52 @@ training_step="LA"
python tools/benchmarking/multi_dataset_eval.py --config $yaml_file --opt eval.dataset.data_dir="test" eval.ckpt_load_path="./tmp_visionlan/${training_step}/${model_name}.ckpt"
```
+## Results
+
+
+### Accuracy
+
+According to our experiments, the evaluation results on the ten public benchmark datasets are as follows:
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:|:-----------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| visionlan | Resnet45 | MJ+ST | 42.22 | 4 | 128 | O2 | 191.52 s | 280.29 | 1826.63 | 90.62% | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml) | [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)|
+
+
+
+
+
+ Detailed accuracy results for ten benchmark datasets
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| visionlan | Resnet45 | 1 | 96.16% | 95.16% | 95.92% | 94.19% | 84.04% | 77.47% | 95.53% | 92.27% | 85.89% | 89.58% | 90.62% |
+
+
+
+
+
+**Notes:**
+
+- Train datasets: MJ+ST stands for the combination of two synthetic datasets, SynthText(800k) and MJSynth.
+- To reproduce the result in other contexts, please ensure the global batch size is the same.
+- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to the [Dataset preparation](#dataset-preparation) section.
+- The input shape of the VisionLAN MindIR is (1, 3, 64, 256).
+
-## 4. Inference
+## Inference
-### 4.1 Prepare MINDIR file
+### Prepare MINDIR file
Please download the [MINDIR](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir) file from the table above, or you can use `tools/export.py` to manually convert any checkpoint file into a MINDIR file:
```bash
@@ -269,7 +274,7 @@ This command will save a `visionlan_resnet45.mindir` under the current working d
> Learn more about [Model Export](https://github.com/mindspore-lab/mindocr/blob/main/docs/en/inference/convert_tutorial.md#11-model-export).
-### 4.2 Mindspore Lite Converter Tool
+### Mindspore Lite Converter Tool
If you haven't downloaded MindSpore Lite, please download it via this [link](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html). For more details on how to use MindSpore Lite in a Linux environment, refer to [this document](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/converter_tool.html#linux-environment-usage-instructions).
@@ -290,7 +295,7 @@ Running this command will save a `visionlan_resnet45_lite.mindir` under the curr
> Learn more about [Model Conversion](https://github.com/mindspore-lab/mindocr/blob/main/docs/en/inference/convert_tutorial.md#12-model-conversion).
-### 4.3 Inference on A Folder of Images
+### Inference on A Folder of Images
Taking `SVT` test set as an example, the data structure under the dataset folder is:
@@ -335,7 +340,7 @@ The evaluation results are shown below:
```
-## 5. References
+## References
[1] Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang: From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. ICCV 2021: 14174-14183
diff --git a/configs/rec/visionlan/README_CN.md b/configs/rec/visionlan/README_CN.md
index 588568826..77b740d43 100644
--- a/configs/rec/visionlan/README_CN.md
+++ b/configs/rec/visionlan/README_CN.md
@@ -6,9 +6,9 @@
> VisionLAN: [From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network](https://arxiv.org/abs/2108.09661)
-## 1. 简介
+## 简介
-### 1.1 VisionLAN
+### VisionLAN
视觉语言建模网络(VisionLAN)[1]是一种文本识别模型,它通过在训练阶段使用逐字符遮挡的特征图来同时学习视觉和语言信息。这种模型不需要额外的语言模型来提取语言信息,因为视觉和语言信息可以作为一个整体来学习。
@@ -26,45 +26,21 @@
但在测试阶段,MLM不被使用。只有骨干网络和VRM被用于预测。
-## 2.精度结果
+## 配套版本
-根据我们实验结果,在10个公开数据集上的评估结果如下:
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
-
-| **Model** | **Context** | **Backbone**| **Train Dataset** | **Model Params**|**Avg Accuracy** | **Train Time** | **Per Step Time** | **FPS** | **Recipe** | **Download** |
-| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: |:--------: | :--------: |:--------: |:----------: |
-| visionlan | D910x4-MS2.0-G | resnet45 | MJ+ST| 42.2M | 90.61% | 7718s/epoch | 417 ms/step | 1,840 img/s | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml)| [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)|
-
+## 快速入门
-
-
- Detailed accuracy results for ten benchmark datasets
-
- | **Model** | **Context** | **IC03_860**| **IC03_867**| **IC13_857**|**IC13_1015** | **IC15_1811** |**IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
- | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |:------: |:------: | :------: |:------: |
- | visionlan | D910x4-MS2.0-G | 96.16% | 95.16% | 95.92%| 94.19% | 84.04% | 77.46% | 95.53% | 92.27% | 85.74% |89.58% | 90.61% |
-
-
-
-
-
-**注**
-
-- 训练环境表示为`{device}x{pieces}-{MS版本}-{MS模式}`。MindSpore模式可以是`G`(Graph模式)或`F`(Pynative模式)。例如,`D910x4-MS2.0-G`表示使用MindSpore版本2.0.0在4块910 NPUs上使用图模式进行训练。
-- 训练数据集:`MJ+ST`代表两个合成数据集SynthText(800k)和MJSynth的组合。
-- 要在其他训练环境中重现结果,请确保全局批量大小相同。
-- 这些模型是从头开始训练的,没有任何预训练。有关训练和评估的更多数据集详细信息,请参阅[3.2数据集准备](#32数据集准备)部分。
-- VisionLAN的MindIR导出时的输入Shape均为(1, 3, 64, 256)。
-
-## 3.快速入门
-
-### 3.1安装
+### 安装
请参考[MindOCR中的安装说明](https://github.com/mindspore-lab/mindocr#installation)。
-### 3.2数据集准备
+### 数据集准备
**训练集**
@@ -137,7 +113,7 @@ datasets
└── SynText
```
-### 3.3 更新yaml配置文件
+### 更新yaml配置文件
如果数据集放置在`./datasets`目录下,则无需更改yaml配置文件`configs/rec/visionlan/visionlan_L*.yaml`中的`train.dataset.dataset_root`。
否则,请相应地更改以下字段:
@@ -185,7 +161,7 @@ common:
**注意:**
- 由于全局批大小 (batch_size x num_devices) 是对结果复现很重要,因此当GPU/NPU卡数发生变化时,调整batch_size以保持全局批大小不变,或将学习率线性调整为新的全局批大小。
-### 3.4 训练
+### 训练
训练阶段包括无语言(LF)和有语言(LA)过程,总共有三个训练步骤:
@@ -206,7 +182,7 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/visio
训练结果(包括checkpoints、每个阶段的性能和loss曲线)将保存在yaml配置文件中由参数`ckpt_save_dir`解析的目录中。默认目录为`./tmp_visionlan`。
-### 3.5 测试
+### 测试
在完成上述三个训练步骤以后, 用户需要在测试前,将 `configs/rec/visionlan/visionlan_resnet45_LA.yaml` 文件中的`system.distribute`改为 `False`。
@@ -235,10 +211,40 @@ training_step="LA"
python tools/benchmarking/multi_dataset_eval.py --config $yaml_file --opt eval.dataset.data_dir="test" eval.ckpt_load_path="./tmp_visionlan/${training_step}/${model_name}.ckpt"
```
+## 精度结果
+
+根据我们实验结果,在10个公开数据集上的评估结果如下:
+
+
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:|:------------:|:-----------------:|:-------------:|:---------:|:--------------:| :-----------: |:-----------------:|:-----------:|:---------:|:------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| visionlan | Resnet45 | MJ+ST | 42.22 | 4 | 128 | O2 | 191.52 s | 280.29 | 1826.63 | 90.62% | [yaml(LF_1)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml) [yaml(LF_2)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml) [yaml(LA)](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/visionlan/visionlan_resnet45_LA.yaml) | [ckpt files](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_ckpts-7d6e9c04.tar.gz) \| [mindir(LA)](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)|
+
+
+
+
+
+ Detailed accuracy results for ten benchmark datasets
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| visionlan | Resnet45 | 1 | 96.16% | 95.16% | 95.92% | 94.19% | 84.04% | 77.47% | 95.53% | 92.27% | 85.89% | 89.58% | 90.62% |
+
+
+
+
+
+**注**
+
+- 训练数据集:`MJ+ST`代表两个合成数据集SynthText(800k)和MJSynth的组合。
+- 要在其他训练环境中重现结果,请确保全局批量大小相同。
+- 这些模型是从头开始训练的,没有任何预训练。有关训练和评估的更多数据集详细信息,请参阅[数据集准备](#数据集准备)部分。
+- VisionLAN的MindIR导出时的输入Shape均为(1, 3, 64, 256)。
-## 4. 推理
+## 推理
-### 4.1 准备 MINDIR 文件
+### 准备 MINDIR 文件
请从上面的表格中下载[MINDIR](https://download.mindspore.cn/toolkits/mindocr/visionlan/visionlan_resnet45_LA-e9720d9e-71b38d2d.mindir)文件,或者您可以使用`tools/export.py`将任何检查点文件手动转换为 MINDIR 文件:
```bash
@@ -248,7 +254,7 @@ python tools/export.py --model_name_or_config visionlan_resnet45 --data_shape 64
此命令将在当前工作目录下保存一个`visionlan_resnet45.mindir`文件。
-### 4.2 Mindspore Lite Converter Tool
+### Mindspore Lite Converter Tool
如果您尚未下载 MindSpore Lite,请通过此[链接](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html)进行下载。有关如何在 Linux 环境中使用 MindSpore Lite 的更多详细信息,请参阅[此文档](https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/converter_tool.html#linux-environment-usage-instructions)。
@@ -267,7 +273,7 @@ converter_lite \
运行此命令将在当前工作目录下保存一个`visionlan_resnet45_lite.mindir`文件。这是我们可以在`Ascend310`或`310P`平台上进行推理的`MindSpore Lite MindIR`文件。您还可以通过更改`--outputFile`参数来定义不同的文件名。
-### 4.3 对图像文件夹进行推理
+### 对图像文件夹进行推理
以`SVT`测试集为例,数据集文件夹下的数据结构如下:
```text
@@ -308,7 +314,7 @@ python deploy/eval_utils/eval_rec.py \
{'acc': 0.9227202534675598, 'norm_edit_distance': 0.9720136523246765}
```
-## 5. 引用文献
+## 引用文献
[1] Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang: From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. ICCV 2021: 14174-14183
diff --git a/configs/rec/visionlan/visionlan_resnet45_LA.yaml b/configs/rec/visionlan/visionlan_resnet45_LA.yaml
index e2958d6cf..dfdb4bb3f 100644
--- a/configs/rec/visionlan/visionlan_resnet45_LA.yaml
+++ b/configs/rec/visionlan/visionlan_resnet45_LA.yaml
@@ -1,11 +1,11 @@
system:
mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
distribute: True
- amp_level: 'O2'
+ amp_level: 'O0'
seed: 42
log_interval: 200
val_while_train: True
- drop_overflow_update: False
+ drop_overflow_update: True
common:
character_dict_path: &character_dict_path
diff --git a/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml b/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml
index da8c2aab4..d6ffb944d 100644
--- a/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml
+++ b/configs/rec/visionlan/visionlan_resnet45_LF_1.yaml
@@ -1,11 +1,11 @@
system:
mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
distribute: True
- amp_level: 'O2'
+ amp_level: 'O0'
seed: 42
log_interval: 200
val_while_train: True
- drop_overflow_update: False
+ drop_overflow_update: True
common:
character_dict_path: &character_dict_path
diff --git a/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml b/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml
index c3acecb73..9435b1bc8 100644
--- a/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml
+++ b/configs/rec/visionlan/visionlan_resnet45_LF_2.yaml
@@ -1,11 +1,11 @@
system:
mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
distribute: True
- amp_level: 'O2'
+ amp_level: 'O0'
seed: 42
log_interval: 200
val_while_train: True
- drop_overflow_update: False
+ drop_overflow_update: True
common:
character_dict_path: &character_dict_path
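
For readers of the three identical YAML changes above, a brief note on what the two flags control in MindSpore-style training configs (general semantics, not a statement of this patch's motivation): `amp_level: 'O0'` disables automatic mixed precision and keeps the model in full precision, and `drop_overflow_update: True` makes the training step skip the parameter update whenever the loss scaler detects an overflow instead of applying it. A minimal excerpt:

```yaml
# Illustrative excerpt only; see the real visionlan_resnet45_*.yaml files for the full configs.
system:
  amp_level: 'O0'             # 'O0' = full precision; 'O2' would cast most ops to float16
  drop_overflow_update: True  # skip the optimizer update on steps where loss-scale overflow is detected
```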
diff --git a/mindocr/models/utils/attention_cells.py b/mindocr/models/utils/attention_cells.py
index f1aa28637..f285cbaf1 100644
--- a/mindocr/models/utils/attention_cells.py
+++ b/mindocr/models/utils/attention_cells.py
@@ -27,6 +27,23 @@ def __init__(
self.matmul = ops.BatchMatMul()
+ self.min_fp16 = ms.tensor(np.finfo(np.float16).min, dtype=ms.float16)
+ self.min_fp32 = ms.tensor(np.finfo(np.float32).min, dtype=ms.float32)
+ self.min_fp64 = ms.tensor(np.finfo(np.float64).min, dtype=ms.float64)
+ self.min_bf16 = ms.tensor(float.fromhex("-0x1.fe00000000000p+127"), dtype=ms.bfloat16)
+
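+ # Return the most negative finite value representable in the given dtype; used below as a finite stand-in for -inf when masking attention scores.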
+ def dtype_to_min(self, dtype):
+ if dtype == ms.float16:
+ return self.min_fp16
+ if dtype == ms.float32:
+ return self.min_fp32
+ if dtype == ms.float64:
+ return self.min_fp64
+ if dtype == ms.bfloat16:
+ return self.min_bf16
+ else:
+ raise ValueError(f"Only support getting the minimum value of (float16, float32, float64, bfloat16), but got {dtype}")
+
def dot_product_attention(
self, query: Tensor, key: Tensor, value: Tensor, mask: Optional[Tensor] = None
) -> Tuple[Tensor, Tensor]:
@@ -37,7 +54,7 @@ def dot_product_attention(
if mask is not None:
score = ops.masked_fill(
- score, mask == 0, ms.Tensor(-np.inf, score.dtype)
+ score, mask == 0, self.dtype_to_min(score.dtype)
) # score (N, h, seq_len, seq_len)
p_attn = ops.softmax(score, axis=-1)
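
One consequence of the change above: filling masked attention scores with `-np.inf` produces `NaN` after softmax whenever an entire row is masked (0/0), while the dtype's minimum finite value keeps those probabilities well defined and still effectively zero. Infinities in intermediate tensors can also interact badly with loss-scale overflow checks under mixed precision, though the patch itself does not state that as the motivation. A standalone NumPy sketch of the NaN case, independent of the MindOCR classes:

```python
import numpy as np

# Minimal sketch (NumPy, not MindSpore) of why a finite dtype minimum is a safer
# mask fill value than -inf: a row where every position is masked stays finite.
def masked_softmax(score, mask, fill):
    score = np.where(mask == 0, fill, score)
    score = score - score.max(axis=-1, keepdims=True)  # standard stabilisation
    e = np.exp(score)
    return e / e.sum(axis=-1, keepdims=True)

score = np.zeros((1, 4), dtype=np.float16)
mask = np.zeros((1, 4), dtype=np.int32)  # fully masked row

print(masked_softmax(score, mask, np.float16(-np.inf)))       # [[nan nan nan nan]]
print(masked_softmax(score, mask, np.finfo(np.float16).min))  # [[0.25 0.25 0.25 0.25]]
```

In the patched code, `dtype_to_min` plays the role of `np.finfo(...).min`, extended to bfloat16, whose minimum NumPy cannot report directly (hence the hard-coded hex literal).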