Commit

Merge branch 'main' into main

iugoood committed Jan 10, 2025
2 parents 53fb2b7 + d1c0db0 commit 4042336
Showing 21 changed files with 101 additions and 203 deletions.
8 changes: 4 additions & 4 deletions CONTRIBUTING_CN.md
@@ -6,7 +6,7 @@

Before you submit code to the MindOCR community for the first time, you need to sign the CLA.

For individual contributors, please refer to the [ICLA online document]https://www.mindspore.cn/icla for details.
For individual contributors, please refer to the [ICLA online document](https://www.mindspore.cn/icla) for details.

## Types of Contributions

@@ -46,7 +46,7 @@ MindOCR could always use more documentation, whether as part of the official MindOCR documentation

Ready to contribute? Here's how to set up 'mindocr' for local development.

1. Fork the 'mindocr' repository on [GitHub]https://github.com/mindspore-lab/mindocr.
1. Fork the 'mindocr' repository on [GitHub](https://github.com/mindspore-lab/mindocr).
2. Clone your fork locally:

```shell
@@ -85,11 +85,11 @@ MindOCR could always use more documentation, whether as part of the official MindOCR documentation

If all static tests pass, you will get output like the following:

![pre-commit passed]https://user-images.githubusercontent.com/74176172/221346245-ea868015-bb09-4e53-aa56-73b015e1e336.png
![pre-commit passed](https://user-images.githubusercontent.com/74176172/221346245-ea868015-bb09-4e53-aa56-73b015e1e336.png)

Otherwise, you need to fix the warnings according to the output:

![pre-commit failed]https://user-images.githubusercontent.com/74176172/221346251-7d8f531f-9094-474b-97f0-fd5a55e6d3de.png
![pre-commit failed](https://user-images.githubusercontent.com/74176172/221346251-7d8f531f-9094-474b-97f0-fd5a55e6d3de.png)

To get pre-commit and pytest, simply pip install them into your conda environment.

6 changes: 3 additions & 3 deletions configs/det/fcenet/README.md
@@ -16,7 +16,7 @@ FCENet is a segmentation-based text detection algorithm. In the text detection s

The idea of deformable convolution is simple: the fixed shape of the convolution kernel is made variable. On top of the original convolution's sampling positions, deformable convolution generates a position shift in a random direction, as shown in the following figure:

<p align="center"><img alt="Figure 1" src="https://github.com/colawyee/mindocr-1/assets/15730439/5dfdbabd-a025-4789-89fb-4f2263e9deff" width="600"/></p>
<p align="center"><img alt="Figure 1" src="https://github.com/user-attachments/assets/858357ca-02ff-46b4-8d4d-4c2e53f00ac5" width="600"/></p>
<p align="center"><em>Figure 1. Deformable Convolution</em></p>

Figure (a) shows the original convolution kernel, Figure (b) a deformable convolution kernel with position shifts in random directions, and Figures (c) and (d) two special cases of (b). The advantage is that the geometric transformation ability of the kernel improves: it is no longer limited to the original rectangular kernel shape and can support richer, irregular shapes. Deformable convolution performs better at extracting irregular-shape features [[1](#references)] and is more suitable for text recognition in natural scenes.
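To make the sampling mechanism concrete, here is a minimal NumPy sketch of the idea (an illustrative toy, not the MindOCR implementation, and the offsets here are random rather than learned): every tap of a 3×3 kernel is shifted by a per-position offset and the input is read back with bilinear interpolation, so zero offsets recover an ordinary convolution.

```python
# Toy deformable 3x3 convolution on a single-channel image (NumPy only).
import numpy as np

def bilinear(img, y, x):
    """Sample img at fractional (y, x) with bilinear interpolation; outside reads 0."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    def px(yy, xx):
        return img[yy, xx] if 0 <= yy < h and 0 <= xx < w else 0.0
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * px(y0, x0) + (1 - wy) * wx * px(y0, x1)
            + wy * (1 - wx) * px(y1, x0) + wy * wx * px(y1, x1))

def deform_conv3x3(img, weight, offsets):
    """img: (H, W); weight: (3, 3); offsets: (H, W, 9, 2) holding (dy, dx) per tap."""
    h, w = img.shape
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[y, x, k]          # learned in practice, random here
                acc += weight[dy + 1, dx + 1] * bilinear(img, y + dy + oy, x + dx + ox)
            out[y, x] = acc
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))
weight = rng.random((3, 3))
offsets = rng.uniform(-1.0, 1.0, size=(8, 8, 9, 2))   # all-zero offsets = plain 3x3 conv
print(deform_conv3x3(img, weight, offsets).shape)      # (8, 8)
```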
@@ -25,7 +25,7 @@

The Fourier contour is a curve-fitting method based on the Fourier transform. As the Fourier degree k increases, more high-frequency components are introduced and the contour is described more accurately. The following figure shows how well irregular curves are described at different Fourier degrees:

<p align="center"><img width="445" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/23583dd8-0c67-4774-a4f3-9f5971e9ed93"></p>
<p align="center"><img width="445" alt="Image" src="https://github.com/user-attachments/assets/33f2f7f3-91d6-4e6a-99ee-5930f36d013c"></p>
<p align="center"><em>Figure 2. Fourier contour fitting with progressive approximation</em></p>

It can be seen that as the Fourier degree k increases, the curves that can be depicted become very intricate.
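To make the encoding concrete, here is a small self-contained sketch of the math (assumed notation following the paper's idea, not the MindOCR code): a closed contour is treated as a complex-valued periodic function, its truncated Fourier coefficients form a compact signature, and reconstructing from only the |k| ≤ K terms approximates the contour more and more closely as K grows.

```python
# Fit and reconstruct a closed contour with a truncated Fourier series.
import numpy as np

def fourier_signature(points, K):
    """points: (N, 2) samples of a closed contour; returns (coeffs, ks) for |k| <= K."""
    z = points[:, 0] + 1j * points[:, 1]                 # contour as complex samples
    t = np.arange(len(z)) / len(z)
    ks = np.arange(-K, K + 1)
    coeffs = np.array([(z * np.exp(-2j * np.pi * k * t)).mean() for k in ks])
    return coeffs, ks

def reconstruct(coeffs, ks, n=400):
    """Inverse transform: rebuild n contour points from the coefficients."""
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    z = sum(c * np.exp(2j * np.pi * k * t) for c, k in zip(coeffs, ks))
    return np.stack([z.real, z.imag], axis=1)

# A wavy, star-like "text contour"; larger K recovers more of its detail.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
r = 1.0 + 0.3 * np.sin(5 * theta)
contour = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

for K in (1, 3, 5, 10):
    coeffs, ks = fourier_signature(contour, K)
    approx = reconstruct(coeffs, ks, n=len(contour))
    err = np.linalg.norm(approx - contour, axis=1).mean()
    print(f"K={K:2d}  mean point error={err:.4f}")       # shrinks as K grows
```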
@@ -36,7 +36,7 @@ Fourier Contour Encoding is a method proposed in the paper "Fourier Contour Embe

#### The FCENet Framework

<p align="center"><img width="800" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/cfe9f5b1-d22f-4d01-8f27-0856a930f78b"></p>
<p align="center"><img width="800" alt="Image" src="https://github.com/user-attachments/assets/4cfbda01-84c6-43b1-8a60-4a2b20870c2a"></p>
<p align="center"><em>Figure 3. FCENet framework</em></p>

Like most OCR algorithms, the structure of FCENet can be roughly divided into three parts: backbone, neck, and head. The backbone uses a deformable-convolution version of ResNet50 for feature extraction. The neck adopts a feature pyramid [[2](#references)], a set of convolution kernels of different sizes suited to extracting features of different scales from the original image; this improves detection accuracy and works well when one image contains text boxes of several sizes. The head has two branches. The classification branch predicts heat maps of both the text regions and the text center regions, which are multiplied pixel-wise to produce the classification score map; its loss is the cross entropy between the predicted heat maps and the ground truth. The regression branch predicts the Fourier signature vectors, which are used to reconstruct text contours via the inverse Fourier transform (IFT); its loss is the smooth-L1 distance, computed in image space, between the reconstructed text contour and the ground-truth contour.
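The following rough sketch (random data and hypothetical tensor shapes, not the MindOCR implementation) illustrates how the two branch losses described above combine: a cross-entropy term on the pixel-wise product of the text-region and text-center heat maps, and a smooth-L1 term between contour points reconstructed from the predicted and ground-truth Fourier vectors.

```python
# Two-branch FCENet-style loss on toy data: classification (heat maps) + regression (contours).
import numpy as np

def bce(pred, target, eps=1e-6):
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

def smooth_l1(pred, target, beta=1.0):
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()

def ift_contour(coeffs, ks, n=50):
    """Rebuild (x, y) contour points from complex Fourier coefficients (inverse transform)."""
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    z = sum(c * np.exp(2j * np.pi * k * t) for c, k in zip(coeffs, ks))
    return np.stack([z.real, z.imag], axis=1)

rng = np.random.default_rng(0)
h, w, K = 32, 32, 5
ks = np.arange(-K, K + 1)

# Classification branch: text-region and text-center heat maps, multiplied pixel-wise.
pred_region, pred_center = rng.random((h, w)), rng.random((h, w))
gt_region = (rng.random((h, w)) > 0.5).astype(float)
gt_center = (rng.random((h, w)) > 0.8).astype(float)
cls_loss = bce(pred_region * pred_center, gt_region * gt_center)

# Regression branch: compare contours reconstructed from Fourier vectors in image space.
pred_coeffs = rng.normal(size=ks.size) + 1j * rng.normal(size=ks.size)
gt_coeffs = rng.normal(size=ks.size) + 1j * rng.normal(size=ks.size)
reg_loss = smooth_l1(ift_contour(pred_coeffs, ks), ift_contour(gt_coeffs, ks))

print(f"cls_loss={cls_loss:.4f}  reg_loss={reg_loss:.4f}  total={cls_loss + reg_loss:.4f}")
```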
9 changes: 6 additions & 3 deletions configs/det/fcenet/README_CN.md
@@ -18,7 +18,8 @@ One highlight of FCENet is its excellent performance on arbitrarily shaped, irregular text

The idea of deformable convolution is very simple: the originally fixed shape of the convolution kernel is made variable. On top of the original convolution's sampling positions, deformable convolution produces a position offset in a random direction, as shown in the figure below:

<p align="center"><img alt="Figure 1" src="https://github.com/colawyee/mindocr-1/assets/15730439/5dfdbabd-a025-4789-89fb-4f2263e9deff" width="600"/></p>
<p align="center"><img alt="Figure 1" src="https://github.com/user-attachments/assets/858357ca-02ff-46b4-8d4d-4c2e53f00ac5" width="600"/></p>

<p align="center"><em>图 1. 可变形卷积</em></p>

Figure (a) shows the original convolution kernel, Figure (b) a deformable convolution kernel with position offsets in random directions, and Figures (c) and (d) two special cases of (b). The benefit is an improved geometric transformation ability of the kernel: it is no longer limited to the original rectangular kernel shape and can support richer, irregular shapes. Deformable convolution extracts irregular-shape features better [[1](#参考文献)] and is more suitable for text recognition in natural scenes.
@@ -27,7 +28,8 @@ One highlight of FCENet is its excellent performance on arbitrarily shaped, irregular text

The Fourier contour is a curve-fitting method based on the Fourier transform. As the number of Fourier series terms k increases, more high-frequency components are introduced and the contour is described more accurately. The figure below shows how well irregular curves are described with different numbers of Fourier terms:

<p align="center"><img width="445" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/23583dd8-0c67-4774-a4f3-9f5971e9ed93"></p>
<p align="center"><img width="445" alt="Image" src="https://github.com/user-attachments/assets/33f2f7f3-91d6-4e6a-99ee-5930f36d013c"></p>

<p align="center"><em>图 2. 傅里叶轮廓线渐进估计效果</em></p>

It can be seen that as the number of Fourier terms k increases, the curves that can be depicted become very fine-grained.
@@ -38,7 +40,8 @@ One highlight of FCENet is its excellent performance on arbitrarily shaped, irregular text

#### The FCENet Framework

<p align="center"><img width="800" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/cfe9f5b1-d22f-4d01-8f27-0856a930f78b"></p>
<p align="center"><img width="800" alt="Image" src="https://github.com/user-attachments/assets/4cfbda01-84c6-43b1-8a60-4a2b20870c2a"></p>

<p align="center"><em>图 3. FCENet算法框架图</em></p>

Like most OCR algorithms, the network structure of FCENet can be roughly divided into three parts: backbone, neck, and head. The backbone uses a deformable-convolution version of ResNet50 to extract features. The neck adopts a feature pyramid [[2](#参考文献)], a set of convolution kernels of different sizes suited to extracting features of different scales from the original image; this improves detection accuracy and works well when one image contains text boxes of several sizes. The head has two branches: a classification branch that predicts heat maps of the text regions and the text center regions, whose loss is the cross entropy between the predicted heat maps and the supervision signal, and a regression branch that predicts Fourier signature vectors used to reconstruct text contours via the inverse Fourier transform, whose loss is the smooth-L1 loss, in image space, between the reconstructed contour and the ground-truth contour.
4 changes: 2 additions & 2 deletions configs/kie/layoutlmv3/README_CN.md
@@ -224,15 +224,15 @@ python tools/infer/text/predict_ser.py --rec_algorithm CRNN_CH --image_dir {dir
Taking entity recognition on a Chinese form as an example, use the script to recognize the entities in the form `configs/kie/vi_layoutxlm/example.jpg`. The results are stored in the `./inference_results` folder by default; a custom output path can be set with the `--draw_img_save_dir` command-line argument.

<p align="center">
<img src="example.jpg" width=1000 />
<img src="../vi_layoutxlm/example.jpg" width=1000 />
</p>
<p align="center">
<em> example.jpg </em>
</p>
The recognition result is shown below; the image is saved as `inference_results/example_ser.jpg`.

<p align="center">
<img src="example_ser.jpg" width=1000 />
<img src="../vi_layoutxlm/example_ser.jpg" width=1000 />
</p>
<p align="center">
<em> example_ser.jpg </em>
2 changes: 1 addition & 1 deletion configs/layout/layoutlmv3/README.md
@@ -72,7 +72,7 @@ python tools/param_converter_from_torch.py \
### 2.3 Model Evaluation

```bash
python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaybet.yaml
python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaynet.yaml
```
The evaluation results on the public benchmark dataset (PublayNet) are as follows:

2 changes: 1 addition & 1 deletion configs/layout/layoutlmv3/README_CN.md
@@ -76,7 +76,7 @@ python tools/param_converter_from_torch.py \
### 2.3 Model Evaluation

```bash
python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaybet.yaml
python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaynet.yaml
```
The evaluation results on the public benchmark dataset (PublayNet) are as follows:

22 changes: 0 additions & 22 deletions configs/rec/abinet/README.md
@@ -245,29 +245,7 @@ To evaluate the accuracy of the trained model, you can use `eval.py`. Please set
python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```

**Notes:**
- Context for val_while_train: Since mindspore.nn.transformer requires a fixed batch size when it is defined, when setting val_while_train=True it is necessary to ensure that the batch size of the validation set is the same as that of the model.
- So, lines 179-185 in mindocr.data.builder.py
```python
if not is_train:
    if drop_remainder and is_main_device:
        _logger.warning(
            "`drop_remainder` is forced to be False for evaluation "
            "to include the last batch for accurate evaluation."
        )
        drop_remainder = False
```
should be changed to
```python
if not is_train:
    # if drop_remainder and is_main_device:
    _logger.warning(
        "`drop_remainder` is forced to be False for evaluation "
        "to include the last batch for accurate evaluation."
    )
    drop_remainder = True
```
## Results
<!--- Guideline:
Table Format:
32 changes: 6 additions & 26 deletions configs/rec/abinet/README_CN.md
@@ -148,7 +148,7 @@ eval:
  # label_file: # Path of the label file for the validation or evaluation dataset; it is concatenated with `dataset_root` to form the complete label file path. Not required when the dataset is in LMDB format.
...
```
By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy performance on the CUTE80 dataset.
By running `tools/eval.py` as described in the [Model Evaluation](#33-模型评估) section with the above config yaml, you can get the accuracy performance on the CUTE80 dataset.

2. Evaluate multiple datasets under the same folder

@@ -243,6 +243,7 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abine
```
Training the ABINet model requires loading a pretrained model. The pretrained weights come from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt); add the path of the pretrained weights to `pretrained` under `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".


* Standalone Training

If you want to train or fine-tune the model on a smaller dataset without distributed training, modify the configuration parameter `distribute` to False and run:
@@ -264,29 +265,7 @@ python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```

**Notes:**
- Since mindspore.nn.transformer requires a fixed batch size when it is defined, when setting val_while_train=True it is necessary to ensure that the batch size of the validation set is the same as that of the model.
- So, lines 179-185 in mindocr.data.builder.py
```python
if not is_train:
    if drop_remainder and is_main_device:
        _logger.warning(
            "`drop_remainder` is forced to be False for evaluation "
            "to include the last batch for accurate evaluation."
        )
        drop_remainder = False
```
should be changed to
```python
if not is_train:
    # if drop_remainder and is_main_device:
    _logger.warning(
        "`drop_remainder` is forced to be False for evaluation "
        "to include the last batch for accurate evaluation."
    )
    drop_remainder = True
```

## Evaluation Results
<!--- Guideline:
@@ -304,9 +283,9 @@

<div align="center">

| **f** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
|:------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:|
| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
|:--------------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:|
| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |

</div>

@@ -323,6 +302,7 @@
</details>



## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->

2 changes: 1 addition & 1 deletion configs/rec/crnn/README.md
@@ -159,7 +159,7 @@ eval:
...
```

By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.


2. Evaluate on multiple datasets under the same folder
3 changes: 2 additions & 1 deletion configs/rec/crnn/README_CN.md
@@ -159,7 +159,7 @@ eval:
...
```

By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy performance on the CUTE80 dataset.
By running `tools/eval.py` as described in the [Model Evaluation](#模型评估) section with the above config yaml, you can get the accuracy performance on the CUTE80 dataset.


2. Evaluate multiple datasets under the same folder
@@ -319,6 +319,7 @@ MindOCR has some built-in dictionaries, all placed under `mindocr/utils/dict/`,

For detailed data preparation and config file settings, please refer to [Chinese text recognition dataset preparation](../../../docs/zh/datasets/chinese_text_recognition.md).


## Performance

### General-Purpose Chinese Models
10 changes: 6 additions & 4 deletions configs/rec/master/README.md
@@ -28,7 +28,9 @@ Attention-based scene text recognizers have gained huge success, which leverages
## Quick Start
### Preparation


#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.

#### Dataset Preparation
@@ -372,7 +374,7 @@ To use a specific dictionary, set the parameter `character_dict_path` to the pat

To run inference with MindSpore Lite on Ascend 310, please refer to the tutorial [MindOCR Inference](../../../docs/en/inference/inference_tutorial.md). In short, the whole process consists of the following steps:

**1. Model Export**
**Model Export**

Please [download](#2-results) the exported MindIR file first, or refer to the [Model Export](../../../docs/en/inference/convert_tutorial.md#1-model-export) tutorial and use the following command to export the trained ckpt model to MindIR file:

@@ -385,16 +387,16 @@ python tools/export.py --model_name_or_config configs/rec/master/master_resnet31
The `data_shape` is the model input shape of height and width for the MindIR file. The shape values of the MindIR files in the download links can be found in the [Notes](#2-results) under the results table.


**2. Environment Installation**
**Environment Installation**

Please refer to [Environment Installation](../../../docs/en/inference/environment.md#2-mindspore-lite-inference) tutorial to configure the MindSpore Lite inference environment.

**3. Model Conversion**
**Model Conversion**

Please refer to [Model Conversion](../../../docs/en/inference/convert_tutorial.md#2-mindspore-lite-mindir-convert),
and use the `converter_lite` tool for offline conversion of the MindIR file.

**4. Inference**
**Inference**

Assuming that you obtain output.mindir after model conversion, go to the `deploy/py_infer` directory, and use the following command for inference:
