update readme
kk928290341 committed Jan 10, 2025
2 parents d9ec7f3 + d1c0db0 commit 0f87fcb
Showing 17 changed files with 81 additions and 203 deletions.
8 changes: 4 additions & 4 deletions CONTRIBUTING_CN.md
@@ -6,7 +6,7 @@

Before you submit code to the MindOCR community for the first time, you need to sign the CLA.

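For individual contributors, please refer to the [ICLA online document]https://www.mindspore.cn/icla for details.
For individual contributors, please refer to the [ICLA online document](https://www.mindspore.cn/icla) for details.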

## Types of Contributions

@@ -46,7 +46,7 @@ MindOCR could always use more documentation, whether as part of the official MindOCR docs

Ready to contribute? Here's how to set up `mindocr` for local development.

1. Fork the 'mindocr' repo on [GitHub]https://github.com/mindspore-lab/mindocr.
1. Fork the 'mindocr' repo on [GitHub](https://github.com/mindspore-lab/mindocr).
2. Clone your fork locally:

```shell
@@ -85,11 +85,11 @@ MindOCR could always use more documentation, whether as part of the official MindOCR docs

If all static checks pass, you will get output like the following:

![pre-commit passed]https://user-images.githubusercontent.com/74176172/221346245-ea868015-bb09-4e53-aa56-73b015e1e336.png
![pre-commit passed](https://user-images.githubusercontent.com/74176172/221346245-ea868015-bb09-4e53-aa56-73b015e1e336.png)

Otherwise, you need to fix the warnings according to the output:

![pre-commit failed]https://user-images.githubusercontent.com/74176172/221346251-7d8f531f-9094-474b-97f0-fd5a55e6d3de.png
![pre-commit failed](https://user-images.githubusercontent.com/74176172/221346251-7d8f531f-9094-474b-97f0-fd5a55e6d3de.png)

To get pre-commit and pytest, just pip install them into your conda environment.

6 changes: 3 additions & 3 deletions configs/det/fcenet/README.md
@@ -16,7 +16,7 @@ FCENet is a segmentation-based text detection algorithm. In the text detection s

The idea of deformable convolution is simple: make the fixed shape of the convolution kernel variable. On top of each sampling position of a regular convolution, deformable convolution adds a learned position offset, as shown in the following figure:

<p align="center"><img alt="Figure 1" src="https://github.com/colawyee/mindocr-1/assets/15730439/5dfdbabd-a025-4789-89fb-4f2263e9deff" width="600"/></p>
<p align="center"><img alt="Figure 1" src="https://github.com/user-attachments/assets/858357ca-02ff-46b4-8d4d-4c2e53f00ac5" width="600"/></p>
<p align="center"><em>Figure 1. Deformable Convolution</em></p>

Figure (a) is the regular convolution kernel, Figure (b) is a deformable convolution kernel whose sampling positions are shifted in arbitrary directions, and Figures (c) and (d) are two special cases of Figure (b). The benefit is a stronger ability to model geometric transformations: the kernel is no longer limited to the rectangular shape of the original kernel and can cover richer, irregular shapes. Deformable convolution is better at extracting features of irregular shapes [[1](#references)] and is therefore better suited to text in natural scenes.
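To make the sampling idea concrete, here is a minimal pure-Python sketch of a single output value of a 3x3 deformable convolution. It is an illustration only, not MindOCR's implementation: the function names are hypothetical, the offsets are supplied by hand (in the deformable convolution design they are predicted by an extra convolution branch), and fractional positions are read with bilinear interpolation.

```python
import math


def bilinear(img, y, x):
    """Sample the 2D list `img` at fractional (y, x); out-of-range pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight of each neighbour falls off linearly with distance.
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return val


def deform_conv_at(img, kernel, cy, cx, offsets):
    """One output of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] = (dy, dx) is added to the regular grid position of tap (i, j).
    With all offsets zero this is an ordinary 3x3 convolution (cross-correlation).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear(img, cy + (i - 1) + dy, cx + (j - 1) + dx)
    return out


img = [[float(5 * r + c) for c in range(5)] for r in range(5)]  # img[r][c] = 5r + c
mean_kernel = [[1.0 / 9] * 3 for _ in range(3)]
zero = [[(0.0, 0.0)] * 3 for _ in range(3)]
shift = [[(0.0, 0.5)] * 3 for _ in range(3)]  # every tap moved half a pixel right
print(round(deform_conv_at(img, mean_kernel, 2, 2, zero), 6))   # 12.0
print(round(deform_conv_at(img, mean_kernel, 2, 2, shift), 6))  # 12.5
```

Because the toy image is a linear ramp, shifting every tap by half a pixel raises the averaged value by exactly 0.5, which shows that the kernel now reads values between grid points rather than only on the fixed rectangle.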
@@ -25,7 +25,7 @@ Figure (a) is the original convolutional kernel, Figure (b) is a deformable conv

The Fourier contour is a curve-fitting method based on the Fourier transform. As the Fourier degree k increases, more high-frequency components are introduced and the contour is described more accurately. The following figure shows how well irregular curves are described at different Fourier degrees:

<p align="center"><img width="445" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/23583dd8-0c67-4774-a4f3-9f5971e9ed93"></p>
<p align="center"><img width="445" alt="Image" src="https://github.com/user-attachments/assets/33f2f7f3-91d6-4e6a-99ee-5930f36d013c"></p>
<p align="center"><em>Figure 2. Fourier contour fitting with progressive approximation</em></p>

As the Fourier degree k increases, the curves that can be depicted become very complex.
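The effect of the degree k can be sketched in a few lines of Python. A closed contour is treated as a complex function $f(t) = \sum_{k=-K}^{K} c_k e^{2\pi i k t}$ with $t \in [0, 1)$, and each $c_k$ is estimated from N evenly sampled contour points with a discrete Fourier transform. This is a minimal illustration under that assumption, not the FCENet code; the helper names are hypothetical, and the test curve is synthetic (built from known frequencies 1 and -3) so that the effect of k is exact.

```python
import cmath
import math


def fourier_coeffs(points, k_max):
    """Estimate c_k = integral of f(t) e^{-2 pi i k t} dt from N evenly sampled points."""
    n = len(points)
    zs = [complex(x, y) for x, y in points]  # each point (x, y) becomes x + iy
    return {
        k: sum(z * cmath.exp(-2j * math.pi * k * m / n) for m, z in enumerate(zs)) / n
        for k in range(-k_max, k_max + 1)
    }


def reconstruct(coeffs, n):
    """Evaluate f(t) = sum_k c_k e^{2 pi i k t} at n evenly spaced t in [0, 1)."""
    out = []
    for m in range(n):
        z = sum(c * cmath.exp(2j * math.pi * k * m / n) for k, c in coeffs.items())
        out.append((z.real, z.imag))
    return out


def max_error(a, b):
    """Largest point-to-point distance between two contours of equal length."""
    return max(math.hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in zip(a, b))


N = 64
curve = []
for m in range(N):
    z = 10 * cmath.exp(2j * math.pi * m / N) + 2 * cmath.exp(-2j * math.pi * 3 * m / N)
    curve.append((z.real, z.imag))

err1 = max_error(curve, reconstruct(fourier_coeffs(curve, 1), N))
err3 = max_error(curve, reconstruct(fourier_coeffs(curve, 3), N))
print(round(err1, 6))  # 2.0: k = 1 cannot represent the frequency -3 term
print(f"{err3:.1e}")   # floating-point noise: k = 3 recovers the curve
```

With k = 1 only the circular component survives, so every point is off by the magnitude of the missing term; raising k to 3 brings that higher-frequency term in and the contour is reproduced.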
@@ -36,7 +36,7 @@ Fourier Contour Encoding is a method proposed in the paper "Fourier Contour Embe

#### The FCENet Framework

<p align="center"><img width="800" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/cfe9f5b1-d22f-4d01-8f27-0856a930f78b"></p>
<p align="center"><img width="800" alt="Image" src="https://github.com/user-attachments/assets/4cfbda01-84c6-43b1-8a60-4a2b20870c2a"></p>
<p align="center"><em>Figure 3. FCENet framework</em></p>

Like most OCR algorithms, FCENet can be roughly divided into three parts: backbone, neck, and head. The backbone is a ResNet50 with deformable convolutions, used for feature extraction. The neck is a feature pyramid [[2](#references)], which fuses feature maps at several scales and is therefore suited to extracting features of different sizes from the original image, improving detection accuracy; it works well when one image contains text boxes of different sizes. The head has two branches. The classification branch predicts heat maps of both text regions and text center regions, which are multiplied pixel-wise to produce the classification score map; its loss is the cross entropy between the predicted heat maps and the ground truth. The regression branch predicts Fourier signature vectors, which are used to reconstruct text contours via the inverse Fourier transformation (IFT); its loss is the smooth-L1 loss between the reconstructed text contours and the ground-truth contours in image space.
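The two head computations described above can be sketched with plain Python lists: the pixel-wise product of the two heat maps, and a smooth-L1 loss between predicted and ground-truth contour coordinates. This is a minimal sketch, not MindOCR's implementation: the function names are hypothetical, `beta=1.0` is an assumed transition point, and the real regression loss is computed on contours reconstructed from the Fourier signature vectors.

```python
def classification_score(text_region, text_center):
    """Pixel-wise product of the text-region and text-center heat maps (2D lists)."""
    return [
        [r * c for r, c in zip(row_r, row_c)]
        for row_r, row_c in zip(text_region, text_center)
    ]


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth-L1 loss: quadratic for |d| < beta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


region = [[0.9, 0.2], [0.8, 0.1]]  # hypothetical text-region probabilities
center = [[0.5, 0.0], [1.0, 1.0]]  # hypothetical text-center probabilities
print(classification_score(region, center))
print(smooth_l1([0.0, 3.0], [0.5, 0.0]))  # (0.125 + 2.5) / 2 = 1.3125
```

Multiplying the maps keeps a pixel's score high only when it is confidently inside both the text region and the text center region; the smooth-L1 loss stays quadratic for small contour errors and grows linearly for large ones, which limits the influence of outlier points.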
9 changes: 6 additions & 3 deletions configs/det/fcenet/README_CN.md
@@ -18,7 +18,8 @@ One highlight of FCENet is its outstanding performance on text of arbitrary irregular shapes

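The idea of deformable convolution is simple: the originally fixed-shape convolution kernel is made variable. On top of the original convolution positions, deformable convolution produces position offsets in arbitrary directions, as shown in the figure below: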

<p align="center"><img alt="Figure 1" src="https://github.com/colawyee/mindocr-1/assets/15730439/5dfdbabd-a025-4789-89fb-4f2263e9deff" width="600"/></p>
<p align="center"><img alt="Figure 1" src="https://github.com/user-attachments/assets/858357ca-02ff-46b4-8d4d-4c2e53f00ac5" width="600"/></p>

<p align="center"><em>Figure 1. Deformable convolution</em></p>

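Figure (a) is the original convolution kernel, Figure (b) is a deformable convolution kernel whose positions are offset in arbitrary directions, and Figures (c) and (d) are two special cases of Figure (b). The benefit is a stronger geometric transformation capability of the kernel: it is no longer limited to the rectangular shape of the original kernel and can support richer, irregular shapes. Deformable convolution extracts features of irregular shapes better [[1](#参考文献)] and is better suited to text in natural scenes.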
@@ -27,7 +28,8 @@ One highlight of FCENet is its outstanding performance on text of arbitrary irregular shapes

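The Fourier contour is a curve-fitting method based on the Fourier transform. The larger the number of Fourier series terms k, the more high-frequency components are introduced and the more accurately the contour is described. The figure below shows how well irregular curves are described with different numbers of Fourier terms: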

<p align="center"><img width="445" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/23583dd8-0c67-4774-a4f3-9f5971e9ed93"></p>
<p align="center"><img width="445" alt="Image" src="https://github.com/user-attachments/assets/33f2f7f3-91d6-4e6a-99ee-5930f36d013c"></p>

<p align="center"><em>Figure 2. Progressive approximation with Fourier contours</em></p>

As the number of Fourier terms k increases, the curves that can be depicted become very fine-grained.
@@ -38,7 +40,8 @@ One highlight of FCENet is its outstanding performance on text of arbitrary irregular shapes

#### FCENet Algorithm Framework

<p align="center"><img width="800" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/cfe9f5b1-d22f-4d01-8f27-0856a930f78b"></p>
<p align="center"><img width="800" alt="Image" src="https://github.com/user-attachments/assets/4cfbda01-84c6-43b1-8a60-4a2b20870c2a"></p>

<p align="center"><em>Figure 3. FCENet framework</em></p>

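Like most OCR algorithms, the FCENet network can be roughly divided into three parts: backbone, neck, and head. The backbone is a ResNet50 with deformable convolutions, used for feature extraction. The neck is a feature pyramid [[2](#参考文献)], which fuses feature maps at several scales and is suited to extracting features of different sizes from the original image, improving detection accuracy; it works well when one image contains text boxes of different sizes. The head has two branches. The classification branch predicts heat maps of text regions and text center regions, and the cross entropy between these heat maps and the supervision signal is used as the classification loss. The regression branch predicts Fourier signature vectors, which are used to reconstruct text contours through the inverse Fourier transform; the smooth-L1 loss between the reconstructed contours and the ground-truth contours in image space is used as the regression loss.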
23 changes: 0 additions & 23 deletions configs/rec/abinet/README.md
@@ -277,29 +277,6 @@ To evaluate the accuracy of the trained model, you can use `eval.py`. Please set
python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```

**Notes:**
- Context for `val_while_train`: since `mindspore.nn.transformer` requires a fixed batch size when it is defined, setting `val_while_train=True` requires the batch size of the validation set to match that of the model.
- Therefore, lines 179-185 in `mindocr/data/builder.py`
```python
if not is_train:
if drop_remainder and is_main_device:
_logger.warning(
"`drop_remainder` is forced to be False for evaluation "
"to include the last batch for accurate evaluation."
)
drop_remainder = False
```
should be changed to
```python
if not is_train:
    # if drop_remainder and is_main_device:
    #     _logger.warning(
    #         "`drop_remainder` is forced to be False for evaluation "
    #         "to include the last batch for accurate evaluation."
    #     )
    drop_remainder = True
```
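Why the fixed batch size matters can be seen from how a dataset splits into batches. The sketch below is a hypothetical helper (not MindSpore's API) that lists the batch sizes produced with and without dropping the remainder; the sample count of 1000 and batch size of 96 are arbitrary illustration values.

```python
def batch_sizes(num_samples, batch_size, drop_remainder):
    """Sizes of the batches produced when splitting num_samples into batches."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # a final, smaller batch
    return sizes


print(batch_sizes(1000, 96, drop_remainder=False))  # ten batches of 96, then one of 40
print(batch_sizes(1000, 96, drop_remainder=True))   # ten batches of 96; 40 samples skipped
```

A model built for a fixed batch size of 96 would reject the trailing 40-sample batch, which is why the note forces `drop_remainder = True`. The trade-off is that the skipped samples are left out of the evaluation, which is exactly what the original warning in the builder guards against.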
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->

27 changes: 2 additions & 25 deletions configs/rec/abinet/README_CN.md
@@ -175,7 +175,7 @@ eval:
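    # label_file: # Path to the label file of the validation or evaluation dataset; it is joined with `dataset_root` to form the full label file path. Not needed when the dataset is in LMDB format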
...
```
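By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.
By running `tools/eval.py` as described in the [Model Evaluation](#33-模型评估) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.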

2. Evaluate on multiple datasets under the same folder

@@ -268,7 +268,7 @@ eval:
# Distributed training on multiple Ascend devices
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```
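ABINet training requires loading a pretrained model. The pretrained weights are available at https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt; add the path of the pretrained weights to the `pretrained` field of `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".
ABINet training requires loading a pretrained model. The pretrained weights are available at <https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt>; add the path of the pretrained weights to the `pretrained` field of `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".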

* Single-device training

@@ -291,29 +291,6 @@ python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```

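**Notes:**
- Since `mindspore.nn.transformer` requires a fixed batch size when it is defined, when `val_while_train=True` is selected, the batch size of the validation set must be the same as that of the model.
- Therefore, lines 179-185 in `mindocr/data/builder.py`
```python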
if not is_train:
if drop_remainder and is_main_device:
_logger.warning(
"`drop_remainder` is forced to be False for evaluation "
"to include the last batch for accurate evaluation."
)
drop_remainder = False
```
should be changed to
```python
if not is_train:
    # if drop_remainder and is_main_device:
    #     _logger.warning(
    #         "`drop_remainder` is forced to be False for evaluation "
    #         "to include the last batch for accurate evaluation."
    #     )
    drop_remainder = True
```
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->

6 changes: 3 additions & 3 deletions configs/rec/crnn/README.md
@@ -159,7 +159,7 @@ eval:
...
```

By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy on the CUTE80 dataset.
By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy on the CUTE80 dataset.


2. Evaluate on multiple datasets under the same folder
@@ -388,8 +388,8 @@ Experiments are tested on ascend 310P with mindspore lite 2.3.1 graph mode.
### Notes

- To reproduce the result on other contexts, please ensure the global batch size is the same.
- The characters supported by the model are lowercase English letters a to z and digits 0 to 9. For more on the dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
- The characters supported by the model are lowercase English letters a to z and digits 0 to 9. For more on the dictionary, please refer to [Character Dictionary](#character-dictionary).
- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-download) section.
- The MindIR input shapes of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100).
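The default character set in the notes above is small enough to write out directly. The sketch below builds it from Python's `string` module and filters a label down to supported characters. The digits-then-letters ordering and the lower-casing step are assumptions for illustration (the dictionary file used by the config is what actually defines the order), and `normalize_label` is a hypothetical helper.

```python
import string

# Default character set from the notes: digits 0-9 and lowercase letters a-z.
# The ordering is an assumption; the configured dictionary file defines the real one.
DEFAULT_CHARS = string.digits + string.ascii_lowercase


def normalize_label(label):
    """Lower-case a label and drop characters the default set cannot represent."""
    allowed = set(DEFAULT_CHARS)
    return "".join(ch for ch in label.lower() if ch in allowed)


print(len(DEFAULT_CHARS))                    # 36
print(normalize_label("Hello, World 2024!"))  # helloworld2024
```

Any character outside these 36 (uppercase after folding aside, punctuation, spaces, non-Latin scripts) cannot be predicted by these models, so labels containing them need a custom dictionary.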


8 changes: 4 additions & 4 deletions configs/rec/crnn/README_CN.md
@@ -159,7 +159,7 @@ eval:
...
```

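By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.
By running `tools/eval.py` as described in the [Model Evaluation](#模型评估) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.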


2. Evaluate on multiple datasets under the same folder
@@ -328,7 +328,7 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/
```

### Training with a Custom Dataset
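You can fine-tune on a custom dataset starting from the provided pretrained weights to obtain higher recognition accuracy in specific scenarios. For detailed steps, refer to [Training a Recognition Network with a Custom Dataset](../../../docs/zh/tutorials/training_recognition_custom_dataset_CN.md).
You can fine-tune on a custom dataset starting from the provided pretrained weights to obtain higher recognition accuracy in specific scenarios. For detailed steps, refer to [Training a Recognition Network with a Custom Dataset](../../../docs/zh/tutorials/training_recognition_custom_dataset.md).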

## Performance

@@ -391,8 +391,8 @@ mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/

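### Notes
- To reproduce the training results in other environments, please make sure the global batch size matches the original config file.
- The characters the model can recognize are the default set, i.e. all lowercase English letters a to z and digits 0 to 9. For details, see [4. Character Dictionary](#4-字符词典).
- The models are trained from scratch without any pre-training. For details on the training and evaluation datasets, please refer to the [Dataset Download & Usage](#312-数据集下载) section.
- The characters the model can recognize are the default set, i.e. all lowercase English letters a to z and digits 0 to 9. For details, see [Character Dictionary](#字符词典).
- The models are trained from scratch without any pre-training. For details on the training and evaluation datasets, please refer to the [Dataset Download & Usage](#数据集下载) section.
- The MindIR export input shapes of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100).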


2 changes: 1 addition & 1 deletion configs/rec/master/README.md
@@ -53,7 +53,7 @@ According to our experiments, the evaluation results on public benchmark dataset
**Notes:**
- Context: the training context is denoted as {device}x{pieces}-{MS version}-{MS mode}, where the MindSpore mode is either G (graph mode) or F (PyNative mode with `ms_function`). For example, D910x4-MS1.10-G denotes training on 4 Ascend 910 NPUs in graph mode with MindSpore 1.10.
- To reproduce the result on other contexts, please ensure the global batch size is the same.
- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
- The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-preparation) section.
- The MindIR input shape of MASTER is (1, 3, 48, 160).
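The context notation and the "same global batch size" advice fit together in a small sketch: parse a context string such as `D910x4-MS1.10-G`, then derive the per-device batch size that keeps the global batch size (per-device batch times number of devices, under data parallelism) unchanged. Both helpers are hypothetical, and the global batch size of 512 below is an arbitrary illustration value, not a number taken from the MASTER config.

```python
import re

# Hypothetical pattern for strings like "D910x4-MS1.10-G":
# device, device count, MindSpore version, mode (G = graph, F = PyNative).
_CONTEXT_RE = re.compile(
    r"^(?P<device>[A-Za-z0-9]+)x(?P<num>\d+)-MS(?P<version>[\d.]+)-(?P<mode>[GF])$"
)


def parse_context(ctx):
    match = _CONTEXT_RE.match(ctx)
    if match is None:
        raise ValueError(f"unrecognized context string: {ctx!r}")
    return {
        "device": match.group("device"),
        "num_devices": int(match.group("num")),
        "ms_version": match.group("version"),
        "mode": "graph" if match.group("mode") == "G" else "pynative",
    }


def per_device_batch(global_batch, num_devices):
    """Per-device batch size that keeps the global batch size unchanged."""
    if global_batch % num_devices:
        raise ValueError("global batch size must be divisible by the number of devices")
    return global_batch // num_devices


ctx = parse_context("D910x4-MS1.10-G")
print(ctx)
print(per_device_batch(512, ctx["num_devices"]))  # 128 per device on 4 devices
print(per_device_batch(512, 8))                   # 64 per device on 8 devices
```

Moving from 4 to 8 devices therefore means halving the per-device batch size in the config if the results are to stay comparable.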


4 changes: 2 additions & 2 deletions configs/rec/master/README_CN.md
@@ -54,7 +54,7 @@ Table Format:
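**Notes:**
- Environment: the training environment is denoted as {processor}x{number of processors}-{MS version}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode). For example, D910x4-MS1.10-G denotes training on 4 Ascend 910 NPUs in graph mode with MindSpore 1.10.
- To reproduce the training results in other environments, please make sure the global batch size matches the original config file.
- The models are trained from scratch without any pre-training. For details on the training and evaluation datasets, please refer to the [Dataset Download & Usage](#312-数据集下载) section.
- The models are trained from scratch without any pre-training. For details on the training and evaluation datasets, please refer to the [Dataset Download & Usage](#312-数据集准备) section.
- The MindIR export input shape of MASTER is (1, 3, 48, 160).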

## 3. Quick Start
@@ -218,7 +218,7 @@ eval:
...
```

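By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.
By running `tools/eval.py` as described in the [Model Evaluation](#33-模型评估) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.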


2. Evaluate on multiple datasets under the same folder
4 changes: 2 additions & 2 deletions configs/rec/rare/README_CN.md
@@ -184,7 +184,7 @@ eval:
...
```

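By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.
By running `tools/eval.py` as described in the [Model Evaluation](#33-模型评估) section with the above config yaml, you can get the accuracy on the CUTE80 dataset.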


2. Evaluate on multiple datasets under the same folder
@@ -361,7 +361,7 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/rare/
- The MindIR export input shape of RARE is (1, 3, 32, 320), and it can only be used on Ascend devices.

### Training with a Custom Dataset
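You can fine-tune on a custom dataset starting from the provided pretrained weights to obtain higher recognition accuracy in specific scenarios. For detailed steps, refer to [Training a Recognition Network with a Custom Dataset](../../../docs/zh/tutorials/training_recognition_custom_dataset_CN.md).
You can fine-tune on a custom dataset starting from the provided pretrained weights to obtain higher recognition accuracy in specific scenarios. For detailed steps, refer to [Training a Recognition Network with a Custom Dataset](../../../docs/zh/tutorials/training_recognition_custom_dataset.md).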


## 6. MindSpore Lite Inference