Add Surya model (openvinotoolkit#1695)

* add notebook
* rename
* ignore/fix tests
* add task
* Skip treon
* fix navigation
* fix TOC formatting
* Add task
* Fix formatting
nikita-savelyevv · Feb 12, 2024 · bbd325e
1 parent 7274573
Showing 11 changed files with 927 additions and 2 deletions.
diff --git a/.ci/ignore_pip_conflicts.txt b/.ci/ignore_pip_conflicts.txt
@@ -14,4 +14,5 @@ notebooks/257-llava-multimodal-chatbot/257-llava-multimodal-chatbot.ipynb # tran
 notebooks/257-llava-multimodal-chatbot/257-videollava-multimodal-chatbot.ipynb # transformers<4.35
 notebooks/273-stable-zephyr-3b-chatbot/273-stable-zephyr-3b-chatbot.ipynb # install requirements.txt after clone repo
 notebooks/279-mobilevlm-language-assistant/279-mobilevlm-language-assistant.ipynb # transformers<4.35
-notebooks/280-depth-anything/280-depth-anything.ipynb # install requirements.txt after clone repo
+notebooks/280-depth-anything/280-depth-anything.ipynb # install requirements.txt after clone repo
+notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb # requires python >=3.9
diff --git a/.ci/ignore_treon_docker.txt b/.ci/ignore_treon_docker.txt
@@ -47,6 +47,7 @@
 276-stable-diffusion-torchdynamo-backend
 281-kosmos2-multimodal-large-language-model
 283-photo-maker
+285-surya-line-level-text-detection
 301-tensorflow-training-openvino
 305-tensorflow-quantization-aware-training
 404-style-transfer-webcam
diff --git a/.ci/ignore_treon_linux.txt b/.ci/ignore_treon_linux.txt
@@ -50,4 +50,5 @@
 276-stable-diffusion-torchdynamo-backend
 281-kosmos2-multimodal-large-language-model
 283-photo-maker
+285-surya-line-level-text-detection
 404-style-transfer-webcam
diff --git a/.ci/ignore_treon_mac.txt b/.ci/ignore_treon_mac.txt
@@ -49,4 +49,5 @@
 279-mobilevlm-language-assistant
 283-photo-maker
 284-openvoice
+285-surya-line-level-text-detection
 404-style-transfer-webcam
diff --git a/.ci/ignore_treon_win.txt b/.ci/ignore_treon_win.txt
@@ -48,4 +48,5 @@
 273-stable-zephyr-3b-chatbot
 276-stable-diffusion-torchdynamo-backend
 281-kosmos2-multimodal-large-language-model
-283-photo-maker
+283-photo-maker
+285-surya-line-level-text-detection
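
The ignore lists touched above use two slightly different line formats: the `ignore_treon_*.txt` files list bare notebook directory names, while `ignore_pip_conflicts.txt` lists full notebook paths followed by a `#` reason. The following is a minimal sketch of how such files could be read and checked. The helper names `parse_ignore_list` and `is_ignored` are hypothetical and are not taken from the repository's CI scripts, which may handle these files differently.

```python
from pathlib import PurePosixPath


def parse_ignore_list(text):
    """Return the entries of an ignore file, dropping blank lines and '#' comments."""
    entries = []
    for raw_line in text.splitlines():
        # Everything after '#' is a free-form reason, e.g. "# requires python >=3.9".
        entry = raw_line.split("#", 1)[0].strip()
        if entry:
            entries.append(entry)
    return entries


def is_ignored(notebook_path, entries):
    """True if the notebook's full path or its parent directory name is listed."""
    path = PurePosixPath(notebook_path)
    return str(path) in entries or path.parent.name in entries


# Sample contents mirroring the lines added in this commit.
treon_ignore = parse_ignore_list(
    "283-photo-maker\n285-surya-line-level-text-detection\n"
)
pip_ignore = parse_ignore_list(
    "notebooks/285-surya-line-level-text-detection/"
    "285-surya-line-level-text-detection.ipynb # requires python >=3.9\n"
)

notebook = (
    "notebooks/285-surya-line-level-text-detection/"
    "285-surya-line-level-text-detection.ipynb"
)
print(is_ignored(notebook, treon_ignore))  # True: matched by directory name
print(is_ignored(notebook, pip_ignore))    # True: matched by full path
```

Matching on either the full path or the directory name lets one helper serve both file formats, at the cost of assuming directory names are unique across the repository.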
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -165,6 +165,7 @@ DistilBERT
 distilbert
 distiluse
 DL
+DocLayNet
 docstring
 DocVQA
 docvqa
@@ -598,6 +599,7 @@ sd
 SDEdit
 SDXL
 sdxl
+Segformer
 Segmentations
 segmentations
 Segmenter
@@ -662,6 +664,7 @@ Suno
 superresolution
 superset
 Suraj
+surya
 svc
 SVTR
 Swin

diff --git a/README.md b/README.md
@@ -238,6 +238,7 @@ Demos that demonstrate inference on a particular model.
 | [281-kosmos2-multimodal-large-language-model](notebooks/281-kosmos2-multimodal-large-language-model)<br> | Kosmos-2: Multimodal Large Language Model and OpenVINO™ | <img src=https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/annotated_snowman.jpg width=225> |
 | [282-siglip-zero-shot-image-classification](notebooks/282-siglip-zero-shot-image-classification)<br>[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/282-siglip-zero-shot-image-classification/282-siglip-zero-shot-image-classification.ipynb) | Zero-shot Image Classification with SigLIP | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/67365453/c4eb782c-0fef-4a89-a5c6-5cc43518490b width=500> |
 | [283-photo-maker](notebooks/283-photo-maker)<br> | Text-to-image generation using PhotoMaker and OpenVINO | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/91237924/88bccc4a-5789-42ca-8a68-f402c3e7c5a4 width=225> | 
+| [285-surya-line-level-text-detection](notebooks/285-surya-line-level-text-detection)<br>[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb) | Line-level text detection with Surya | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/67365453/7672eb6d-fafb-4ae3-b894-9f98acfeb53a width=225> | 
 
 <div id='-model-training'></div>
 

diff --git a/notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb b/notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb
diff --git a/notebooks/285-surya-line-level-text-detection/README.md b/notebooks/285-surya-line-level-text-detection/README.md
@@ -0,0 +1,27 @@
+# Line-level text detection with Surya
+
+[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb)
+
+In this tutorial we will perform line-level text detection using the [Surya](https://github.com/VikParuchuri/surya) toolkit and OpenVINO.
+
+![line-level text detection](https://github.com/VikParuchuri/surya/blob/master/static/images/excerpt.png?raw=true)
+
+[*image source*](https://github.com/VikParuchuri/surya)
+
+
+The model used for line-level text detection is based on [Segformer](https://arxiv.org/pdf/2105.15203.pdf). It has the following features:
+* It is specialized for document OCR. It will likely not work on photos or other images.
+* It is for printed text, not handwriting.
+* The model has learned to ignore advertisements.
+* Languages with very different character sets may not work well.
+
+#### Table of contents:
+1. Fetch test image.
+1. Run PyTorch inference.
+1. Convert model to OpenVINO Intermediate Representation (IR) format.
+1. Run OpenVINO model.
+1. Interactive inference.
+
+## Installation Instructions
+
+If you have not installed all required dependencies, follow the [Installation Guide](../../README.md).
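
Steps 3 and 4 of the README's table of contents (convert the model to OpenVINO IR, then run it) can be sketched roughly as below. The notebook body is not shown in this diff, so this is not the notebook's code. It is a generic sketch that assumes the `openvino` Python package (2023.1 or later) and a PyTorch module with a matching example input. The function name `convert_and_compile` and the default path `model.xml` are hypothetical.

```python
from pathlib import Path


def convert_and_compile(torch_model, example_input, ir_path="model.xml", device="CPU"):
    """Convert a PyTorch module to OpenVINO IR, save it, and compile it for a device."""
    # Imported lazily so the helper can be defined without OpenVINO installed.
    import openvino as ov

    ir_path = Path(ir_path)
    if not ir_path.exists():
        # example_input lets OpenVINO trace the PyTorch module during conversion.
        ov_model = ov.convert_model(torch_model, example_input=example_input)
        # Writes the .xml topology file plus a .bin weights file next to it.
        ov.save_model(ov_model, ir_path)

    # Compile the saved IR for the requested device (for example "CPU").
    core = ov.Core()
    return core.compile_model(ir_path, device_name=device)
```

Skipping conversion when the IR file already exists avoids repeating the tracing step on later runs. The returned compiled model can then be called with input data in place of the PyTorch module, and its outputs post-processed in the same way.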
diff --git a/selector/src/models/notebook-tags.js b/selector/src/models/notebook-tags.js
@@ -32,6 +32,7 @@ export const TASKS = /** @type {const} */ ({
     STYLE_TRANSFER: 'Style Transfer',
     POSE_ESTIMATION: 'Pose Estimation',
     ZERO_SHOT_IMAGE_CLASSIFICATION: 'Zero-Shot Image Classification',
+    TEXT_DETECTION: 'Text Detection',
   },
   NLP: {
     TEXT_CLASSIFICATION: 'Text Classification',

diff --git a/selector/src/shared/notebook-tags.js b/selector/src/shared/notebook-tags.js
@@ -32,6 +32,7 @@ export const TASKS = /** @type {const} */ ({
     STYLE_TRANSFER: 'Style Transfer',
     POSE_ESTIMATION: 'Pose Estimation',
     ZERO_SHOT_IMAGE_CLASSIFICATION: 'Zero-Shot Image Classification',
+    TEXT_DETECTION: 'Text Detection',
   },
   NLP: {
     TEXT_CLASSIFICATION: 'Text Classification',