Commit

Add Surya model (openvinotoolkit#1695)
* add notebook

* rename

* ignore/fix tests

* add task

* Skip treon

* fix navigation

* fix TOC formatting

* Add task

* Fix formatting
as-suvorov authored Feb 12, 2024
1 parent 7274573 commit bbd325e
Showing 11 changed files with 927 additions and 2 deletions.
3 changes: 2 additions & 1 deletion .ci/ignore_pip_conflicts.txt
@@ -14,4 +14,5 @@ notebooks/257-llava-multimodal-chatbot/257-llava-multimodal-chatbot.ipynb # tran
notebooks/257-llava-multimodal-chatbot/257-videollava-multimodal-chatbot.ipynb # transformers<4.35
notebooks/273-stable-zephyr-3b-chatbot/273-stable-zephyr-3b-chatbot.ipynb # install requirements.txt after clone repo
notebooks/279-mobilevlm-language-assistant/279-mobilevlm-language-assistant.ipynb # transformers<4.35
notebooks/280-depth-anything/280-depth-anything.ipynb # install requirements.txt after clone repo
notebooks/280-depth-anything/280-depth-anything.ipynb # install requirements.txt after clone repo
notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb # requires python >=3.9
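
The entries above pair a notebook path with a `#` comment giving the reason it is ignored. As a minimal sketch, assuming that is the whole format (the repository's actual CI parsing is not shown in this commit, and `IgnoreEntry` and `parse_ignore_list` are hypothetical names), such a file could be read like this:

```python
from typing import List, NamedTuple, Optional


class IgnoreEntry(NamedTuple):
    """One line of the ignore list: a notebook path and an optional reason."""

    notebook: str
    reason: Optional[str]


def parse_ignore_list(text: str) -> List[IgnoreEntry]:
    """Parse 'path # reason' lines; blank and comment-only lines are skipped."""
    entries = []
    for raw_line in text.splitlines():
        # Everything after the first '#' is treated as the free-text reason.
        path, separator, comment = raw_line.partition("#")
        path = path.strip()
        if not path:
            continue
        entries.append(IgnoreEntry(path, comment.strip() if separator else None))
    return entries


# Two entries copied from the hunk above.
sample = """\
notebooks/280-depth-anything/280-depth-anything.ipynb # install requirements.txt after clone repo
notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb # requires python >=3.9
"""

entries = parse_ignore_list(sample)
for entry in entries:
    print(entry.notebook, "->", entry.reason)
```

Keeping the reason next to the path makes it easier to revisit an entry later, for example once the Python version requirement no longer applies.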
1 change: 1 addition & 0 deletions .ci/ignore_treon_docker.txt
@@ -47,6 +47,7 @@
276-stable-diffusion-torchdynamo-backend
281-kosmos2-multimodal-large-language-model
283-photo-maker
285-surya-line-level-text-detection
301-tensorflow-training-openvino
305-tensorflow-quantization-aware-training
404-style-transfer-webcam
1 change: 1 addition & 0 deletions .ci/ignore_treon_linux.txt
@@ -50,4 +50,5 @@
276-stable-diffusion-torchdynamo-backend
281-kosmos2-multimodal-large-language-model
283-photo-maker
285-surya-line-level-text-detection
404-style-transfer-webcam
1 change: 1 addition & 0 deletions .ci/ignore_treon_mac.txt
@@ -49,4 +49,5 @@
279-mobilevlm-language-assistant
283-photo-maker
284-openvoice
285-surya-line-level-text-detection
404-style-transfer-webcam
3 changes: 2 additions & 1 deletion .ci/ignore_treon_win.txt
@@ -48,4 +48,5 @@
273-stable-zephyr-3b-chatbot
276-stable-diffusion-torchdynamo-backend
281-kosmos2-multimodal-large-language-model
283-photo-maker
283-photo-maker
285-surya-line-level-text-detection
3 changes: 3 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -165,6 +165,7 @@ DistilBERT
distilbert
distiluse
DL
DocLayNet
docstring
DocVQA
docvqa
@@ -598,6 +599,7 @@ sd
SDEdit
SDXL
sdxl
Segformer
Segmentations
segmentations
Segmenter
@@ -662,6 +664,7 @@ Suno
superresolution
superset
Suraj
surya
svc
SVTR
Swin
1 change: 1 addition & 0 deletions README.md
@@ -238,6 +238,7 @@ Demos that demonstrate inference on a particular model.
| [281-kosmos2-multimodal-large-language-model](notebooks/281-kosmos2-multimodal-large-language-model)<br> | Kosmos-2: Multimodal Large Language Model and OpenVINO™ | <img src=https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/annotated_snowman.jpg width=225> |
| [282-siglip-zero-shot-image-classification](notebooks/282-siglip-zero-shot-image-classification)<br>[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/282-siglip-zero-shot-image-classification/282-siglip-zero-shot-image-classification.ipynb) | Zero-shot Image Classification with SigLIP | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/67365453/c4eb782c-0fef-4a89-a5c6-5cc43518490b width=500> |
| [283-photo-maker](notebooks/283-photo-maker)<br> | Text-to-image generation using PhotoMaker and OpenVINO | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/91237924/88bccc4a-5789-42ca-8a68-f402c3e7c5a4 width=225> |
| [285-surya-line-level-text-detection](notebooks/285-surya-line-level-text-detection)<br>[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb) | Line-level text detection with Surya | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/67365453/7672eb6d-fafb-4ae3-b894-9f98acfeb53a width=225> |

<div id='-model-training'></div>


27 changes: 27 additions & 0 deletions notebooks/285-surya-line-level-text-detection/README.md
@@ -0,0 +1,27 @@
# Line-level text detection with Surya

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb)

In this tutorial we will perform line-level text detection using the [Surya](https://github.com/VikParuchuri/surya) toolkit and OpenVINO.

![line-level text detection](https://github.com/VikParuchuri/surya/blob/master/static/images/excerpt.png?raw=true)

[**image source*](https://github.com/VikParuchuri/surya)


The model used for line-level text detection is based on [Segformer](https://arxiv.org/pdf/2105.15203.pdf). It has the following characteristics:
* It is specialized for document OCR. It will likely not work on photos or other images.
* It is for printed text, not handwriting.
* The model is trained to ignore advertisements.
* Languages with very different character sets may not work well.

#### Table of contents:
1. Fetch test image.
1. Run PyTorch inference.
1. Convert model to OpenVINO Intermediate Representation (IR) format.
1. Run OpenVINO model.
1. Interactive inference.

## Installation Instructions

If you have not installed all required dependencies, follow the [Installation Guide](../../README.md).
1 change: 1 addition & 0 deletions selector/src/models/notebook-tags.js
@@ -32,6 +32,7 @@ export const TASKS = /** @type {const} */ ({
STYLE_TRANSFER: 'Style Transfer',
POSE_ESTIMATION: 'Pose Estimation',
ZERO_SHOT_IMAGE_CLASSIFICATION: 'Zero-Shot Image Classification',
TEXT_DETECTION: 'Text Detection',
},
NLP: {
TEXT_CLASSIFICATION: 'Text Classification',
1 change: 1 addition & 0 deletions selector/src/shared/notebook-tags.js
@@ -32,6 +32,7 @@ export const TASKS = /** @type {const} */ ({
STYLE_TRANSFER: 'Style Transfer',
POSE_ESTIMATION: 'Pose Estimation',
ZERO_SHOT_IMAGE_CLASSIFICATION: 'Zero-Shot Image Classification',
TEXT_DETECTION: 'Text Detection',
},
NLP: {
TEXT_CLASSIFICATION: 'Text Classification',