Merge pull request #56 from DCGM/music
Merge music branch to develop.
ikiss-fit authored Nov 14, 2024
2 parents 3311055 + 747e491 commit 02e3d7a
Showing 22 changed files with 2,377 additions and 398 deletions.
README.md: 18 changes (4 additions, 14 deletions)
@@ -18,24 +18,14 @@ For the current shell session, this can be achieved by setting ``PYTHONPATH`` up
export PYTHONPATH=/path/to/the/repo:$PYTHONPATH
```

As a more permanent solution, a very simplistic `setup.py` is prepared:
```
python setup.py develop
```
Beware that the `setup.py` does not promise to bring all the required stuff, e.g. setting CUDA up is up to you.

Pero can be later removed from your Python distribution by running:
```
python setup.py develop --uninstall
```

## Available models
General layout analysis (printed and handwritten) with european printed OCR specialized to czech newspapers can be [downloaded here](https://nextcloud.fit.vutbr.cz/s/NtAbHTNkZFpapdJ). The OCR engine is suitable for most european printed documents. It is specialized for low-quality czech newspapers digitized from microfilms, but it provides very good results for almast all types of printed documents in most languages. If you are interested in processing printed fraktur fonts, handwritten documents or medieval manuscripts, feel free to contact the authors. The newest OCR engines are available at [pero-ocr.fit.vutbr.cz](https://pero-ocr.fit.vutbr.cz). OCR engines are available also through API runing at [pero-ocr.fit.vutbr.cz/api](https://pero-ocr.fit.vutbr.cz/api), [github repository](https://github.com/DCGM/pero-ocr-api).

## Command line application
A command line application is ./user_scripts/parse_folder.py. It is able to process images in a directory using an OCR engine. It can render detected lines in an image and provide document content in Page XML and ALTO XML formats. Additionally, it is able to crop all text lines as rectangular regions of normalized size and save them into separate image files.
A command line application is `./user_scripts/parse_folder.py.` It is able to process images in a directory using an OCR engine. It can render detected lines in an image and provide document content in Page XML and ALTO XML formats. Additionally, it is able to crop all text lines as rectangular regions of normalized size and save them into separate image files.

## Running command line application in container
<a name="running-command-line-application-in-container"></a>
A docker container can be built from the sourcecode to run scripts and programs based on the pero-ocr. Example of running the `parse_folder.py` script to generate page-xml files for images in input directory:
```shell
docker run --rm --tty --interactive \
@@ -63,7 +53,7 @@ import os
import configparser
import cv2
import numpy as np
from pero_ocr.document_ocr.layout import PageLayout
from pero_ocr.core.layout import PageLayout
from pero_ocr.document_ocr.page_parser import PageParser

# Read config file.
@@ -117,7 +107,7 @@ Currently, only unittests are provided with the code. Some of the code. So simpl
```

#### Simple regression testing
Regression testing can be done by `test/processing_test.sh`. Script calls containerized `parser_folder.py` to process input images and page-xml files and calls user suplied comparison script to compare outputs to example outputs suplied by user. `PERO-OCR` container have to be built in advance to run the test, see 'Running command line application in container' chapter. Script can be called like this:
Regression testing can be done by `test/processing_test.sh`. Script calls containerized `parse_folder.py` to process input images and page-xml files and calls user suplied comparison script to compare outputs to example outputs suplied by user. `PERO-OCR` container have to be built in advance to run the test, see [Running command line application in container](#running-command-line-application-in-container) for more information. Script can be run like this:
```shell
sh test/processing_test.sh \
--input-images path/to/input/image/directory \
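
The README change above updates the `PageLayout` import from `pero_ocr.document_ocr.layout` to `pero_ocr.core.layout`. Scripts that have to run against checkouts from both before and after this merge could resolve the class from whichever module exists. The helper below is a minimal sketch under that assumption: `import_first` is a hypothetical name, not part of pero-ocr, and only the two module paths are taken from the diff.

```python
import importlib


def import_first(*candidates):
    """Return the first attribute that can be imported.

    Each candidate is a (module_path, attribute_name) pair. Candidates are
    tried in order, so the preferred (newest) location should come first.
    """
    errors = []
    for module_path, attr in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_path}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


if __name__ == "__main__":
    # Hypothetical usage: prefer the module path introduced by this commit,
    # fall back to the pre-merge location.
    try:
        PageLayout = import_first(
            ("pero_ocr.core.layout", "PageLayout"),
            ("pero_ocr.document_ocr.layout", "PageLayout"),
        )
        print("Using PageLayout from", PageLayout.__module__)
    except ImportError as exc:
        print(exc)
```

Listing the new path first means an up-to-date checkout never touches the old module; the fallback only matters for older installs.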
pero_ocr/core/confidence_estimation.py: 9 changes (6 additions, 3 deletions)
@@ -70,9 +70,12 @@ def squeeze(sequence):
return result


def get_line_confidence(line, labels, aligned_letters=None, log_probs=None):
def get_line_confidence(line, labels=None, aligned_letters=None, log_probs=None):
# There is the same number of outputs as labels (probably transformer model was used) --> each letter has only one
# possible frame in logits and thus it is not needed to align them
# possible frame in logits thus it is not needed to align them
if labels is None:
labels = line.get_labels()

if line.logits.shape[0] == len(labels):
return get_line_confidence_transformer(line, labels)
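
With this change `labels` becomes optional: when the caller omits it, `get_line_confidence` takes the labels from `line.get_labels()`, and a logits matrix with exactly one row per label is sent to `get_line_confidence_transformer` without any alignment. The sketch below reproduces only that dispatch decision in isolation. `FakeLine` and `choose_confidence_path` are hypothetical stand-ins, not pero-ocr classes, and the sketch assumes `logits` is a NumPy array shaped `(frames, classes)`, as the `shape[0]` check suggests.

```python
import numpy as np


class FakeLine:
    """Hypothetical stand-in for a text line: logits plus stored labels."""

    def __init__(self, logits, labels):
        self.logits = logits  # assumed shape: (frames, classes)
        self._labels = labels

    def get_labels(self):
        return list(self._labels)


def choose_confidence_path(line, labels=None):
    """Mirror the branch at the top of get_line_confidence (sketch only)."""
    if labels is None:
        # New default: fall back to the labels stored on the line itself.
        labels = line.get_labels()
    if line.logits.shape[0] == len(labels):
        # One frame per label (e.g. a transformer decoder): no alignment needed.
        return "transformer", labels
    # More frames than labels: letters must first be aligned to frames.
    return "aligned", labels


if __name__ == "__main__":
    ctc_line = FakeLine(np.zeros((12, 5)), [1, 2, 3])
    seq_line = FakeLine(np.zeros((3, 5)), [1, 2, 3])
    print(choose_confidence_path(ctc_line))  # ('aligned', [1, 2, 3])
    print(choose_confidence_path(seq_line))  # ('transformer', [1, 2, 3])
```

Callers that already pass `labels` explicitly keep their old behaviour; only the `None` default is new.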

@@ -100,7 +103,7 @@ def get_line_confidence(line, labels, aligned_letters=None, log_probs=None):
confidences[i] = max(0, label_prob - other_prob)
last_border = next_border

#confidences = confidences / 2 + 0.5
# confidences = confidences / 2 + 0.5
return confidences


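
The second hunk only reformats a commented-out rescaling line. Its context shows each letter's confidence computed as the clipped margin `max(0, label_prob - other_prob)`. How `label_prob` and `other_prob` are obtained lies outside the visible hunk, so the vectorized sketch below assumes that each letter already has one aligned probability row, that `label_prob` is the probability of the recognized class, and that `other_prob` is the highest probability among the remaining classes. The `rescale` flag reproduces the disabled `confidences / 2 + 0.5` mapping. `margin_confidences` is a hypothetical name, not a pero-ocr function.

```python
import numpy as np


def margin_confidences(frame_probs, labels, rescale=False):
    """Per-letter confidence as a clipped probability margin (sketch).

    frame_probs: array (letters, classes), one probability row per letter,
                 assumed to be already aligned to the letters.
    labels:      class index of each recognized letter.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    rows = np.arange(len(labels))
    label_prob = frame_probs[rows, labels]
    # Strongest competitor: hide the label's own column, then take the row max.
    others = frame_probs.copy()
    others[rows, labels] = -np.inf
    other_prob = others.max(axis=1)
    # A letter that loses to another class gets confidence 0, not a negative value.
    confidences = np.maximum(0.0, label_prob - other_prob)
    if rescale:
        # Equivalent of the commented-out line: maps [0, 1] onto [0.5, 1].
        confidences = confidences / 2 + 0.5
    return confidences


if __name__ == "__main__":
    probs = [[0.75, 0.25, 0.0], [0.25, 0.5, 0.25]]
    print(margin_confidences(probs, [0, 0]))
    print(margin_confidences(probs, [0, 0], rescale=True))
```

Keeping the rescaling disabled, as the commit does, leaves 0 as a meaningful "the label was not the best class" value rather than compressing it to 0.5.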