Merge pull request #56 from DCGM/music

Merge music branch to develop.
DCGM · Nov 14, 2024 · 02e3d7a
2 parents 3311055 + 747e491

Showing 22 changed files with 2,377 additions and 398 deletions.
diff --git a/README.md b/README.md
@@ -18,24 +18,14 @@ For the current shell session, this can be achieved by setting ``PYTHONPATH`` up
 export PYTHONPATH=/path/to/the/repo:$PYTHONPATH
 ```
 
-As a more permanent solution, a very simplistic `setup.py` is prepared:
-```
-python setup.py develop
-```
-Beware that the `setup.py` does not promise to bring all the required stuff, e.g. setting CUDA up is up to you.
-
-Pero can be later removed from your Python distribution by running:
-```
-python setup.py develop --uninstall
-```
-
 ## Available models
 General layout analysis (printed and handwritten) with european printed OCR specialized to czech newspapers can be [downloaded here](https://nextcloud.fit.vutbr.cz/s/NtAbHTNkZFpapdJ). The OCR engine is suitable for most european printed documents. It is specialized for low-quality czech newspapers digitized from microfilms, but it provides very good results for almast all types of printed documents in most languages. If you are interested in processing printed fraktur fonts, handwritten documents or medieval manuscripts, feel free to contact the authors. The newest OCR engines are available at [pero-ocr.fit.vutbr.cz](https://pero-ocr.fit.vutbr.cz). OCR engines are available also through API runing at [pero-ocr.fit.vutbr.cz/api](https://pero-ocr.fit.vutbr.cz/api), [github repository](https://github.com/DCGM/pero-ocr-api).
 
 ## Command line application
-A command line application is ./user_scripts/parse_folder.py. It is able to process images in a directory using an OCR engine. It can render detected lines in an image and provide document content in Page XML and ALTO XML formats. Additionally, it is able to crop all text lines as rectangular regions of normalized size and save them into separate image files.
+A command line application is `./user_scripts/parse_folder.py.` It is able to process images in a directory using an OCR engine. It can render detected lines in an image and provide document content in Page XML and ALTO XML formats. Additionally, it is able to crop all text lines as rectangular regions of normalized size and save them into separate image files.
 
 ## Running command line application in container
+<a name="running-command-line-application-in-container"></a>
 A docker container can be built from the sourcecode to run scripts and programs based on the pero-ocr. Example of running the `parse_folder.py` script to generate page-xml files for images in input directory:
 ```shell
 docker run --rm --tty --interactive \
@@ -63,7 +53,7 @@ import os
 import configparser
 import cv2
 import numpy as np
-from pero_ocr.document_ocr.layout import PageLayout
+from pero_ocr.core.layout import PageLayout
 from pero_ocr.document_ocr.page_parser import PageParser
 
 # Read config file.
@@ -117,7 +107,7 @@ Currently, only unittests are provided with the code. Some of the code. So simpl
 ```
 
 #### Simple regression testing
-Regression testing can be done by `test/processing_test.sh`. Script calls containerized `parser_folder.py` to process input images and page-xml files and calls user suplied comparison script to compare outputs to example outputs suplied by user. `PERO-OCR` container have to be built in advance to run the test, see 'Running command line application in container' chapter. Script can be called like this:
+Regression testing can be done by `test/processing_test.sh`. Script calls containerized `parse_folder.py` to process input images and page-xml files and calls user suplied comparison script to compare outputs to example outputs suplied by user. `PERO-OCR` container have to be built in advance to run the test, see [Running command line application in container](#running-command-line-application-in-container) for more information. Script can be run like this:
 ```shell
 sh test/processing_test.sh \
      --input-images path/to/input/image/directory \

diff --git a/pero_ocr/core/confidence_estimation.py b/pero_ocr/core/confidence_estimation.py
@@ -70,9 +70,12 @@ def squeeze(sequence):
     return result
 
 
-def get_line_confidence(line, labels, aligned_letters=None, log_probs=None):
+def get_line_confidence(line, labels=None, aligned_letters=None, log_probs=None):
     # There is the same number of outputs as labels (probably transformer model was used) --> each letter has only one
-    # possible frame in logits and thus it is not needed to align them
+    # possible frame in logits thus it is not needed to align them
+    if labels is None:
+        labels = line.get_labels()
+
     if line.logits.shape[0] == len(labels):
         return get_line_confidence_transformer(line, labels)
 
@@ -100,7 +103,7 @@ def get_line_confidence(line, labels, aligned_letters=None, log_probs=None):
         confidences[i] = max(0, label_prob - other_prob)
         last_border = next_border
 
-    #confidences = confidences / 2 + 0.5
+    # confidences = confidences / 2 + 0.5
     return confidences
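
Note on the `get_line_confidence` change above: `labels` becomes optional and falls back to `line.get_labels()`, after which the function takes the transformer branch when there is exactly one logit frame per label. The following is a minimal sketch of that pattern only, not the pero_ocr implementation. `FakeLine`, `resolve_labels` and `needs_alignment` are hypothetical names introduced for illustration, and the `logits` object is a stand-in that exposes only a `shape` attribute.

```python
from types import SimpleNamespace


class FakeLine:
    """Hypothetical stand-in for a text line; not the pero_ocr class."""

    def __init__(self, n_frames, n_classes, labels):
        # Only the shape of the logits matters for this sketch.
        self.logits = SimpleNamespace(shape=(n_frames, n_classes))
        self._labels = labels

    def get_labels(self):
        return self._labels


def resolve_labels(line, labels=None):
    # Same fallback as in the diff: use the line's own labels when none are passed.
    if labels is None:
        labels = line.get_labels()
    return labels


def needs_alignment(line, labels=None):
    # One logit frame per label means no alignment is needed,
    # which corresponds to the transformer branch in the diff.
    labels = resolve_labels(line, labels)
    return line.logits.shape[0] != len(labels)


line = FakeLine(n_frames=3, n_classes=5, labels=[1, 2, 3])
print(needs_alignment(line))          # labels taken from the line itself
print(needs_alignment(line, [1, 2]))  # explicitly passed labels take precedence
```

Because the default is `None` rather than a value computed in the signature, existing callers that pass `labels` explicitly should behave as before, while new callers can omit the argument.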