Merge music branch to develop. #56

Merged 82 commits on Nov 14, 2024
82 commits
fc73bc9
Add scripts for exporting music from PageLayout to MIDI + MusicXML fi…
vlachvojta Sep 1, 2023
bda025a
Add translator dictionary, defining translation from internal shorten…
vlachvojta Sep 7, 2023
fe9aceb
Add base for LayoutEngineYolo using ultralytics YOLO. With conversion…
vlachvojta Sep 12, 2023
2da0f24
Junk-code cleanup and docu.
vlachvojta Sep 13, 2023
2f661d2
Prepare attributes for music-text distinction. WITHOUT exports to Pag…
vlachvojta Sep 13, 2023
5373283
Add support for `region.music_region`. WITHOUT exports to PageXML or …
vlachvojta Sep 14, 2023
1b9b676
Add category attribute to region. WITH saving to pageXML custom tag +…
vlachvojta Sep 14, 2023
7cc5293
Add category attribute to TextLine. WITH saving to pageXML custom tag…
vlachvojta Sep 14, 2023
fae66ce
Little refactoring.
vlachvojta Sep 14, 2023
b02ebb2
Add exporting music directly in `parse_folder.py` using `config.ini` …
vlachvojta Sep 14, 2023
c07ba6a
Fix minor issue with non-existing function `get_las_region_id`
vlachvojta Sep 18, 2023
0afb958
Add sorting music regions "in reading order" using `y_min` of boundin…
vlachvojta Sep 18, 2023
9371467
Remove `RegionCategory` and `LineCategory` enums hard-coded in `layou…
vlachvojta Oct 3, 2023
b733c74
Update `MusicExporter` to export music only from certain categories o…
vlachvojta Oct 3, 2023
b40e479
Remove music exporter option from `parse_folder.py` and make `export_…
vlachvojta Oct 25, 2023
7b93d16
Add option to have more LineCroppers and ORC engines. Set every other…
vlachvojta Oct 27, 2023
50418dd
Disable throwing error if no crop for line. Continue and ignore line.
vlachvojta Oct 27, 2023
8853046
Add PageLayout splitting enabling running multiple layout parsers eac…
vlachvojta Nov 3, 2023
5ce86d1
Remove unused functions.
vlachvojta Nov 3, 2023
16e8ce3
Merge branch 'develop' into music
vlachvojta Nov 3, 2023
3e2fcb8
Add simple script to check if page layouts in two folders have same s…
vlachvojta Dec 1, 2023
a49409c
Merge remote-tracking branch 'origin/develop' into music
vlachvojta Dec 1, 2023
94bcae8
Disable double logging (stdout + stderr)
vlachvojta Dec 1, 2023
d688bc0
Refactor page xml export + import.
vlachvojta Dec 1, 2023
65eacb5
Refactor alto xml export.
vlachvojta Dec 1, 2023
bb88217
Merge remote-tracking branch 'origin/develop' into music
vlachvojta Dec 7, 2023
3354a5b
Minor updates
ikiss-fit Dec 14, 2023
59dd330
Unify most of method names: page_xml to pagexml and alto_xml to altoxml.
vlachvojta Dec 14, 2023
3c3322e
Unify most of the method names: page_xml to pagexml and alto_xml to a…
vlachvojta Dec 14, 2023
27a82eb
New config section parsing and other changes after code review.
vlachvojta Dec 14, 2023
ae516f4
Add image_size to Yolo engine. Add `config_get_list` to get list of c…
vlachvojta Dec 22, 2023
e5dd2b8
Store box confidence (in LayoutExtractorYOLO) to RegionLayout and exp…
vlachvojta Dec 22, 2023
529234e
Add line ID to ALTO export + import.
vlachvojta Dec 22, 2023
036daf3
Delete unwanted script.
vlachvojta Dec 22, 2023
697990c
Add translating short music output to original encoding.
vlachvojta Dec 22, 2023
893baca
Enable loading model to cpu.
vlachvojta Dec 29, 2023
e2a26de
Add `CATEGORIES` option to sorters and delete therefore unused functi…
vlachvojta Dec 29, 2023
d59884a
Alto_export: export music transcription as one string in each TextLine
vlachvojta Jan 4, 2024
0a1a0c0
Tiny improvements.
vlachvojta Jan 15, 2024
faf0496
Delete unused function from `layout`, music integration.
vlachvojta Jan 16, 2024
dd4bbc8
Change simple print warnings to `logger.warning`.
vlachvojta Jan 18, 2024
6fc82e4
Change config categories, line_categories, add decoder filter.
vlachvojta Jan 18, 2024
53ee27a
Rename `MusicTranslator` for to more general `OutputTranslator` and e…
vlachvojta Jan 18, 2024
68e7892
Add option for rendering region categories for non-text regions.
vlachvojta Jan 22, 2024
8ac485e
Tiny refactor before moving.
vlachvojta Jan 22, 2024
7032389
Add minimalistic CLI for `MusicPageExporter` to `user_scripts/export_…
vlachvojta Jan 22, 2024
ef5f36f
Normalize category characters in image rendering.
vlachvojta Jan 23, 2024
8c738ee
Add confidence estimation to PageOCR directly after detection. Update…
vlachvojta Jan 23, 2024
949829a
Add `PageOCR.get_line_confidence` solving problem of wrong confidence…
vlachvojta Jan 26, 2024
0b6e933
Unify logging style.
vlachvojta Jan 26, 2024
092b04f
Update readme, remove translator.Semantic_to_SSemantic.json because i…
vlachvojta Jan 26, 2024
cc47eef
Improve translation of symbols in `output_translator.py`. Return orig…
vlachvojta Jan 26, 2024
c7d90a1
Add `atomic` option to `OutputTranslator` + output substitution toggl…
vlachvojta Jan 31, 2024
d9430bc
Merge branch 'develop' into music.
vlachvojta May 28, 2024
2f84712
Fix `provides_ctc_logits` to look to all `ocrs` instead of `ocr`
vlachvojta May 30, 2024
eddf0e3
Change `SUBSTITUTE_OUTPUT_ATOMIC` to work on a page level, not indivi…
vlachvojta May 30, 2024
70a7e35
Add config parameter `UPDATE_TRANSCRIPTION_BY_CONFIDENCE`
vlachvojta May 30, 2024
9dcd33f
Add ALTO baseline (export + import) in two options (float or points)
vlachvojta Jun 17, 2024
1a46c00
Add ALTO versions (options how to export baseline) + both baseline im…
vlachvojta Jun 19, 2024
2141002
Save polygon points only as positive numbers. (XSD validation issue)
vlachvojta Jun 19, 2024
34c6584
Remove prints.
vlachvojta Jun 19, 2024
82b3e70
Allow run when at least one ORC engine `provide_ctc_logits`.
vlachvojta Jun 19, 2024
2aed4bc
Fix README.md example + delete false info about setup.py.
vlachvojta Jun 19, 2024
4efdbab
Update README.md - spelling correction.
vlachvojta Jun 20, 2024
134f51a
Add typing Optional to allow lower versions of Python (tested on Pyth…
vlachvojta Jun 20, 2024
25f555c
Add libraries needed to install in docker installation.
vlachvojta Jun 20, 2024
26b0c1a
Update texts for better UX.
vlachvojta Jun 20, 2024
3ce8bbc
Make default version of ALTO to the older one.
vlachvojta Jun 25, 2024
3ec7902
Add typing List and Tuple to allow lower versions of Python (tested o…
vlachvojta Jun 25, 2024
520e3ae
Fix page_xml "custom" field export to export category only if not None.
vlachvojta Jun 27, 2024
9b414a0
Set category filter fallback to `[]` for backward compatibility.
vlachvojta Jun 27, 2024
a110f0e
Library versions fixes.
vlachvojta Jun 27, 2024
f5a7a51
Add libraries back to pyproject.toml, so new machines install it righ…
vlachvojta Jul 11, 2024
cc9b0ae
Fix bugs according to Pull request comment.
vlachvojta Aug 5, 2024
62af812
Merge remote-tracking branch 'origin/develop' into music
vlachvojta Aug 5, 2024
b60196f
Add regions to splitting by category. If `region.category` set, move …
vlachvojta Aug 5, 2024
4d2ddaa
Add better None check.
vlachvojta Aug 6, 2024
7c4251e
Disable exporting midi lines if no notes on the line.
vlachvojta Aug 27, 2024
d9c64cd
Merge branch 'develop' into music
vlachvojta Sep 24, 2024
fa1a897
Simplify splitting page layouts to allow backwards (only look at regi…
vlachvojta Oct 16, 2024
f5f2f42
Add IndexError to catch expression when calculating transcription con…
ikiss-fit Oct 25, 2024
747e491
Update layout.py
michal-hradis Nov 5, 2024
18 changes: 4 additions & 14 deletions README.md
@@ -18,24 +18,14 @@ For the current shell session, this can be achieved by setting ``PYTHONPATH`` up
export PYTHONPATH=/path/to/the/repo:$PYTHONPATH
```

As a more permanent solution, a very simplistic `setup.py` is prepared:
```
python setup.py develop
```
Beware that the `setup.py` does not promise to bring all the required stuff, e.g. setting CUDA up is up to you.

Pero can be later removed from your Python distribution by running:
```
python setup.py develop --uninstall
```

## Available models
General layout analysis (printed and handwritten) with european printed OCR specialized to czech newspapers can be [downloaded here](https://nextcloud.fit.vutbr.cz/s/NtAbHTNkZFpapdJ). The OCR engine is suitable for most european printed documents. It is specialized for low-quality czech newspapers digitized from microfilms, but it provides very good results for almast all types of printed documents in most languages. If you are interested in processing printed fraktur fonts, handwritten documents or medieval manuscripts, feel free to contact the authors. The newest OCR engines are available at [pero-ocr.fit.vutbr.cz](https://pero-ocr.fit.vutbr.cz). OCR engines are available also through API runing at [pero-ocr.fit.vutbr.cz/api](https://pero-ocr.fit.vutbr.cz/api), [github repository](https://github.com/DCGM/pero-ocr-api).

## Command line application
A command line application is ./user_scripts/parse_folder.py. It is able to process images in a directory using an OCR engine. It can render detected lines in an image and provide document content in Page XML and ALTO XML formats. Additionally, it is able to crop all text lines as rectangular regions of normalized size and save them into separate image files.
A command line application is `./user_scripts/parse_folder.py.` It is able to process images in a directory using an OCR engine. It can render detected lines in an image and provide document content in Page XML and ALTO XML formats. Additionally, it is able to crop all text lines as rectangular regions of normalized size and save them into separate image files.

## Running command line application in container
<a name="running-command-line-application-in-container"></a>
A docker container can be built from the sourcecode to run scripts and programs based on the pero-ocr. Example of running the `parse_folder.py` script to generate page-xml files for images in input directory:
```shell
docker run --rm --tty --interactive \
@@ -63,7 +53,7 @@ import os
import configparser
import cv2
import numpy as np
from pero_ocr.document_ocr.layout import PageLayout
from pero_ocr.core.layout import PageLayout
from pero_ocr.document_ocr.page_parser import PageParser

# Read config file.
@@ -117,7 +107,7 @@ Currently, only unittests are provided with the code. Some of the code. So simpl
```

#### Simple regression testing
Regression testing can be done by `test/processing_test.sh`. Script calls containerized `parser_folder.py` to process input images and page-xml files and calls user suplied comparison script to compare outputs to example outputs suplied by user. `PERO-OCR` container have to be built in advance to run the test, see 'Running command line application in container' chapter. Script can be called like this:
Regression testing can be done by `test/processing_test.sh`. Script calls containerized `parse_folder.py` to process input images and page-xml files and calls user suplied comparison script to compare outputs to example outputs suplied by user. `PERO-OCR` container have to be built in advance to run the test, see [Running command line application in container](#running-command-line-application-in-container) for more information. Script can be run like this:
```shell
sh test/processing_test.sh \
--input-images path/to/input/image/directory \
9 changes: 6 additions & 3 deletions pero_ocr/core/confidence_estimation.py
@@ -70,9 +70,12 @@ def squeeze(sequence):
return result


def get_line_confidence(line, labels, aligned_letters=None, log_probs=None):
def get_line_confidence(line, labels=None, aligned_letters=None, log_probs=None):
# There is the same number of outputs as labels (probably transformer model was used) --> each letter has only one
# possible frame in logits and thus it is not needed to align them
# possible frame in logits thus it is not needed to align them
if labels is None:
labels = line.get_labels()

if line.logits.shape[0] == len(labels):
return get_line_confidence_transformer(line, labels)

@@ -100,7 +103,7 @@ def get_line_confidence(line, labels, aligned_letters=None, log_probs=None):
confidences[i] = max(0, label_prob - other_prob)
last_border = next_border

#confidences = confidences / 2 + 0.5
# confidences = confidences / 2 + 0.5
return confidences
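
The change to `get_line_confidence` above makes `labels` optional and falls back to `line.get_labels()` when the caller omits it. The following is a minimal sketch of that pattern only, not the pero-ocr implementation: `StubLine`, `resolve_labels`, and `needs_alignment` are hypothetical names introduced for illustration, and the logits array is mimicked by an object that exposes nothing but `.shape`.

```python
from types import SimpleNamespace


class StubLine:
    """Hypothetical stand-in for a text line; not the pero-ocr class."""

    def __init__(self, n_frames, labels):
        # Only `.shape[0]` is read below, so a bare namespace mimics the logits array.
        self.logits = SimpleNamespace(shape=(n_frames, 5))
        self._labels = labels

    def get_labels(self):
        return self._labels


def resolve_labels(line, labels=None):
    """Use explicit labels when given, otherwise ask the line for its own."""
    if labels is None:
        labels = line.get_labels()
    return labels


def needs_alignment(line, labels=None):
    """True when frame count and label count differ, so frames must be aligned."""
    labels = resolve_labels(line, labels)
    return line.logits.shape[0] != len(labels)


line = StubLine(n_frames=3, labels=[7, 8, 9])
print(needs_alignment(line))          # labels taken from the line
print(needs_alignment(line, [7, 8]))  # explicit labels override the fallback
```

Testing against `None` rather than relying on truthiness means an explicitly passed empty list is still respected instead of silently triggering the fallback.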

