Port to ocrd core version 3.0.0 #5

Open
wants to merge 102 commits into
base: fix-alpha-shape
Changes from 1 commit
102 commits
2ed2c4f
add executable property
MehmedGIT Aug 13, 2024
61e6caf
add setup method if missing
MehmedGIT Aug 13, 2024
a0965c2
add self.logger wherever missing
MehmedGIT Aug 13, 2024
dbccae5
require core >= 3.0.0a1
kba Aug 13, 2024
8557a26
port part of binarize to core v3
kba Aug 13, 2024
911a4c1
Merge pull request #1 from kba/port-to-v3
MehmedGIT Aug 13, 2024
278b706
move: determine_zoom to common.py
MehmedGIT Aug 13, 2024
6beec17
move: logger init to setup()
MehmedGIT Aug 13, 2024
1b2fea3
refactor: log -> logger
MehmedGIT Aug 13, 2024
fe33494
remove: unused imports
MehmedGIT Aug 13, 2024
3368a53
remove: file grp cardinality checks inside process()
MehmedGIT Aug 13, 2024
ae97768
remove: constructors, adapt setup()
MehmedGIT Aug 13, 2024
60d02d2
completed: OcropyBinarize
MehmedGIT Aug 13, 2024
dcaccd4
remove file grp cardinality asserts
MehmedGIT Aug 13, 2024
b178227
Update ocrd_cis/ocropy/binarize.py
MehmedGIT Aug 14, 2024
67b6107
Update ocrd_cis/ocropy/binarize.py
MehmedGIT Aug 14, 2024
06a98b1
Update ocrd_cis/ocropy/binarize.py
MehmedGIT Aug 14, 2024
1e6cd7b
Update ocrd_cis/ocropy/binarize.py
MehmedGIT Aug 14, 2024
71bb26d
fix: potentially wrong dpi in logs
MehmedGIT Aug 14, 2024
64f02a3
binarize: don't conflate region/lines seg, pass output_file_id
kba Aug 14, 2024
d7c15c7
Update binarize.py
MehmedGIT Aug 14, 2024
156d79f
Merge pull request #2 from kba/fix-binarize-v3
MehmedGIT Aug 14, 2024
19566c0
try to migrate recognize
MehmedGIT Aug 14, 2024
5f60976
fix: migrate recognize
MehmedGIT Aug 14, 2024
e8b2603
fix: detect_zoom logging
MehmedGIT Aug 14, 2024
7dfd496
update: test_lib base url
MehmedGIT Aug 14, 2024
033c38a
logging exception -> error
MehmedGIT Aug 14, 2024
46d84d5
refactor: logger as a first positional argument
MehmedGIT Aug 14, 2024
f6fe4cf
fix: test_lib.bash data url
MehmedGIT Aug 14, 2024
aed0f95
fix: recognize OcrdPage import
MehmedGIT Aug 14, 2024
804f031
try to migrate clip
MehmedGIT Aug 14, 2024
7bdff31
remove: process() methods
MehmedGIT Aug 15, 2024
03c2f15
adapt: docstring of process_page_pcgts
MehmedGIT Aug 15, 2024
90ac28e
refactor: other small things
MehmedGIT Aug 15, 2024
f24f86b
fix: determine_zoom
MehmedGIT Aug 15, 2024
5f8e1df
add missing Levenshtein req in setup
MehmedGIT Aug 15, 2024
9a14e1d
fix: remove version req for Levenshtein
MehmedGIT Aug 15, 2024
4ca4d14
fix: Levenshtein import
MehmedGIT Aug 15, 2024
fbaafcb
update ocrd-cis-binarize to be compatible with bertsky/core#8
kba Aug 15, 2024
516ce4b
binarize: use final v3 API
bertsky Aug 15, 2024
2e4f26f
binarize: use correct types
bertsky Aug 15, 2024
21be941
clip: use final v3 API
bertsky Aug 15, 2024
9539ac9
clip: use correct types
bertsky Aug 15, 2024
734b5eb
recognize: use final v3 API
bertsky Aug 15, 2024
28ad585
recognize: fix typing import
bertsky Aug 16, 2024
9a7c10a
denoise: adapt to final v3 API
bertsky Aug 16, 2024
7c9f39f
deskew: adapt to final v3 API
bertsky Aug 16, 2024
6698668
dewarp: adapt to final v3 API
bertsky Aug 16, 2024
48a3146
resegment: adapt to final v3 API
bertsky Aug 16, 2024
0dd6fba
ocropy_segment: implement process_page_pcgts
MehmedGIT Aug 16, 2024
ad5ac7c
ocropy_segment: remove process
MehmedGIT Aug 16, 2024
5d4007b
segment: adapt to final v3 API
bertsky Aug 16, 2024
df1c35c
train: adapt to final v3 API
bertsky Aug 16, 2024
c08b623
ocrd-tool.json: add v3 cardinalities
bertsky Aug 16, 2024
a18307d
fix: ocropy train errors
MehmedGIT Aug 16, 2024
0ba6839
remove: unused imports
MehmedGIT Aug 16, 2024
7b4ebc6
Merge branch 'port-to-v3' into port-to-v3-return-object
MehmedGIT Aug 16, 2024
6b06e88
Update binarize.py
MehmedGIT Aug 16, 2024
6b19f35
Merge pull request #3 from kba/port-to-v3-return-object
MehmedGIT Aug 16, 2024
d1a14b7
refactor: python strings v3
MehmedGIT Aug 16, 2024
d8542c2
spacing: train
MehmedGIT Aug 16, 2024
d785971
spacing: segment
MehmedGIT Aug 16, 2024
7ca78a9
spacing: resegment
MehmedGIT Aug 16, 2024
1004b43
spacing: rest
MehmedGIT Aug 16, 2024
c5498a0
spacing: dewarp
MehmedGIT Aug 16, 2024
31e1245
fix: dewarp return
MehmedGIT Aug 16, 2024
f86c993
improve str speed: precompute element_name_id
MehmedGIT Aug 16, 2024
b8e3ad6
fix: clip suffix
MehmedGIT Aug 16, 2024
02724f2
fix: denoise return
MehmedGIT Aug 16, 2024
aac6fe0
try to fix: ocropy denoise
MehmedGIT Aug 16, 2024
5548d0e
fix: ocropy denoise
MehmedGIT Aug 16, 2024
c9f0f56
fix: resegment
MehmedGIT Aug 16, 2024
fff9097
optimize segment
MehmedGIT Aug 16, 2024
8b92832
optimize ocropy common
MehmedGIT Aug 17, 2024
fceaffe
optimize ocrolib
MehmedGIT Aug 17, 2024
3de2585
optimize align cli
MehmedGIT Aug 17, 2024
0949277
align: use final v3 API
bertsky Aug 22, 2024
d4f8483
use ocrd_utils instead of pkg_resources
bertsky Aug 22, 2024
ecc44c0
postcorrect: use final v3 API
bertsky Aug 22, 2024
2b310b4
revert: ocropy.ocrolib changes
MehmedGIT Aug 23, 2024
4420c6f
revert: ocropy.common changes
MehmedGIT Aug 23, 2024
2d8650e
remove whitespaces in ocropy.common and ocropy.ocrolib
MehmedGIT Aug 23, 2024
9a153b0
postcorrect: adapt to frozendict Processor.parameter in v3
bertsky Aug 25, 2024
bd0613a
require ocrd>=3.0.0b1
bertsky Aug 26, 2024
f6e437f
add: simple github actions workflow
MehmedGIT Aug 27, 2024
403781a
Update .github/workflow/tests.yml
MehmedGIT Aug 27, 2024
97083bb
Update .github/workflow/tests.yml
MehmedGIT Aug 27, 2024
2b20e0c
fix: checkout ref
MehmedGIT Aug 27, 2024
86a08eb
Create GH Actions workflow: test.yml
MehmedGIT Aug 27, 2024
231edf2
Merge branch 'master' into port-to-v3
MehmedGIT Aug 27, 2024
1d7e9a0
delete: wrong path for workflows
MehmedGIT Aug 27, 2024
224e86f
fix: NaN error for python3.9+
MehmedGIT Aug 27, 2024
a397531
fix: NaN in reading_order in morph.py
MehmedGIT Aug 27, 2024
9cf8305
fix type hints
bertsky Sep 1, 2024
a0c734d
dewarp: make thread-safe
bertsky Sep 1, 2024
66baaf0
recognize: disallow multithreading (impossible with current lstm impl…
bertsky Sep 1, 2024
32ce656
postcorrect: make work under METS Server
bertsky Sep 1, 2024
c4a5999
tests: use METS Server if OCRD_MAX_PARALLEL_PAGES>1
bertsky Sep 1, 2024
ae7dc67
make test: run serially and parallel, show times
bertsky Sep 1, 2024
e540b10
require ocrd>=3.0.0b4
bertsky Sep 2, 2024
99b3489
segment: adapt to numpy deprecation
bertsky Sep 26, 2024
dee1abf
eval/stats: Levenshtein -> rapidfuzz.distance.Levenshtein
kba Oct 11, 2024
10 changes: 6 additions & 4 deletions ocrd_cis/ocropy/binarize.py
@@ -28,8 +28,6 @@

 #sys.path.append(os.path.dirname(os.path.abspath(__file__)))

-TOOL = 'ocrd-cis-ocropy-binarize'
-
 def binarize(pil_image, method='ocropy', maxskew=2, threshold=0.5, nrm=False, zoom=1.0):
     LOG = getLogger('processor.OcropyBinarize')
     LOG.debug('binarizing %dx%d image with method=%s', pil_image.width, pil_image.height, method)
@@ -71,13 +69,17 @@ class OcropyBinarize(Processor):

     def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super(OcropyBinarize, self).__init__(*args, **kwargs)
         if hasattr(self, 'output_file_grp'):
             # processing context
             self.setup()
MehmedGIT marked this conversation as resolved.
-
+
+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-binarize'
+
     def setup(self):
         self.logger = getLogger('processor.OcropyBinarize')
         if self.parameter['grayscale'] and self.parameter['method'] != 'ocropy':
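The `__init__` change in binarize.py reads `self.executable` before `super().__init__()` has run. The sketch below shows why that is safe: a property is defined on the class, so it resolves on the instance before any instance attributes exist. All names here (`TOOLS`, `BaseProcessor`, `Binarize`, `Clip`) are hypothetical stand-ins for illustration, not the ocrd or ocrd_cis API, and the `Clip` subclass is only there to show the override; the PR itself defines the property separately in each processor.

```python
# Toy stand-in for the 'tools' section of an ocrd-tool.json (assumed shape).
TOOLS = {
    'ocrd-cis-ocropy-binarize': {'description': 'binarize'},
    'ocrd-cis-ocropy-clip': {'description': 'clip'},
}


class BaseProcessor:
    """Hypothetical base class that expects 'ocrd_tool' and 'version' kwargs."""

    def __init__(self, ocrd_tool=None, version=None):
        self.ocrd_tool = ocrd_tool
        self.version = version


class Binarize(BaseProcessor):
    def __init__(self, **kwargs):
        # The property lives on the class, so it can be read here even
        # though BaseProcessor.__init__ has not run yet.
        kwargs['ocrd_tool'] = TOOLS[self.executable]
        kwargs['version'] = '0.0.0'  # placeholder version for the sketch
        super().__init__(**kwargs)

    @property
    def executable(self):
        return 'ocrd-cis-ocropy-binarize'


class Clip(Binarize):
    # Overriding only the property is enough; __init__ is inherited.
    @property
    def executable(self):
        return 'ocrd-cis-ocropy-clip'


binarize = Binarize()
clip = Clip()
print(binarize.ocrd_tool['description'])
print(clip.ocrd_tool['description'])
```

Compared with a module-level `TOOL` constant, the property keeps the tool name attached to the class, so code outside the module can query it from an instance.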
8 changes: 5 additions & 3 deletions ocrd_cis/ocropy/clip.py
@@ -31,16 +31,18 @@
     pil2array, array2pil
 )

-TOOL = 'ocrd-cis-ocropy-clip'
-
 class OcropyClip(Processor):

     def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super(OcropyClip, self).__init__(*args, **kwargs)

+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-clip'
+
     def process(self):
         """Clip text regions / lines of the workspace at intersections with neighbours.

8 changes: 5 additions & 3 deletions ocrd_cis/ocropy/denoise.py
@@ -19,16 +19,18 @@
     # binarize,
     remove_noise)

-TOOL = 'ocrd-cis-ocropy-denoise'
-
 class OcropyDenoise(Processor):

     def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super(OcropyDenoise, self).__init__(*args, **kwargs)

+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-denoise'
+
     def process(self):
         """Despeckle the pages / regions / lines of the workspace.

6 changes: 5 additions & 1 deletion ocrd_cis/ocropy/deskew.py
@@ -34,10 +34,14 @@ class OcropyDeskew(Processor):

     def __init__(self, *args, **kwargs):
         ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = ocrd_tool['tools'][self.executable]
         kwargs['version'] = ocrd_tool['version']
         super(OcropyDeskew, self).__init__(*args, **kwargs)

+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-deskew'
+
     def process(self):
         """Deskew the pages or regions of the workspace.

10 changes: 6 additions & 4 deletions ocrd_cis/ocropy/dewarp.py
@@ -24,8 +24,6 @@

 #sys.path.append(os.path.dirname(os.path.abspath(__file__)))

-TOOL = 'ocrd-cis-ocropy-dewarp'
-
 class InvalidLine(Exception):
     """Line image does not allow dewarping and should be ignored."""

@@ -72,13 +70,17 @@ class OcropyDewarp(Processor):

     def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super(OcropyDewarp, self).__init__(*args, **kwargs)
         if hasattr(self, 'output_file_grp'):
             # processing context
             self.setup()
-
+
+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-dewarp'
+
     def setup(self):
         # defaults from ocrolib.lineest:
         self.lnorm = lineest.CenterNormalizer(
10 changes: 6 additions & 4 deletions ocrd_cis/ocropy/recognize.py
@@ -30,8 +30,6 @@
     check_line
 )

-TOOL = 'ocrd-cis-ocropy-recognize'
-
 def resize_keep_ratio(image, baseheight=48):
     scale = baseheight / image.height
     wsize = round(image.width * scale)
@@ -85,13 +83,17 @@ def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
         self.pad = 16 # ocropus-rpred default
         self.network = None # set in process
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super(OcropyRecognize, self).__init__(*args, **kwargs)
         if hasattr(self, 'output_file_grp'):
             # processing context
             self.setup()
-
+
+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-recognize'
+
     def setup(self):
         self.logger = getLogger('processor.OcropyRecognize')
         # from ocropus-rpred:
8 changes: 5 additions & 3 deletions ocrd_cis/ocropy/resegment.py
@@ -46,16 +46,18 @@
     diff_polygons
 )

-TOOL = 'ocrd-cis-ocropy-resegment'
-
 class OcropyResegment(Processor):

     def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super().__init__(*args, **kwargs)

+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-resegment'
+
     def process(self):
         """Resegment lines of the workspace.

8 changes: 5 additions & 3 deletions ocrd_cis/ocropy/segment.py
@@ -58,8 +58,6 @@
     lines2regions
 )

-TOOL = 'ocrd-cis-ocropy-segment'
-
 def masks2polygons(bg_labels, baselines, fg_bin, name, min_area=None, simplify=None, open_holes=False, reorder=True):
     """Convert label masks into polygon coordinates.

@@ -248,10 +246,14 @@ class OcropySegment(Processor):

     def __init__(self, *args, **kwargs):
         self.ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][TOOL]
+        kwargs['ocrd_tool'] = self.ocrd_tool['tools'][self.executable]
         kwargs['version'] = self.ocrd_tool['version']
         super(OcropySegment, self).__init__(*args, **kwargs)

+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-segment'
+
     def process(self):
         """Segment pages into regions+lines, tables into cells+lines, or regions into lines.

6 changes: 5 additions & 1 deletion ocrd_cis/ocropy/train.py
@@ -32,13 +32,17 @@ class OcropyTrain(Processor):
     def __init__(self, *args, **kwargs):
         self.oldcwd = os.getcwd()
         ocrd_tool = get_ocrd_tool()
-        kwargs['ocrd_tool'] = ocrd_tool['tools']['ocrd-cis-ocropy-train']
+        kwargs['ocrd_tool'] = ocrd_tool['tools'][self.executable]
         kwargs['version'] = ocrd_tool['version']
         super(OcropyTrain, self).__init__(*args, **kwargs)
         if hasattr(self, 'input_file_grp'):
             # processing context
             self.setup()

+    @property
+    def executable(self):
+        return 'ocrd-cis-ocropy-train'
+
     def setup(self):
         self.log = getLogger('processor.OcropyTrain')
         #print(self.parameter)
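Several constructors in this commit still call `self.setup()` only when a file group attribute is present (the `# processing context` guard). The sketch below illustrates that guard under one stated assumption: the base class sets `input_file_grp` only when a value is passed, which this page does not confirm for ocrd core. `BaseProcessor`, `Train`, and `ready` are hypothetical names. The commit list shows the constructors being removed later in the PR (ae97768, "remove: constructors, adapt setup()"), so this reflects the intermediate state only.

```python
class BaseProcessor:
    """Hypothetical base class: stores the file group only when one is given."""

    def __init__(self, input_file_grp=None):
        if input_file_grp is not None:
            self.input_file_grp = input_file_grp


class Train(BaseProcessor):
    def __init__(self, **kwargs):
        self.ready = False
        super().__init__(**kwargs)
        if hasattr(self, 'input_file_grp'):
            # processing context: run the expensive initialisation now
            self.setup()

    def setup(self):
        # Stand-in for logger creation, model loading, and similar work.
        self.ready = True


# Without a file group (e.g. only printing help or the tool description),
# setup() is skipped; with one, it runs during construction.
print(Train().ready)
print(Train(input_file_grp='GT').ready)
```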