Robert Sachunsky edited this page Feb 9, 2022 · 3 revisions

Introduction to OCR-D processors

An OCR-D processor is a command line tool that adheres to OCR-D's command line interface (CLI) specification. This makes invocations of processors uniform, regardless of the functionality or complexity of a specific processor, i.e. if you know how to invoke one processor, you know how to invoke any processor.

While the CLI is the same across processors, they can have any number of processor-specific parameters. Usage of parameters is described in another introductory article.

What's in the OCR-D CLI?

The OCR-D CLI is summarized in the --help output of any OCR-D processor, for example:

$ ocrd-olena-binarize --help

Usage: ocrd-olena-binarize [OPTIONS]

  popular binarization algorithms implemented by Olena/SCRIBO, wrapped for OCR-D (on page level only)

[...]

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
[...]

Most options have a long form, which begins with two hyphens (--) and uses hyphen-separated words as the option name, and a short form, which begins with a single hyphen (-) followed by a single letter.

Case matters! You must make sure that you use uppercase/lowercase exactly as stated in the --help output.
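To illustrate the two forms, the following sketch spells out the same invocation once with long and once with short options. The file group names MAX and BIN and the page ID PHYS_0001 are placeholders borrowed from the example at the end of this article. The commands are only printed, not executed, so the snippet runs even where the processor is not installed.

```shell
# The same call, once with long options and once with short options.
long_form="ocrd-olena-binarize --input-file-grp MAX --output-file-grp BIN --page-id PHYS_0001"
short_form="ocrd-olena-binarize -I MAX -O BIN -g PHYS_0001"

# Print both variants for comparison; either line can be pasted into a terminal.
echo "$long_form"
echo "$short_form"
```

Note that -I and -O are uppercase while -g is lowercase, exactly as listed in the --help output.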

--help, --version and --dump-json

These are documentation-related options that allow users and software developers to learn how a particular processor works (--help), to check which versions of the processor and the underlying OCR-D/core software are in use (--version), and to directly inspect the ocrd-tool.json of a processor (--dump-json).

Note: As of OCR-D/core v2.12.0, you can also just omit all options to get the --help output of a processor:

ocrd-tesserocr-recognize --help
# equivalent to
ocrd-tesserocr-recognize

-m/--mets METS and -w/--working-dir

The -m option defines the path or URL to the METS file you want to process. If not explicitly set, a file mets.xml in the current working directory is assumed.

If the directory containing the METS and referenced PAGE-XML and image files is not the current working directory, you can override it with the -w option.
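A minimal sketch of a call made from outside the workspace directory might look as follows. The path /data/workspace is hypothetical, and MAX and BIN are placeholder file group names. The command is assembled and printed rather than executed.

```shell
# Hypothetical workspace location; adjust to your own data.
WORKSPACE=/data/workspace

# -m names the METS file, -w the directory that holds the referenced files.
cmd="ocrd-olena-binarize -m $WORKSPACE/mets.xml -w $WORKSPACE -I MAX -O BIN"

# Print the assembled command for inspection.
echo "$cmd"
```

When the shell is already inside the workspace directory and the METS file is called mets.xml, both options can be left out, as described above.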

-I, -O, -g/--page-id

These options define the data flow through the processor:

  • -I defines the mets:fileGrp (by its USE attribute) which is searched for input files
  • -O defines the mets:fileGrp to which output files are written
  • -g accepts a comma-separated list of page IDs (by the ID of the mets:div that represents a page in the mets:structMap[@TYPE="PHYSICAL"]) to restrict processing to specific pages or page ranges.

Always specify -I, -O and -g (if applicable)! While not technically required (yet), you should always provide -I, -O and, if the data is grouped by page, -g as well. While -I and -O do (still) have defaults, they almost never will fit your particular data and you should never rely on the defaults here.
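As a rough sketch, a comma-separated page list for -g can be assembled in the shell before calling the processor. The PHYS_0001 style of page ID is an assumption carried over from the example at the end of this article; the real IDs are whatever your METS file uses. The invocation is only printed here.

```shell
# Build a comma-separated list of page IDs for -g.
# The PHYS_%04d naming is an assumption; real IDs come from your METS file.
page_ids=$(printf 'PHYS_%04d,' 1 2 3)
page_ids=${page_ids%,}   # strip the trailing comma

# Print the resulting invocation for inspection.
echo "ocrd-olena-binarize -I MAX -O BIN -g $page_ids"
```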

-l/--log-level LOG_LEVEL

This option allows overriding at what level of verbosity logging happens within an OCR-D processor. The possible values are:

  • TRACE
  • DEBUG
  • INFO (the default)
  • WARN
  • ERROR
  • OFF (disables logging altogether)

If you need more information for debugging, add -l DEBUG to the invocation.

-p and -P

These options allow passing parameters to a processor. See the article on parameters for a description of how these options work.

-L and -C

Like -h, -V and -J, these options make the processor perform tasks other than actual processing. Both relate to processor resources, i.e. data files such as parameter presets (configuration) or values for file parameters (models).

With --list-resources, all installed resources are enumerated. With --show-resource, a single resource can be retrieved for inspection.
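The two options are typically used one after the other, as in the sketch below. RESNAME is a placeholder, not a real resource name, and the small run helper is a hypothetical convenience that prints each command and executes it only when RUN=1 is set, so the snippet is safe to try on a machine without the processor.

```shell
# Dry-run helper: prints the command and executes it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Enumerate the resources the processor knows about.
run ocrd-tesserocr-recognize --list-resources

# RESNAME is a placeholder: substitute one of the names listed by the call above.
run ocrd-tesserocr-recognize --show-resource RESNAME
```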

--overwrite

The --overwrite option instructs a processor to force through operations that would otherwise fail with a "File already exists" error. This is very useful when developing workflows iteratively: while you are still tweaking parameters and the order of processors, you can add --overwrite so that repeated calls to a processor succeed.

NOTE: Do not use the --overwrite flag in a production environment! The errors that --overwrite prevents are legitimate errors and should be fixed in the source code. Please get in touch with the OCR-D community if you encounter "File already exists" errors in a production workflow.
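During experimentation, an iterative session could be sketched like this: the same output file group is written repeatedly with different parameter values, which only works with --overwrite. The helper name binarize_cmd is hypothetical, and sauvola is an assumed value for the impl parameter; the Parameters section of the --help output lists the values your installed version accepts. The commands are printed, not run.

```shell
# Print the binarization call for a given value of the impl parameter.
# --overwrite lets repeated runs replace the earlier BIN output.
binarize_cmd() {
  echo "ocrd-olena-binarize -I MAX -O BIN -P impl $1 --overwrite"
}

# Try two candidate values while experimenting ("sauvola" is an assumed value).
for impl in wolf sauvola; do
  binarize_cmd "$impl"
done
```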

Example

ocrd-olena-binarize \
  -I MAX \
  -O BIN \
  -g PHYS_0001 \
  -l DEBUG \
  -P impl wolf

This invocation:

  • Calls ocrd-olena-binarize
  • Reads files from the MAX file group for page PHYS_0001
  • Writes output to the BIN file group
  • Sets the log level to DEBUG
  • Sets the impl parameter to wolf
