forked from barak/minidjvu
Copy of man page (English)
Alexander Trufanov edited this page Jul 8, 2021 · 1 revision
MINIDJVU-MOD(25) minidjvu-mod-0.9m02 MINIDJVU-MOD(25)
minidjvu-mod - encode/decode black-and-white DjVu pages
minidjvu-mod [options] input_file output_file
There is a similar syntax for multipage compression:
minidjvu-mod [options] input_files output_file
See MULTIPAGE ENCODING section below for more details.
minidjvu-mod encodes and decodes single-page black-and-white DjVu files.
minidjvu-mod is derived from DjVuLibre, which is the primary support library for DjVu.
Besides bitonal DjVu, minidjvu-mod understands Windows BMP, PBM and TIFF (through libtiff) formats. Both input_file
and output_file may be BMP, PBM, TIFF or DjVu. The file type is determined by the file extension. Input and output
may be the same file.
When given a DjVu-to-DjVu job, minidjvu-mod decodes, then re-encodes the image. DjVu layers other than the bitonal
picture are lost.
A bitmap-to-bitmap job is possible, but it is useful mainly with the --smooth option.
All options preceded by two hyphens can also be used with one hyphen. This makes the minidjvu-mod interface more
familiar to DjVuLibre users.
To activate multipage mode, either specify more than one input file on the command line or pass minidjvu-mod a
single multipage TIFF document. By default (if --indirect is not specified) the compressed pages are stored in a
single bundled document under the name given on the command line.
Several options refer to the multipage encoding process, namely --pages-per-dict, --indirect, --Classifier and
--report.
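As a rough illustration of the multipage calling convention, the following Python sketch assembles an argument list in the order shown in the synopsis (options, then input files, then the output file). The helper name build_multipage_command and its parameters are hypothetical; only the option names come from this page.

```python
from typing import List, Optional


def build_multipage_command(input_files: List[str],
                            output_file: str,
                            pages_per_dict: Optional[int] = None,
                            indirect: bool = False,
                            report: bool = False) -> List[str]:
    """Assemble an argument list for a multipage minidjvu-mod run.

    Hypothetical helper: it only builds the list, it does not run anything.
    """
    if not input_files:
        raise ValueError("at least one input file is required")
    cmd = ["minidjvu-mod"]
    if pages_per_dict is not None:
        cmd += ["--pages-per-dict", str(pages_per_dict)]
    if indirect:
        cmd.append("--indirect")
    if report:
        cmd.append("--report")
    # Options first, then the input files, then the output file.
    return cmd + list(input_files) + [output_file]
```

If the minidjvu-mod binary is installed, the resulting list could be handed to subprocess.run(cmd, check=True).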
-A
--Averaging
Compute "average" representatives for shapes matching a pattern.
This option is turned on by --lossy.
-a n
--aggression n
Sets aggression for pattern matching. The higher the aggression, the smaller the file, but the more likely
substitution errors become. The default is 100. Usually you can raise it to about 110 more or less safely.
Probably even 200 will work fine, but don't rely on that.
Consistency of aggression levels between versions is not guaranteed. The default, however, will always be 100.
This option turns on --match automatically.
-C n
--Classifier n
Set the symbol classifier mode (default is 3). 1 - the classifier behaves similarly to the original implementation
in the minidjvu encoder. 2 - the classifier makes additional efforts to achieve better compression. This requires
more CPU time and much more RAM for a cache that is allocated per thread. 3 - similar to 2, but takes even more CPU
time to achieve the maximum level of document compression (RAM usage is the same as in 2).
BE VERY CAREFUL with modes 2 and 3, as they can slow down your machine or exhaust the available RAM. You may
decrease the number of threads to save some RAM at the cost of time, or decrease pages per dict at the cost of file
size (not recommended).
-c
--clean
Remove small black marks that are probably noise. This algorithm can really devastate halftone patterns, so
use with caution.
This option is turned on by --lossy.
-d n
--dpi n
Specify the resolution of an image, measured in dots per inch. The resolution affects some algorithms and is
recorded in DjVu and BMP files (TIFF output does not record it yet).
-e
--erosion
Sacrifice image quality to gain about 5-10% in file size. One erosion is almost invisible, but 10 erosions in
a row spoil an image badly (and they won't give you 50-100% of file size, alas). Erosion bonus stacks with
pattern matching.
Erosion makes no sense when the output is not DjVu.
This option is turned on by --lossy.
-i
--indirect
Specifying this option in multipage mode causes minidjvu-mod to generate an indirect multipage document,
consisting of a single index file, several single-page DjVu files (one per image passed to the encoder) and
several shared dictionary files. Note that the index file is created under the name specified for the output
file on the command line, while for each page the original input file name is preserved, with the extension
changed to ".djvu".
This mode is useful for placing a large document on a web server, or if you are going to postprocess the
generated files (e.g. by adding a color background). In the latter case you may then want to convert your indirect
document to a bundled DjVu document using the djvmcvt utility supplied with DjVuLibre.
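The page naming rule for indirect documents can be sketched in Python as follows. This is an illustration only: the helper indirect_page_names is hypothetical, it computes just the base names (this page does not say which directory the page files are written to), and it does not cover how pages taken from a multipage TIFF are named.

```python
from pathlib import PurePosixPath


def indirect_page_names(input_files):
    """Return the page file name expected for each input image.

    Per the --indirect description, the input file name is kept and
    only its extension is replaced with ".djvu".
    """
    return [PurePosixPath(name).with_suffix(".djvu").name for name in input_files]
```

For example, an input named scans/p001.tif maps to the page file name p001.djvu.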
-j
--jb2
Instructs the encoder to save the pages of the document as jb2 chunks instead of djvu. This is useful in some
cases of further document postprocessing. Implies --indirect mode.
-l
--lossy
Turn on all lossy options. Equivalent to --Averaging --clean --erosion --match --smooth.
-m
--match
Run pattern matching. This is the main method of shrinking the file size, but it can also cause
substitution errors. Use the --aggression option to balance file size against error probability.
This option is turned on by --lossy or --aggression.
-n
--no-prototypes
Disable prototype searching. This makes lossless compression faster, but the resulting files become much bigger.
-p n
--pages-per-dict n
Specify how many pages to compress in one pass. The default is 10. If -p 0 is specified, minidjvu-mod will
attempt to process all pages at once, but be aware that this can take a lot of memory, especially on large books.
-r
--report
Print verbose messages about what's done on which page. Works only with multipage encoding. Useful only to
survive boredom while compressing a book.
-s
--smooth
Flip some pixels that appear to be noise. The gain in file size is about 5%. Visually the image is slightly
improved, but it's hardly noticeable.
The current filter is simple and only removes black pixels with at least 3 white neighbors (of 4). You probably
won't notice the effects.
This option is turned on by --lossy.
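The filter rule described above (drop a black pixel when at least 3 of its 4 neighbors are white) can be sketched in Python. This is not the encoder's code. Two details are assumptions made for the sketch: positions outside the image count as white, and every decision is taken from the original bitmap rather than from partially smoothed data.

```python
def smooth(bitmap):
    """Remove black pixels that have at least 3 white 4-neighbors.

    bitmap is a list of rows; 1 means black, 0 means white.
    A new bitmap is returned; the input is left unchanged.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0

    def is_white(y, x):
        # Assumption: anything outside the image is treated as white.
        if 0 <= y < height and 0 <= x < width:
            return bitmap[y][x] == 0
        return True

    result = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1:
                continue
            # Count white pixels among the up, down, left and right neighbors.
            white = sum(is_white(y + dy, x + dx)
                        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
            if white >= 3:
                result[y][x] = 0
    return result
```

With these assumptions an isolated black pixel is removed, while a solid 2x2 black block survives, because each of its pixels has only 2 white neighbors.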
-S settings-file
Read encoder settings from the file "settings-file". Some command line options may be overridden. The settings
file format is described below, after the list of options.
-t n
--threads-max n
Process pages assigned to different shared dictionaries in up to N parallel threads. By default N is equal to
the number of CPU cores if there are only 1 or 2 cores. Otherwise it's equal to the number of CPU cores minus 1.
Specify "-t 1" to disable multithreading. minidjvu-mod must be built with OpenMP support to enable this
option.
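The default thread count described for this option can be written as a small Python function. The function name default_thread_count is hypothetical, and the rule follows the wording of this option's description (the comment in the settings file example further down words the small-core case differently).

```python
def default_thread_count(cpu_cores):
    """Default N for --threads-max, as worded in the option description.

    With 1 or 2 cores all cores are used; otherwise one core is left free.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores if cpu_cores <= 2 else cpu_cores - 1
```

On a real machine the core count could be taken from os.cpu_count(), which may return None when the count cannot be determined.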
-u
--unbuffered
Use unbuffered output to console. Useful for precise progress tracking with -r.
-v
--verbose
Print messages about various stages of the process. It's not very useful, but interesting to examine.
-X
--Xtension
Specifies an extension for shared dictionary files (without a leading period). The default is "djbz".
NOTE: the most popular viewer, djview4, expects only "djbz" or "iff" extensions.
-w
--warnings
Do not disable libtiff warnings. By default, TIFF warnings are suppressed. Under Windows, the default TIFF warning
handler creates a message box, which is unacceptable in a batch processing script, for instance. So the
minidjvu-mod default behavior is a workaround for the libtiff default behavior.
This section describes the format of a file that may be used with the -S option to fine-tune the encoding process.
The format is quite verbose, and such a settings file is expected to be generated by a GUI application rather than
typed by a user. In particular, this option is designed for the needs of the ScanTailor Universal project
(ver. 0.3.0+). The format is inspired by the one used for setting the DjVu document outline in the djvused
application from the DjVuLibre package.
A settings file must contain parenthesized expressions in the following format: ( values )
Tabs and newline characters are treated as spaces. A value may itself be a parenthesized expression, so expressions
can be nested. Otherwise each value is a word or a number. A value that contains multiple words must be enclosed in
" characters. The first value of a parenthesized expression is considered to be its id.
The following ids are possible: options, input-files, djbz, default-djbz, default-image, files, file, image
The first 3 ids define top-level parenthesized expressions. The others are for nested parenthesized expressions
that may appear inside them. Each value that follows the id (except for nested expressions) is either an argument
or the name of a parameter that is followed by its argument. If a value is the name of a parameter, the next value
is expected to be its argument (sometimes two).
If a value starts with #, it and the rest of the line are interpreted as a comment and ignored.
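The syntax rules above are enough to sketch a small reader in Python. This is not the parser minidjvu-mod uses. The function names are hypothetical, values are kept as strings (so an id such as 0001 keeps its leading zeros), and quoted values are assumed to have no escape sequences, since none are described here.

```python
def tokenize(text):
    """Split settings text into (kind, value) tokens."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Spaces, tabs and newlines only separate values.
            i += 1
        elif ch == "#":
            # A value starting with '#' begins a comment up to end of line.
            while i < n and text[i] != "\n":
                i += 1
        elif ch == "(":
            tokens.append(("open", ch))
            i += 1
        elif ch == ")":
            tokens.append(("close", ch))
            i += 1
        elif ch == '"':
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated quoted value")
            tokens.append(("value", text[i + 1:end]))
            i = end + 1
        else:
            start = i
            while i < n and not text[i].isspace() and text[i] not in '()"':
                i += 1
            tokens.append(("value", text[start:i]))
    return tokens


def parse(text):
    """Return the top-level expressions as nested Python lists."""
    stack = [[]]
    for kind, value in tokenize(text):
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(value)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]
```

Each parsed expression is a list whose first element is its id, for example ["options", ["default-djbz", "aggression", "100"], "indirect", "0"]. Note that a closing parenthesis placed after a # on the same line belongs to the comment and does not close anything.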
Let's consider the top-level expressions:
options
Contains application options (much the same as those that may be passed on the command line) and default options
for images and shared dictionaries. There must be only one expression with the "options" id in a settings file.
Example:
(options                   # application options and defaults
  (default-djbz            # default djbz settings
    averaging     0        # default averaging (off)
    aggression    100      # default aggression level (100)
    classifier    3        # default classifier (max compression)
    erosion       0        # default erosion (disabled)
    no-prototypes 0        # default prototypes usage (on)
    xtension      djbz     # default djbz id extension ("djbz")
  )
  (default-image           # default image options
    #dpi          300      # if set, use this dpi value for encoding all images
                           # except those that have their own dpi option set.
                           # if not set, use the dpi of the source image of each page.
    smooth        0        # default smoothing of the image before processing (off)
    clean         0        # default cleaning of the image after processing (off)
    erosion       0        # default erosion of the image after processing (off)
  )
  indirect        0        # save indirect djvu (multifile) (off)
  #lossy          1        # if set, turns the following options on or off:
                           # default-djbz::erosion, default-djbz::averaging
                           # default-image::smooth, default-image::clean
  match
  pages-per-dict  10       # automatically assign pages that aren't referred to
                           # in any djbz block to new djbz dictionaries.
                           # New dictionaries contain 10 (default) pages or fewer.
  report          0        # report progress to stdout
  #threads-max    2        # if set, use at most N threads for processing (each thread
                           # processes one block of pages; one djbz is one block).
                           # By default, if the CPU has C cores:
                           # if C > 2 then N = C-1, otherwise N = 1
  verbose         1        # print verbose log to stdout
  warnings        1        # print libtiff warnings to stdout
)
input-files
Contains a list of files to process. Each file may be given in this list as an absolute filename or as a nested
expression with the "file" id. The order of files in this list defines the order of pages in the document. There
must be only one expression with the "input-files" id in a settings file. Example:
(input-files               # Contains a list of input image files;
                           # the order is the same as the order of pages in the document.
                           # Multipage TIFFs are expanded to a set of single-page TIFFs.
  path/file1               # Full filename of the image. It will use the default image options.
  "path 2/file2"           # The second filename is quoted because it contains a space character.
  (file                    # A nested block with id "file" is used for the 3rd image
                           # to override the default image options
    path/file3             # full filename of the 3rd image
    (image                 # image settings of the 3rd image
                           # that override the default settings
      smooth 0
      clean  0
      # etc. as described in the "default-image" expression
      #virtual 600 800     # if this parameter is included, the input file won't
                           # actually be read. Instead, an empty page with width 600
                           # and height 800 will be created in the document. That's
                           # a bit faster than feeding the encoder
                           # empty image files.
    )
    # The following parameters may be useful to refer to a single page or a subset
    # of pages in a multipage image file (TIFF)
    page       0           # if the file is multipage, use only page 0
    page-start 0           # if the file is multipage, use pages from 0 to page-end
    page-end   3           # if the file is multipage, use pages from page-start to 3
  )
  # etc. for other files. Just write the filename if the default settings are fine,
  # or put the filename in a (file ...) list to use page-specific settings.
)
djbz
Defines a single shared dictionary and its settings. There may be multiple expressions with the "djbz" id in a
settings file. The files referred to by the shared dictionary MUST exist in the "input-files" list. Example:
(djbz                      # describes a set of pages that should belong to
                           # the same shared dictionary, and its settings.
  id            0001       # Mandatory ID of the djbz. Should be unique. It need not be
                           # a number. The extension will be added to it.
  xtension      iff        # overrides the default ("djbz") djbz extension, so
                           # the resulting id will be "0001.iff"
  averaging     0          # overrides default-djbz averaging (0)
  aggression    100        # overrides default-djbz aggression (100)
  classifier    3          # overrides the default-djbz classifier used to encode this block
  no-prototypes 0          # overrides default-djbz no-prototypes
  erosion       0          # overrides default-djbz erosion of glyphs in the shared dictionary
                           # (which is a jb2 image by nature)
  (files                   # a list of files that should be included in this djbz;
                           # files MUST exist in (input-files ...).
                           # the structure is much the same as in (input-files ...),
                           # but (file ...) lists in (files ...) must not include
                           # (image ...) options, as they are provided in (input-files ...)
    path/file1
    (file
      path/file2
      ...
    )
  )
  ...
)
Note: the files in "input-files" that are not referred to by any "djbz" will be distributed among automatically
created shared dictionaries according to the options:pages-per-dict value. Such dictionaries will use the settings
from the "default-djbz" expression, or default values if it is not provided. Unique id values for these shared
dictionaries will be automatically generated.
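One plausible reading of this distribution rule is sketched below in Python. The encoder's actual grouping algorithm is not described on this page, so several things here are assumptions: pages are chunked sequentially in document order, each file is treated as one page, and a pages-per-dict value of 0 is treated like "-p 0" (everything in one dictionary). The name auto_assign is hypothetical.

```python
def auto_assign(input_files, assigned, pages_per_dict=10):
    """Group files not claimed by any djbz block into new dictionaries.

    input_files: all files, in document order.
    assigned: files already listed in explicit djbz blocks.
    Returns a list of groups, one per automatically created dictionary.
    """
    if pages_per_dict < 0:
        raise ValueError("pages_per_dict must not be negative")
    taken = set(assigned)
    remaining = [name for name in input_files if name not in taken]
    if pages_per_dict == 0:
        # Assumed to mirror "-p 0": process all remaining pages at once.
        return [remaining] if remaining else []
    # Assumption: simple sequential chunks of at most pages_per_dict files.
    return [remaining[i:i + pages_per_dict]
            for i in range(0, len(remaining), pages_per_dict)]
```

For instance, with seven files, two of them claimed by explicit djbz blocks and pages_per_dict set to 3, the five remaining files fall into one group of three and one group of two.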
So, in general, a settings file looks like this:
(options
# some app options and overridden defaults
)
(input-files
# list of all files that must be included in the document
)
(djbz
# first djbz
)
(djbz
# second djbz
)
# etc.
That's it.
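Since such a file is meant to be produced by a program, here is a minimal Python sketch of a generator for the skeleton above. The names quote and render_settings are hypothetical, only a few ids and parameters from this page are emitted, and values containing a double quote are rejected because no escaping rule is described.

```python
def quote(value):
    """Return a value as it would be written in a settings file."""
    text = str(value)
    if '"' in text:
        raise ValueError("no escaping rule for double quotes is described")
    needs_quotes = (text == "" or text.startswith("#")
                    or any(c.isspace() or c in "()" for c in text))
    return f'"{text}"' if needs_quotes else text


def render_settings(input_files, options=None, djbz_blocks=None):
    """Render (options), (input-files) and optional (djbz) expressions.

    options: mapping of top-level option names to values.
    djbz_blocks: mapping of djbz id to the list of files it contains.
    """
    lines = ["(options"]
    for name, value in (options or {}).items():
        lines.append(f"  {name} {quote(value)}")
    lines.append(")")
    lines.append("(input-files")
    for name in input_files:
        lines.append(f"  {quote(name)}")
    lines.append(")")
    for block_id, files in (djbz_blocks or {}).items():
        lines.append("(djbz")
        lines.append(f"  id {quote(block_id)}")
        lines.append("  (files")
        for name in files:
            lines.append(f"    {quote(name)}")
        lines.append("  )")
        lines.append(")")
    return "\n".join(lines) + "\n"
```

The returned text could be written to a file and passed to minidjvu-mod with the -S option.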
The multipage encoder does not work properly if pages have different resolutions.
minidjvu-mod-0.9m02 June 2021 MINIDJVU-MOD(25)