Copy of man page (English)

Alexander Trufanov edited this page Jul 8, 2021 · 1 revision

MINIDJVU-MOD(25) minidjvu-mod-0.9m02 MINIDJVU-MOD(25)

NAME

   minidjvu-mod - encode/decode black-and-white DjVu pages

SYNOPSIS

   minidjvu-mod  [options] input_file output_file

   There is a similar syntax for multipage compression:

   minidjvu-mod  [options] input_files output_file

   See MULTIPAGE ENCODING section below for more details.

DESCRIPTION

   minidjvu-mod encodes and decodes single-page black-and-white DjVu files.

   minidjvu-mod is derived from DjVuLibre, which is the primary support library for DjVu.

   Besides bitonal DjVu, minidjvu-mod understands the Windows BMP, PBM and TIFF (through libtiff) formats. Both input_file
   and output_file may be BMP, PBM, TIFF or DjVu. The file type is determined by the extension. Input and output may
   coincide.

   When given a DjVu-to-DjVu job, minidjvu-mod decodes, then re-encodes the image. DjVu layers other than the bitonal
   picture are lost.

   A bitmap-to-bitmap job is possible, but it is only really useful with the --smooth option.

   All options preceded by two hyphens can also be used with one hyphen. This makes the minidjvu-mod interface more
   familiar to DjVuLibre users.

MULTIPAGE ENCODING

   To activate multipage mode, either specify more than one input file on the command line or pass minidjvu-mod a
   single multipage TIFF document. By default (if --indirect is not specified), the compressed pages are stored in a
   single bundled document under the name given on the command line.

   Several options affect the multipage encoding process, namely --pages-per-dict, --indirect, --Classifier and
   --report.
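
As a rough illustration of the multipage syntax, the sketch below assembles such a command line from Python. It uses only options documented on this page; the helper name build_command and the file names are hypothetical, and the command is only built and printed, not executed.

```python
import shlex


def build_command(inputs, output, pages_per_dict=10, indirect=False, report=False):
    """Assemble an argv list for a multipage minidjvu-mod run (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required")
    argv = ["minidjvu-mod", "--pages-per-dict", str(pages_per_dict)]
    if indirect:
        argv.append("--indirect")
    if report:
        argv.append("--report")
    # Synopsis order: options first, then the input files, then the output file.
    return argv + list(inputs) + [output]


# File names below are placeholders.
cmd = build_command(["p1.tiff", "p2.tiff"], "book.djvu", report=True)
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where minidjvu-mod is installed.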

OPTIONS

   -A

   --Averaging
          Compute "average" representatives for shapes matching a pattern.

          This option is turned on by --lossy.

   -a n

   --aggression n
          Sets aggression for pattern matching. The higher the aggression, the smaller the file, but the more likely
          substitution errors become. The default is 100. Usually you can raise it to about 110 more or less safely.
          Probably even 200 will work fine, but don't rely on that.

          Consistency of aggression levels between versions is not guaranteed. The default, however, will always be 100.

          This option turns on --match automatically.

   -C n

   --Classifier n
          Set the symbol classifier mode (default is 3). 1 - the classifier behaves similarly to the original
          implementation in the minidjvu encoder. 2 - the classifier makes additional efforts to achieve better
          compression. This requires more CPU time and much more RAM for a cache that is allocated per thread. 3 - similar
          to 2, but takes even more CPU time to achieve the maximum level of document compression (the RAM usage is the
          same as in 2).

          BE VERY CAREFUL with modes 2 and 3, as they can slow down your machine or exhaust the available RAM. You may
          decrease the number of threads to save some RAM at the cost of time, or decrease the pages per dict at the cost
          of file size (not recommended).

   -c

   --clean
          Remove  small  black  marks that are probably noise.  This algorithm can really devastate halftone patterns, so
          use with caution.

          This option is turned on by --lossy.

   -d n

   --dpi n
          Specify the resolution of an image, measured in dots per inch.  The resolution affects some algorithms and it's
          recorded in DjVu and BMP files (TIFF should join someday).

   -e

   --erosion
          Sacrifice  image quality to gain about 5-10% in file size.  One erosion is almost invisible, but 10 erosions in
          a row spoil an image badly (and they won't give you 50-100% of file size, alas).   Erosion  bonus  stacks  with
          pattern matching.

          Erosion makes no sense when the output is not DjVu.

          This option is turned on by --lossy.

   -i

   --indirect

          Specifying this option in multipage mode causes minidjvu-mod to generate an indirect multipage document
          consisting of a single index file, several single-page DjVu files (one per image passed to the encoder) and
          several shared dictionary files. Note that the index file is created under the name specified for the output
          file on the command line, while each page keeps its original input file name, with the extension changed to
          ".djvu".

          This mode is useful for placing a large document on a Web server, or if you are going to postprocess the
          generated files (e.g. by adding a color background). In the latter case you may then want to convert your
          indirect document to a bundled DjVu document using the djvmcvt utility supplied with DjVuLibre.
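
The page naming rule described above can be sketched in a few lines of Python. This is only an illustration: the helper name is hypothetical, and returning just the base name (without a directory) is an assumption, since this page does not say where the page files are written.

```python
from pathlib import PurePath


def indirect_page_name(input_file):
    """Return the input file name with its extension changed to .djvu.

    Assumption: only the base name is returned; the output directory
    is not specified on this page.
    """
    return PurePath(input_file).with_suffix(".djvu").name


# Placeholder path, for illustration only.
print(indirect_page_name("scans/page01.tiff"))
```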

   -j

   --jb2  This instructs the encoder to save pages of the document as jb2 chunks instead of djvu. This is useful in some
          cases of further document postprocessing. Implies --indirect mode.

   -l

   --lossy
          Turn on all lossy options. Equivalent to --Averaging --clean --erosion --match --smooth.

   -m

   --match
          Run pattern matching. This is the main method of shrinking the file size, but it can also cause trouble in the
          form of substitution errors. Use the --aggression option to balance file size against error probability.

          This option is turned on by --lossy or --aggression.

   -n

   --no-prototypes
          Disable prototype searching. This makes lossless compression faster, but produced files become much bigger.

   -p n

   --pages-per-dict n
          Specify how many pages to compress in one pass. The default is 10. If -p 0 is specified, minidjvu-mod will
          attempt to process all pages at once, but be aware that this can take a lot of memory, especially on large books.

   -r

   --report
          Print verbose messages about what's done on which page.  Works only with multipage encoding.   Useful  only  to
          survive boredom while compressing a book.

   -s

   --smooth
          Flip  some  pixels  that appear to be noise. The gain in file size is about 5%.  Visually the image is slightly
          improved, but it's hardly noticeable.

          Current filter is dumb and only removes black pixels with at least 3 white neighbors (of 4). You probably won't
          notice the effects.

          This option is turned on by --lossy.
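
The rule stated for the current --smooth filter (remove black pixels with at least 3 white neighbors out of 4) can be sketched as follows. This is not the encoder's actual code: the function name, the list-of-rows image representation, treating pixels outside the image as white, and deciding every pixel from the unmodified input are all assumptions made for illustration.

```python
def smooth(image):
    """Clear black pixels that have at least 3 white 4-neighbors.

    image is a list of rows; 1 is black, 0 is white.
    Assumption: pixels outside the image count as white.
    """
    h = len(image)
    w = len(image[0]) if h else 0

    def is_white(y, x):
        # Out-of-range coordinates short-circuit to "white".
        return not (0 <= y < h and 0 <= x < w) or image[y][x] == 0

    # Assumption: decisions are made on the original image, not in place.
    result = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if image[y][x] == 1:
                white = sum(is_white(y + dy, x + dx)
                            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
                if white >= 3:
                    result[y][x] = 0
    return result


page = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # isolated black pixel: 4 white neighbors
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # 2x2 block: each pixel has only 2 white neighbors
]
for row in smooth(page):
    print(row)
```

In this example the isolated pixel is cleared and the 2x2 block is left untouched.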

   -S settings-file
          Read encoder settings from a "settings-file". Some command line options may be overridden. The settings file
          format is described in the SETTINGS FILE FORMAT section below.

   -t n

   --threads-max n
          Process pages assigned to different shared dictionaries in up to N parallel threads. By default N is equal to
          the number of CPU cores if there are only 1 or 2 cores. Otherwise it's equal to the number of CPU cores minus 1.

          Specify "-t 1" to disable multithreading. minidjvu-mod must be built with OpenMP support to enable this
          option.

   -u

   --unbuffered
          Use unbuffered output to console. Useful for precise progress tracking with -r.

   -v

   --verbose
          Print messages about various stages of the process.  It's not very useful, but interesting to examine.

   -X

   --Xtension
          Specifies an extension for shared dictionary files (without a leading period). The default is "djbz".

          NOTE: the most popular viewer, djview4, expects only the "djbz" or "iff" extensions.

   -w

   --warnings
          Do not disable libtiff warnings. By default, TIFF warnings are suppressed. Under Windows, the default TIFF
          warning handler creates a message box, which is unacceptable in a batch processing script, for instance. So the
          minidjvu-mod default behavior is a workaround for the libtiff default behavior.

SETTINGS FILE FORMAT

   This section describes the format of a file that may be used with the -S option to fine-tune the encoding process. The
   format is quite verbose, and such a settings file is expected to be generated by a GUI application rather than typed
   by hand. In particular, this option is designed for the needs of the ScanTailor Universal project (ver. 0.3.0+). The
   format is inspired by the one used for setting a DjVu document outline in the djvused application from the DjVuLibre
   package.

   A settings file shall contain parenthesized expressions in the following format: ( values )

   Tabs and newline characters are treated as spaces. A value may itself be a parenthesized expression, so expressions
   can be nested. Otherwise each value is a word or a number. A value that contains multiple words must be enclosed in
   double quotes ("). The first value of a parenthesized expression is considered to be its id.

   The following ids are possible: options, input-files, djbz, default-djbz, default-image, files, file, image

   The first three ids define top-level parenthesized expressions. The others are for nested parenthesized expressions
   that may appear inside them. The other values that follow the id (except for nested expressions) are either an
   argument or the name of a parameter. If a value is the name of a parameter, the next value is expected to be its
   argument (sometimes two).

   If a value starts with #, it and the rest of the line are treated as a comment and ignored.
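
The syntax rules above are small enough to sketch as a reader in Python. This is not the parser minidjvu-mod uses; it is a minimal illustration under stated assumptions: quoted values have no escape sequences, numbers are kept as strings, and the function names tokenize and parse are made up here.

```python
def tokenize(text):
    """Split settings text into ('(', None), (')', None) and ('value', str) tokens."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():              # tabs and newlines count as spaces
            i += 1
        elif ch == "#":               # comment runs to the end of the line
            while i < n and text[i] != "\n":
                i += 1
        elif ch in "()":
            tokens.append((ch, None))
            i += 1
        elif ch == '"':               # quoted value may contain spaces
            end = text.find('"', i + 1)
            if end < 0:
                raise ValueError("unterminated quoted value")
            tokens.append(("value", text[i + 1:end]))
            i = end + 1
        else:                         # bare word or number
            start = i
            while i < n and not text[i].isspace() and text[i] not in '()"':
                i += 1
            tokens.append(("value", text[start:i]))
    return tokens


def parse(text):
    """Parse settings text into nested lists; the first item of each list is its id."""
    root = []
    stack = [root]
    for kind, value in tokenize(text):
        if kind == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif kind == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(value)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return root


sample = """
(input-files            # two plain files and one nested block
  path/file1
  "path 2/file2"
  (file path/file3 (image smooth 0))
)
"""
print(parse(sample))
```

Note that, under these rules, a closing parenthesis placed after a # on the same line is part of the comment and does not close the expression.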

   Let's consider the top-level expressions:

   options

   Contains application options (much the same as those that may be passed via the command line) and default options for
   images and shared dictionaries. There must be only one expression with the "options" id in a settings file. Example:

          (options              # application options and defaults

           (default-djbz        # default djbz settings
             averaging     0    # default averaging (off)
             aggression    100  # default aggression level (100)
             classifier    3    # default classifier (max compression)
             erosion       0    # default erosion (disabled)
             no-prototypes 0    # default prototypes usage (on)
             xtension      djbz # default djbz id extension ("djbz")
           )

           (default-image       # default image options

             #dpi           300 # if set, use this dpi value for encoding all images
                                # except those that have personal dpi option set.
                                # if not set, use dpi of source image of each page.

             smooth       0     # default smoothing image before processing (off)
             clean        0     # default cleaning image after processing (off)
             erosion      0     # default erosion image after processing (off)
           )

           indirect       0     # save indirect djvu (multifile) (off)
           #lossy          1    # if set, turns off or on following options:
                                # default-djbz::erosion, default-djbz::averaging
                                # default-image::smooth, default-image::clean

           match
           pages-per-dict 10   # automatically assign pages that aren't referred
                               # in any djbz blocks to the new djbz dictionaries.
                               # New dictionaries contain 10 (default) pages or less.

           report         0    # report progress to stdout
           #threads-max   2    # if set, use max N threads for processing (each thread
                               # processes one block of pages. One djbz is one block).
                               # By default, if the CPU has C cores:
                               # if C > 2 then N = C-1, otherwise N = 1
           verbose        1    # print verbose log to stdout
           warnings       1    # print libtiff warnings to stdout
          )

   input-files

   Contains a list of files to process. Each file may be given in this list as an absolute filename or as a nested
   expression with the "file" id. The order of files in this list defines the order of pages in the document. There must
   be only one expression with the "input-files" id in a settings file. Example:

          (input-files       # Contains a list of input image files
                             # The order is the same as the order of pages in the document.
                             # Multipage TIFFs are expanded into a set of single-page TIFFs.

           path/file1        # Full filename of the image. It will use default image options.
           "path 2/file2"    # The second filename is quoted because it contains a space character.

           (file             # A nested block with id file is used for the 3rd image
                             # to override the default image options

             path/file3      # full filename of the 3rd image
             (image          # image settings of the 3rd image
                             # that override the default settings
               smooth   0
               clean    0
               # etc. as described in "default-image" expression

               #virtual 600 800   # if such parameter is included then input file won't be
                             # really read. Instead of that an empty page with width 600
                             # and height 800 will be created in the document. That's
                             # a bit faster than feeding the encoder with the
                             # empty image files.
             )

             # The following parameters may be useful to refer to a single page or a subset
             # of pages in a multipage image file (TIFF)
             page       0    # if file is multipage, use only page 0
             page-start 0    # if file is multipage, use pages from 0 to page-end
             page-end   3    # if file is multipage, use pages from page-start to 3
            )

           # etc. for other files. Just write their filename if the default settings are fine
           # or include the filename in a (file ...) list to use page-specific settings.
          )

   djbz

   Defines a single shared dictionary and its settings. There may be multiple expressions with the "djbz" id in a settings
   file. The files referred to by the shared dictionary MUST exist in the "input-files" list. Example:

           (djbz             # describes a set of pages that should belong to
                             # the same shared dictionary and its settings.
             id         0001 # Mandatory ID of the djbz. Should be unique. Need not be
                             # a number. The extension will be added to it.

             xtension   iff  # overrides default ("djbz") djbz extension, so
                             # the resulting id will be "0001.iff"
             averaging  0    # overrides default-djbz averaging (0)
             aggression 100  # overrides default-djbz aggression (100)
             classifier 3    # overrides default-djbz classifier used to encode this block
             no-prototypes 0 # overrides default-djbz no-prototypes
             erosion       0 # overrides default-djbz erosion of glyphs in the shared dictionary
                             # (which is a jb2 image by nature)
             (files          # a list of files that should be included in this djbz
                             # files MUST exist in (input-files ...)
                             # the structure is much the same as in (input-files ...),
                             # but (file ...) lists in (files ...) must not include
                             # (image ...) options as they are provided in (input-files ...)

               path/file1
               (file
                path/file2
                ...
               )
             )
             ...
            )

   Note: the files in "input-files" that are not referred to in any "djbz" will be distributed between automatically
   created shared dictionaries according to the options:pages-per-dict value. Such dictionaries will use the settings from
   the "default-djbz" expression, or default values if it's not provided. Unique id values for these shared dictionaries
   will be generated automatically.
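
The note above can be pictured with a small Python sketch. How minidjvu-mod actually distributes the leftover files is not specified on this page; the version below assumes consecutive chunks in document order, one page per file, and that a pages-per-dict of 0 means a single dictionary (as documented for -p 0 on the command line). The function name is hypothetical.

```python
def auto_dictionaries(input_files, assigned, pages_per_dict=10):
    """Group files that no djbz refers to into chunks of at most pages_per_dict.

    Assumptions: consecutive grouping in document order, one page per file.
    """
    leftover = [f for f in input_files if f not in assigned]
    if pages_per_dict <= 0:
        # Assumption: 0 means "all remaining pages in one dictionary".
        return [leftover] if leftover else []
    return [leftover[i:i + pages_per_dict]
            for i in range(0, len(leftover), pages_per_dict)]


files = [f"p{i}" for i in range(1, 8)]   # placeholder names p1..p7
print(auto_dictionaries(files, assigned={"p2"}, pages_per_dict=3))
```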

   So, in general, a settings file looks like this:

           (options
           # some app options and overridden defaults
           )

           (input-files
           # list of all files that must be included in the document
           )

           (djbz
           # first djbz
           )

           (djbz
           # second djbz
           )

           # etc.
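
Since a settings file is expected to be produced by a program, here is a minimal Python sketch that renders the skeleton above. It is an illustration only: the function names are hypothetical, only whitespace triggers quoting (the one case this page documents), and the layout of the output is a free choice because whitespace is insignificant in the format.

```python
def quote(value):
    """Quote a value if it contains whitespace, as the format requires."""
    text = str(value)
    return f'"{text}"' if any(c.isspace() for c in text) else text


def render_settings(input_files, dictionaries, options=None):
    """Render (options ...), (input-files ...), then one (djbz ...) per dictionary.

    dictionaries maps a djbz id to the list of files it should contain.
    """
    lines = ["(options"]
    for name, value in (options or {}).items():
        lines.append(f" {name} {quote(value)}")
    lines.append(")")
    lines.append("(input-files")
    lines.extend(f" {quote(f)}" for f in input_files)
    lines.append(")")
    for djbz_id, files in dictionaries.items():
        # Files referred to by a djbz must exist in the input-files list.
        unknown = [f for f in files if f not in input_files]
        if unknown:
            raise ValueError(f"not in input-files: {unknown}")
        lines.append("(djbz")
        lines.append(f" id {quote(djbz_id)}")
        lines.append(" (files")
        lines.extend(f"  {quote(f)}" for f in files)
        lines.append(" )")
        lines.append(")")
    return "\n".join(lines) + "\n"


# Placeholder file names and id, for illustration only.
text = render_settings(
    ["scans/p1.tif", "scans/p 2.tif"],
    {"0001": ["scans/p1.tif", "scans/p 2.tif"]},
    {"pages-per-dict": 10},
)
print(text)
```

The rendered text could then be saved to a file and passed to minidjvu-mod with the -S option.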

   That's it.

BUGS

   The multipage encoder does not work properly if pages have different resolutions.

minidjvu-mod-0.9m02 June 2021 MINIDJVU-MOD(25)
