The input to this program is a directory of subdirectories that contain ALTO files and their corresponding images. It produces two output directories: one that contains a newly minted METS file along with the original image files (optionally renamed) and the ALTO files, and a second that contains Simple Archive Format (SAF) subdirectories. The SAF directories are written to a zip archive for easy DSpace import.
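
The final packaging step can be pictured with a short Python sketch. This is not taken from convert.py: the function name `zip_saf_directory` and the `item_000` directory name in the usage note are hypothetical, and the only assumption about layout is the usual SAF convention of one subdirectory per item holding a `contents` file, a `dublin_core.xml` file, and the item's files.

```python
import zipfile
from pathlib import Path


def zip_saf_directory(saf_root: Path, archive_path: Path) -> list:
    """Write every file under saf_root into a zip archive.

    Hypothetical helper, not the code used by convert.py. Paths inside the
    archive are stored relative to saf_root, so each SAF item keeps its own
    subdirectory (for example item_000/contents).
    """
    written = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(saf_root.rglob("*")):
            # Skip directories, and skip the archive itself in case it was
            # created inside saf_root.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            arcname = path.relative_to(saf_root).as_posix()
            archive.write(path, arcname)
            written.append(arcname)
    return written
```

Calling `zip_saf_directory(Path("out/saf"), Path("out/saf.zip"))` on a directory containing `item_000/contents` and `item_000/dublin_core.xml` would store those two entries under `item_000/` in the archive.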

usage: convert.py [-h] [-m METADATA] [-b BUNDLE] [-s STRIP] Input directory Output directory

Converts a directory of alto and image files to Simple Archive Format for DSpace import.

positional arguments:
  Input directory       The parent directory containing ALTO, image files and PDF
  Output directory      Full path to the output directory for the processed SAF subdirectories

optional arguments:
  -h, --help            show this help message and exit
  -m METADATA, --metadata METADATA
                        Metadata configuration file. Current configurations are "collegian", "bulletin"
  -b BUNDLE, --bundle BUNDLE
                        Images can be added to an alternate bundle if you do not want them 
                        included in the default (ORIGINAL) bundle. Typically we use the "iiif" bundle.
  -s STRIP, --strip STRIP
                        If provided this value will be stripped from the beginning of file names, 
                        e.g. Page_001.xml will be converted to 001.xml if the value "Page_" is provided
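
The renaming described for `-s STRIP` can be illustrated with a minimal Python sketch. The function name `strip_prefix` is hypothetical and the actual implementation in convert.py may differ; the sketch assumes the value is removed only when it appears at the very beginning of the file name.

```python
def strip_prefix(file_name: str, prefix: str) -> str:
    """Remove prefix from the start of file_name, if present.

    Hypothetical helper illustrating the -s/--strip behavior described in
    the help text; names without the prefix are returned unchanged.
    """
    if prefix and file_name.startswith(prefix):
        return file_name[len(prefix):]
    return file_name


print(strip_prefix("Page_001.xml", "Page_"))  # 001.xml
print(strip_prefix("001.xml", "Page_"))       # 001.xml (no prefix, unchanged)
```

Only a leading match is removed, so a name such as `Cover_Page_001.xml` would be left untouched under this assumption.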