NDNP Batch Ingest Guide
Eben English edited this page Oct 15, 2019 · 5 revisions
NewspaperWorks provides functionality for batch ingest of digitized newspapers conforming to NDNP digitization specs via a command-line rake task.
To invoke the rake task, run the following command from the home directory of your application:
$ rake newspaper_works:ingest_ndnp -- --path=/path/to/your/ndnp/batch
In addition to path, the rake task also accepts arguments for admin_set, depositor, and visibility, as in:
$ rake newspaper_works:ingest_ndnp -- --path=/path/to/your/ndnp/batch --admin_set=admin_set/default --depositor=[email protected] --visibility=open
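To show how the optional arguments combine, here is a minimal sketch of a shell helper that prints the rake command for a batch and appends only the arguments that were supplied. The function name ingest_command and its positional parameters are hypothetical and not part of NewspaperWorks; only the rake task name and the four flags come from this guide.

```shell
# Hypothetical helper (not part of NewspaperWorks): print the rake command
# for a batch, appending only the optional arguments that were supplied.
# Usage: ingest_command PATH [ADMIN_SET] [DEPOSITOR] [VISIBILITY]
ingest_command() {
  cmd="rake newspaper_works:ingest_ndnp -- --path=$1"
  if [ -n "${2:-}" ]; then cmd="$cmd --admin_set=$2"; fi
  if [ -n "${3:-}" ]; then cmd="$cmd --depositor=$3"; fi
  if [ -n "${4:-}" ]; then cmd="$cmd --visibility=$4"; fi
  # Print rather than execute, so the command can be reviewed first.
  printf '%s\n' "$cmd"
}
```

Pass an empty string to skip an argument and fall back to the task's default. The sketch does not quote values, so it is only suitable for paths and values without spaces.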
When run, the rake task will:
- Create a NewspaperTitle object for each publication in the batch
- Create a NewspaperContainer object for each reel
- Iterate over the directories in the batch, creating NewspaperIssue and NewspaperPage objects for each issue and page
- Attach existing page-level derivatives (ALTO, PDF, etc.) to the NewspaperPage objects
- Index OCR text to Solr for full-text searching
- Create a word-coordinate JSON derivative file to facilitate page-image search hit highlighting
- Compile an issue-level PDF object from page files and attach as primary file to each NewspaperIssue object
- Add metadata to the created objects from the corresponding XML manifest files in the batch. (See mapping.)
Notes:
- If a NewspaperTitle object with the LCCN in the batch already exists, the new objects will be associated with the existing NewspaperTitle.
- If no admin_set is specified, the default AdminSet (admin_set/default) will be used.
- If no depositor is specified, objects will have a depositor value of User.batch_user.user_key by default.
- If visibility is not specified, objects will have a visibility value of open by default.
- A log of the batch process will be written to your application's log/ingest.log.
The ingest script makes the following assumptions:
- You have a set of files organized according to the NDNP batch file and directory structure specs.
- In the directory specified in the path argument, there is a batch.xml file that provides a listing of issues in the batch.
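The second assumption is easy to check before starting a long ingest. The following is a minimal sketch of such a pre-flight check; the function name check_ndnp_batch is hypothetical and not part of NewspaperWorks, and the check only confirms that batch.xml is present, not that the batch conforms to the NDNP specs.

```shell
# Hypothetical helper (not part of NewspaperWorks): confirm that a batch
# directory exists and contains a batch.xml file before running the ingest.
# Returns 0 when both checks pass, 1 otherwise.
check_ndnp_batch() {
  batch_dir="$1"
  if [ ! -d "$batch_dir" ]; then
    echo "Not a directory: $batch_dir" >&2
    return 1
  fi
  if [ ! -f "$batch_dir/batch.xml" ]; then
    echo "Missing batch.xml in: $batch_dir" >&2
    return 1
  fi
  return 0
}
```

With the function defined in your shell, it can gate the rake task so the ingest only starts when the check passes:

$ check_ndnp_batch /path/to/your/ndnp/batch && rake newspaper_works:ingest_ndnp -- --path=/path/to/your/ndnp/batch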
For examples of NDNP batches, see http://chroniclingamerica.loc.gov/data/batches/ or newspaper_works_fixtures.