# OCR-Parser

## Purpose

The purpose of this project is to create a program that helps librarians and other information professionals simplify their workflow by automating parts of the metadata extraction process. The program is aimed mainly at digitized newspapers, and it is being tested on newspapers for the North Carolina Digital Heritage Center.

Outsourcing page-level digitization is cost-effective, but it leaves the images returned from the vendor with missing or error-prone metadata. The current workflow at our institution is:

* Create a template spreadsheet
* Copy and fill in the relevant series or reel metadata
* Open each image and manually extract its metadata
* Manually enter the data into the spreadsheet

This process is expensive, time-consuming, and error-prone.
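
The template-building and copying steps above are mechanical and could be scripted. Here is a minimal sketch, assuming a CSV file stands in for the spreadsheet and using hypothetical column names (`filename`, `title`, `reel`, `date`, `page`) rather than the project's actual schema. The illustrative function `build_template` creates one row per page image and copies the reel-level metadata into each row:

```python
import csv
import io

# Hypothetical column names; a real template would follow the
# institution's own metadata schema.
COLUMNS = ["filename", "title", "reel", "date", "page"]


def build_template(image_names, reel_metadata):
    """Return CSV text with a header and one row per page image.

    Reel-level fields (e.g. title, reel) are copied into every row.
    Page-level fields (date, page) stay blank for later extraction.
    Keys in reel_metadata must be drawn from COLUMNS.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for name in sorted(image_names):  # keep pages in filename order
        row = {column: "" for column in COLUMNS}
        row.update(reel_metadata)
        row["filename"] = name
        writer.writerow(row)
    return buffer.getvalue()
```

For example, `build_template(["0002.tif", "0001.tif"], {"title": "Example Gazette", "reel": "R-01"})` (placeholder values) returns a header plus two rows sorted by filename, each carrying the same title and reel. Saving the result as a `.csv` file gives a sheet that any spreadsheet program can open, which leaves only the page-level fields to fill in.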

This project makes some inroads toward automating the metadata harvesting process. It is still rough around the edges, but it is open to improvement as a community effort.
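
The README does not say which fields OCR-Parser extracts or how it extracts them. As one hypothetical illustration of the "open each image and extract metadata" step, assume a page has already been run through OCR and its issue date appears in the text in a form like `March 5, 1921`. A regular expression can then pull out the first valid date. `find_issue_date` is an illustrative name and is not part of the project:

```python
import re
from datetime import date

# Map lowercase month names to month numbers (january -> 1, ...).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches e.g. "March 5, 1921" or "MARCH 5 1921"; surrounding OCR text is ignored.
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),?\s+(\d{4})\b",
    re.IGNORECASE,
)


def find_issue_date(ocr_text):
    """Return the first calendar-valid date found in ocr_text, or None."""
    for match in DATE_PATTERN.finditer(ocr_text):
        month = MONTHS[match.group(1).lower()]
        day, year = int(match.group(2)), int(match.group(3))
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. "February 31" caused by a misread digit
    return None
```

Skipping impossible dates rather than failing is a deliberate choice here, because OCR errors in digits are common. A real workflow would still need a person to review rows where no date was found.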

## Usage

All code and instructions will be available from this GitHub repository.

To get started, please see our Getting Started guide.

## Contributing

Once the initial project is complete, we'd love to have contributors help further automate the workflow.
Please see our Contributing Guidelines for more information.

## Contributors

This project is maintained by Amber Sherman, Dave Pcolar, and Elizabeth Peele.

## License

Provided under the MIT License.

## About

Programming project for INLS560.

## Languages

* Shell 80.0%
* Python 20.0%