PDF Create

Omeka plugin that creates OCR'd PDFs from TIFFs. If you have multiple TIFFs for a single item, it provides an easy way to aggregate them into a single file for viewing and downloading.

Generates OCR text with Tesseract.
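
A minimal Python sketch of the per-TIFF OCR step, assuming Tesseract 3.04 is on the PATH with the standard `eng` language data installed. The plugin itself is written in PHP and its exact invocation may differ; `tesseract_pdf_command` and `ocr_tiff` are hypothetical helper names. Since 3.03, passing the `pdf` config as the last argument makes Tesseract write a searchable PDF named `<outputbase>.pdf`.

```python
import shlex
import subprocess
from pathlib import Path


def tesseract_pdf_command(tiff_path, lang="eng"):
    """Build the argv that OCRs one TIFF into a searchable PDF.

    Tesseract's usage is: tesseract imagename outputbase [options] [configfile...]
    The trailing "pdf" config enables the PDF renderer, so the result is
    written to <outputbase>.pdf next to the input TIFF.
    """
    tiff = Path(tiff_path)
    output_base = tiff.with_suffix("")  # scans/page1.tif -> scans/page1
    return ["tesseract", str(tiff), str(output_base), "-l", lang, "pdf"]


def ocr_tiff(tiff_path, lang="eng"):
    """Run Tesseract (raises CalledProcessError on failure) and return the PDF path."""
    subprocess.run(tesseract_pdf_command(tiff_path, lang), check=True)
    return Path(tiff_path).with_suffix(".pdf")


if __name__ == "__main__":
    # Show the command without running it.
    print(shlex.join(tesseract_pdf_command("scans/page1.tif")))
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with file names that contain spaces.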

Stores the OCR'd text in the PdfText plugin's metadata element so that it can be searched on the site.

Aggregates multiple TIFFs for one item into a single OCR'd PDF/A-1b file via Ghostscript. Once the aggregated PDF has been created, it can be found at http://example.com/path/to/your/files/directory/pdfs/ITEM_ID.pdf
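
A rough Python sketch of the aggregation step, assuming the per-page PDFs already exist and `gs` is on the PATH. The flags shown are standard Ghostscript `pdfwrite` options for PDF/A output, but they are not necessarily the ones the plugin passes; `ghostscript_merge_command` and `aggregated_pdf_location` are hypothetical names. A fully conforming PDF/A file normally also needs a `PDFA_def.ps` prologue with an ICC profile, which is left as an optional argument here.

```python
from pathlib import Path


def ghostscript_merge_command(page_pdfs, output_pdf, pdfa_def=None):
    """Build a Ghostscript argv that concatenates per-page PDFs into one PDF/A-1b file.

    pdfa_def: optional path to a PDFA_def.ps prologue; it must precede the inputs.
    """
    cmd = [
        "gs",
        "-dPDFA=1",                     # request PDF/A-1 output
        "-dBATCH",                      # exit when finished instead of waiting for input
        "-dNOPAUSE",                    # do not pause between pages
        "-dNOOUTERSAVE",
        "-sDEVICE=pdfwrite",
        "-sProcessColorModel=DeviceRGB",
        "-dPDFACompatibilityPolicy=1",  # drop non-conforming features, keep PDF/A output
        "-sOutputFile=" + str(output_pdf),
    ]
    if pdfa_def is not None:
        cmd.append(str(pdfa_def))
    cmd.extend(str(p) for p in page_pdfs)  # pages appear in list order
    return cmd


def aggregated_pdf_location(files_dir, files_url, item_id):
    """Return (filesystem path, public URL) following the files/pdfs/ITEM_ID.pdf layout."""
    name = "%d.pdf" % item_id
    return Path(files_dir) / "pdfs" / name, files_url.rstrip("/") + "/pdfs/" + name
```

For example, `aggregated_pdf_location("/srv/omeka/files", "http://example.com/files", 42)` gives the output path `/srv/omeka/files/pdfs/42.pdf` and the URL `http://example.com/files/pdfs/42.pdf`; the first element is what you would pass as `output_pdf`.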

Install

This plugin requires the PdfText plugin.

The server-side software needed to perform the OCR extraction consists of Ghostscript and Tesseract. These are the exact versions verified to work with this plugin (running on Red Hat Enterprise Linux 7):

GPL Ghostscript 9.07 (2013-02-14)
Tesseract 3.04.01
- leptonica 1.73
  - libjpeg 6b (libjpeg-turbo 1.2.90)
  - libpng 1.5.13
  - libtiff 4.0.3
  - zlib 1.2.7
Download the tessdata 3.04.00 tarball
- Move all eng.* files to /usr/local/share/tessdata/
Download the updated file "pdf.ttf" found here and save it to /usr/local/share/tessdata/
- Without this updated pdf.ttf, aggregating two or more PDFs into a single PDF with Ghostscript produces OCR text with spaces between every letter, which ruins the OCR. The Tesseract and Ghostscript fonts do not map perfectly to each other; this file fixes that.
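
After installing, a quick sanity check can catch a missing binary or data file before the plugin runs. This is a minimal Python sketch, assuming the /usr/local/share/tessdata location used above; `missing_requirements` is a hypothetical helper and only checks for presence, not for the exact versions listed.

```python
import shutil
from pathlib import Path

# Assumption: the tessdata directory used in the install steps above.
DEFAULT_TESSDATA = "/usr/local/share/tessdata"


def missing_requirements(tessdata_dir=DEFAULT_TESSDATA, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for exe in ("tesseract", "gs"):
        if which(exe) is None:
            problems.append("%s not found on PATH" % exe)
    tessdata = Path(tessdata_dir)
    # English language data from the tessdata tarball.
    if not (tessdata / "eng.traineddata").is_file():
        problems.append("eng.traineddata missing from %s" % tessdata)
    # Updated font that keeps OCR text intact after Ghostscript merges PDFs.
    if not (tessdata / "pdf.ttf").is_file():
        problems.append("pdf.ttf missing from %s" % tessdata)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("MISSING:", problem)
```

The `which` parameter defaults to `shutil.which` but can be swapped out, which makes the function easy to exercise without Tesseract or Ghostscript installed.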

kent-state-university-libraries/PDFCreate