BookCModel and PDFs #119

kromabiles · 2020-07-14T16:32:55Z

Hello Diego,

As you know, our IR is currently exploring ways to use the IMI to ingest/create BookCModel objects from PDFs. Since the IMI is the main tool we rely on for ingesting content into Islandora, could we explore and test some options for a simpler PDF-to-TIFF conversion capability? Some of the PDF objects that we'd like to ingest as books have 50+ pages, which would then need to be split up and converted from PDF to TIFF. My brain hurts.

More than happy to bounce off ideas and do some testing with you. :)

Best,
Katie

DiegoPino · 2020-07-16T17:12:09Z

@kromabiles great. Following up here. A few questions about this:

- Are you good with IMI extracting PDFs into TIFFs? Or do you want IMI to use the same config the Book reader already uses?
- How do we deal with page-level metadata? I see two options (both could be implemented):
  - You create the rows for each page yourself, and I find some clever UI way of letting IMI know it should only fill the OBJ from the PDF in the parent column (extracted as TIFF). This also means that if you add 10 rows instead of, e.g., the 100 pages the PDF contains, it would only ingest 10.
  - You add nothing. In that case IMI will create the most basic metadata for you, basically just the title and the page number.
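To make the two page-metadata options above concrete, here is a minimal sketch in Python. It is not IMI code: the function name `build_page_rows`, the `title`/`page_number` keys, and the "title: page N" pattern are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def build_page_rows(
    parent_title: str,
    pdf_page_count: int,
    user_rows: Optional[List[Dict[str, object]]] = None,
) -> List[Dict[str, object]]:
    """Return one metadata row per page to ingest (hypothetical helper)."""
    if user_rows:
        # Option 1: the user supplied page rows, so only that many pages are
        # ingested (never more than the PDF actually contains).
        count = min(len(user_rows), pdf_page_count)
        rows = []
        for page_number in range(1, count + 1):
            row = dict(user_rows[page_number - 1])  # copy, keep user metadata
            row["page_number"] = page_number
            rows.append(row)
        return rows

    # Option 2: no page rows supplied, so generate the most basic metadata:
    # just a title and a page number for every page in the PDF.
    return [
        {"title": f"{parent_title}: page {n}", "page_number": n}
        for n in range(1, pdf_page_count + 1)
    ]
```

With 10 user rows and a 100-page PDF this returns 10 rows; with no user rows it returns one generated row per page.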

Processing of this would need to actually happen during ingest (batch) or it would be just too slow... we need to test.

What is your largest PDF around there?

Secondly. I will also enable a Digital object with the same directly on play.archipelago.nyc so we can test performance and compare.

Thanks!

kromabiles · 2020-07-20T19:25:21Z

@DiegoPino Yes, extracting PDFs into TIFFs would be great. Our book collections don't have any page-level metadata; they are all structured as a single object description. :/

Our largest PDF is about 3GB and consists of 93 pages (yearbook).

Seeing Archipelago in action sounds exciting! :)

DiegoPino · 2020-07-20T19:54:21Z

Excellent. I will start planning. I will probably borrow the book module settings, but I feel I should go TIFF first and then compress to JP2 if needed. I just tested a JP2 generated by Islandora (core) and it was 25 MB in size, while the same TIFF was 10 MB, which was a little bit annoying!
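For reference, a "TIFF first" extraction step could be sketched roughly as below. This assumes Ghostscript (`gs`) is installed and used for rasterizing; the thread does not say which tool IMI would actually call, and the helper names, output file pattern, and 300 dpi default are illustrative only.

```python
import subprocess
from pathlib import Path
from typing import List


def ghostscript_tiff_command(pdf_path: str, out_dir: str, dpi: int = 300) -> List[str]:
    """Build a Ghostscript command that writes one 24-bit RGB TIFF per page."""
    pattern = str(Path(out_dir) / "page-%04d.tiff")  # %04d is the page number
    return [
        "gs",
        "-q",                  # suppress startup messages
        "-sDEVICE=tiff24nc",   # 24-bit RGB TIFF output device
        f"-r{dpi}",            # rasterization resolution in dpi
        "-o", pattern,         # output pattern; also implies batch mode
        pdf_path,
    ]


def extract_pages(pdf_path: str, out_dir: str, dpi: int = 300) -> List[Path]:
    """Run the extraction and return the generated TIFF paths in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_tiff_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("page-*.tiff"))
```

Any JP2 compression would be a separate, later step on the generated TIFFs, so it can be skipped when the TIFFs turn out smaller.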

DiegoPino · 2020-08-20T15:32:48Z

@kromabiles sorry for the slowness, I have a solution! But it requires some testing and planning. Give me until the end of the week to enable it in our sandbox, and I will give you credentials there. I will also copy your templates and prepare a spreadsheet test case, but even better if you have a few PDFs in a zip and a demo spreadsheet around.

kromabiles · 2020-08-20T16:22:09Z

No worries! Thanks, Diego - files are too big to attach here, so I'll send them over to you via email.

DiegoPino self-assigned this Jul 16, 2020

DiegoPino added enhancement help wanted labels Jul 16, 2020
