create Issue-level text derivative from Page-level batch ingest #203

Open
ebenenglish opened this issue Nov 7, 2019 · 0 comments

Background
In the Ingest Scenarios document, we indicated that an NDNP ingest would create NewspaperIssue objects, and compile child Page object files into a set of Issue-level files that would be attached as FileSets.

Similar to #155, a text file should be created representing the combined OCR text of all the component Pages.
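
As a rough illustration of the derivative described above, here is a minimal sketch in Python. It is not the project's implementation: the function names, the blank-line page separator, and the assumption that each Page already has a plain-text OCR file on disk are all hypothetical.

```python
from pathlib import Path


def combine_page_text(page_files, separator="\n\n"):
    """Join page-level OCR text files into one string.

    `page_files` is assumed to be already sorted in page order.
    The separator is an arbitrary choice for this sketch.
    """
    parts = []
    for path in page_files:
        text = Path(path).read_text(encoding="utf-8").strip()
        if text:  # skip pages that have no OCR text
            parts.append(text)
    return separator.join(parts)


def write_issue_text(page_files, issue_txt_path):
    """Write the combined text to a single issue-level TXT file (hypothetical helper)."""
    combined = combine_page_text(page_files)
    Path(issue_txt_path).write_text(combined + "\n", encoding="utf-8")
    return issue_txt_path
```

The resulting file is what would be attached to the NewspaperIssue as a FileSet, and the same combined string could be passed to whatever full-text indexing the issue record uses.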

The text should also be indexed as searchable full text in the issue Solr record, matching the current PDF ingest functionality. (This may already be happening?)

Rationale
An issue-level text file would make text analysis easier, since users could work with one file per issue instead of one per page.

Expected Behavior
As an admin user
When I run a batch ingest of page-level files
And I view the results of the ingest in the UI
I should be able to view a page for each Issue in the batch
And I should be able to download the issue text as a plain TXT file
And the text should be searchable

This ticket will be complete when
The functionality and user story described above are implemented and tested.
