Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎁 IiifPrint Enhancements to clean up split works as PDF fileset is deleted #288

Merged
merged 15 commits into from
Nov 15, 2023
Merged
24 changes: 24 additions & 0 deletions app/actors/iiif_print/actors/cleanup_file_sets_actor_decorator.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# frozen_string_literal: true

# override Hyrax to remove splitting upon work delete
module IiifPrint
module Actors
# Responsible for removing FileSets related to the given curation concern.
module CleanupFileSetsActorDecorator
# @param [Hyrax::Actors::Environment] env
# @return [Boolean] true if destroy was successful
def destroy(env)
file_sets = env.curation_concern.file_sets
file_sets.each do |file_set|
# we destroy the children before the file_set, because we need the parent relationship
IiifPrint::SplitPdfs::DestroyPdfChildWorksService.conditionally_destroy_spawned_children_of(
file_set: file_set,
work: env.curation_concern
)
end
# and now back to your regularly scheduled programming
super
end
end
end
end
13 changes: 12 additions & 1 deletion app/actors/iiif_print/actors/file_set_actor_decorator.rb
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# frozen_string_literal: true

# override to add PDF splitting for file sets
# override to add PDF splitting for file sets and remove splitting upon fileset delete
module IiifPrint
module Actors
module FileSetActorDecorator
Expand Down Expand Up @@ -33,6 +33,17 @@ def attach_to_work(work, file_set_params = {})
def service
IiifPrint::SplitPdfs::ChildWorkCreationFromPdfService
end

# Clean up children when removing the fileset
def destroy
# we destroy the children before the file_set, because we need the parent relationship
IiifPrint::SplitPdfs::DestroyPdfChildWorksService.conditionally_destroy_spawned_children_of(
file_set: file_set,
work: file_set.parent
)
# and now back to your regularly scheduled programming
super
end
end
end
end
1 change: 1 addition & 0 deletions app/indexers/concerns/iiif_print/child_indexer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ def self.decorate_work_types!
def generate_solr_document
super.tap do |solr_doc|
solr_doc['is_child_bsi'] ||= object.is_child
solr_doc['split_from_pdf_id_tsi'] ||= object.split_from_pdf_id
solr_doc['is_page_of_ssim'] = iiif_print_lineage_service.ancestor_ids_for(object)

# Due to a long-standing hack in Hyrax, the file_set_ids_ssim contains both file_set_ids and
Expand Down
9 changes: 9 additions & 0 deletions app/models/concerns/iiif_print/set_child_flag.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,10 @@ module RDF
class CustomIsChildTerm < Vocabulary('http://id.loc.gov/vocabulary/identifiers/')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe consider renaming this file since it's doing more than setting the child flag now. I suppose that would be a breaking change so we could also cut a new release?

Now that I say that, maybe that can be a TODO for later given the urgency 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol yeah I considered that but since it's included in EVERY repo that uses IiifPrint, it's def a breaking change which is why I left it. Plus it's still a type of child flag if you stretch hard enough. ;)

property 'is_child'
end

class FromPdfIdTerm < Vocabulary('http://id.loc.gov/vocabulary/identifiers/')
property 'split_from_pdf_id'
end
end

module IiifPrint
Expand All @@ -18,6 +22,11 @@ module SetChildFlag
multiple: false do |index|
index.as :stored_searchable
end
property :split_from_pdf_id,
predicate: ::RDF::FromPdfIdTerm.split_from_pdf_id,
multiple: false do |index|
index.as :stored_searchable
end
end

def set_children
Expand Down
1 change: 1 addition & 0 deletions app/models/concerns/iiif_print/solr/document.rb
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ module IiifPrint::Solr::Document
def self.decorate(base)
base.prepend(self)
base.send(:attribute, :is_child, Hyrax::SolrDocument::Metadata::Solr::String, 'is_child_bsi')
base.send(:attribute, :split_from_pdf_id, Hyrax::SolrDocument::Metadata::Solr::String, 'split_from_pdf_id_tsi')
base.send(:attribute, :digest, Hyrax::SolrDocument::Metadata::Solr::String, 'digest_ssim')

# @note These properties came from the newspaper_works gem. They are configurable.
Expand Down
3 changes: 3 additions & 0 deletions app/models/iiif_print/pending_relationship.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,8 @@ class PendingRelationship < ApplicationRecord
validates :parent_id, presence: true
validates :child_title, presence: true
validates :child_order, presence: true
validates :parent_model, presence: true
validates :child_model, presence: true
validates :file_id, presence: true
end
laritakr marked this conversation as resolved.
Show resolved Hide resolved
end
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
class AddModelDetailsToIiifPrintPendingRelationships < ActiveRecord::Migration[5.2]
def change
add_column :iiif_print_pending_relationships, :parent_model, :string
add_column :iiif_print_pending_relationships, :child_model, :string
add_column :iiif_print_pending_relationships, :file_id, :string
end
end
1 change: 1 addition & 0 deletions lib/iiif_print.rb
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
require "iiif_print/split_pdfs/base_splitter"
require "iiif_print/split_pdfs/child_work_creation_from_pdf_service"
require "iiif_print/split_pdfs/derivative_rodeo_splitter"
require "iiif_print/split_pdfs/destroy_pdf_child_works_service"

module IiifPrint
extend ActiveSupport::Autoload
Expand Down
1 change: 1 addition & 0 deletions lib/iiif_print/engine.rb
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ class Engine < ::Rails::Engine
::BlacklightIiifSearch::IiifSearchAnnotation.prepend(IiifPrint::BlacklightIiifSearch::AnnotationDecorator)
::BlacklightIiifSearch::IiifSearch.prepend(IiifPrint::IiifSearchDecorator)
Hyrax::Actors::FileSetActor.prepend(IiifPrint::Actors::FileSetActorDecorator)
Hyrax::Actors::CleanupFileSetsActor.prepend(IiifPrint::Actors::CleanupFileSetsActorDecorator)

Hyrax.config do |config|
config.callback.set(:after_create_fileset) do |file_set, user|
Expand Down
8 changes: 6 additions & 2 deletions lib/iiif_print/jobs/child_works_from_pdf_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ def split_pdf(original_pdf_path, user, child_model, pdf_file_set)
image_files = @parent_work.iiif_print_config.pdf_splitter_service.call(original_pdf_path, file_set: pdf_file_set)
return if image_files.blank?

@split_from_pdf_id = pdf_file_set.nil? ? nil : pdf_file_set.id
prepare_import_data(original_pdf_path, image_files, user)

# submit the job to create all the child works for one PDF
Expand All @@ -70,7 +71,7 @@ def split_pdf(original_pdf_path, user, child_model, pdf_file_set)
@child_work_titles,
{},
@uploaded_files,
attributes.merge!(model: child_model.to_s).with_indifferent_access,
attributes.merge!(model: child_model.to_s, split_from_pdf_id: @split_from_pdf_id).with_indifferent_access,
operation)
end
# rubocop:enable Metrics/ParameterLists
Expand All @@ -96,7 +97,10 @@ def prepare_import_data(original_pdf_path, image_files, user)
# save child work info to create the member relationships
PendingRelationship.create!(child_title: child_title,
parent_id: @parent_work.id,
child_order: child_title)
child_order: child_title,
parent_model: @parent_work.class,
child_model: @parent_work.iiif_print_config.pdf_split_child_model,
file_id: @split_from_pdf_id)

begin
# Clean up the temporary image path.
Expand Down
7 changes: 7 additions & 0 deletions lib/iiif_print/jobs/create_relationships_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,14 @@ def perform(parent_id:, parent_model:, child_model:, retries: 0, **)
if @number_of_failures.zero? && @number_of_successes == @pending_children.count
# remove pending relationships upon valid completion
@pending_children.each(&:destroy)
elsif @number_of_failures.zero? && @number_of_successes > @pending_children.count
# remove pending relationships but raise error that too many relationships formed
@pending_children.each(&:destroy)
raise "CreateRelationshipsJob for parent id: #{@parent_id} " \
"added #{@number_of_successes} children, " \
"expected #{@pending_children} children."
else
# report failures & keep pending relationships
raise "CreateRelationshipsJob failed for parent id: #{@parent_id} " \
"had #{@number_of_successes} successes & #{@number_of_failures} failures, " \
"with errors: #{@errors}. Wanted #{@pending_children} children."
Expand Down
33 changes: 33 additions & 0 deletions lib/iiif_print/split_pdfs/destroy_pdf_child_works_service.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# frozen_string_literal: true

module IiifPrint
module SplitPdfs
## Encapsulates logic for cleanup when the PDF is destroyed after pdf splitting into child works
class DestroyPdfChildWorksService
## @api public
# @param file_set [FileSet] What is the containing file set for the provided file.
# @param work [Hydra::PCDM::Work] Parent of the fileset being deleted
def self.conditionally_destroy_spawned_children_of(file_set:, work:)
child_model = work.try(:iiif_print_config)&.pdf_split_child_model
return unless child_model
return unless file_set.class.pdf_mime_types.include?(file_set.mime_type)

IiifPrint::PendingRelationship.where(parent_id: work.id, file_id: file_set.id).each(&:destroy)
destroy_spawned_children(model: child_model, file_set: file_set, work: work)
end

private_class_method def self.destroy_spawned_children(model:, file_set:, work:)
# look first for children by the file set id they were split from
children = model.where(split_from_pdf_id: file_set.id)
if children.blank?
# find works where file name and work `to_param` are both in the title
children = model.where(title: file_set.label).where(title: work.to_param)
end
return if children.blank?
children.each do |rcd|
rcd.destroy(eradicate: true)
end
end
end
end
end
23 changes: 21 additions & 2 deletions spec/iiif_print/jobs/create_relationships_job_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,30 @@ module IiifPrint::Jobs

let(:parent_model) { WorkWithIiifPrintConfig.to_s }
let(:child_model) { WorkWithIiifPrintConfig.to_s }
let(:file) { FileSet.new.tap { |fs| fs.save!(validate: false) } }
let(:parent_record) { WorkWithIiifPrintConfig.new(title: ['required title']) }
let(:child_record1) { WorkWithIiifPrintConfig.new(title: ["Child of #{parent_record.id} page 01"]) }
let(:child_record2) { WorkWithIiifPrintConfig.new(title: ["Child of #{parent_record.id} page 02"]) }
let(:pending_rel1) { IiifPrint::PendingRelationship.new(parent_id: parent_record.id, child_title: "Child of #{parent_record.id} page 01", child_order: "Child of #{parent_record.id} page 01") }
let(:pending_rel2) { IiifPrint::PendingRelationship.new(parent_id: parent_record.id, child_title: "Child of #{parent_record.id} page 02", child_order: "Child of #{parent_record.id} page 02") }
let(:pending_rel1) do
IiifPrint::PendingRelationship.new(
parent_id: parent_record.id,
child_title: "Child of #{parent_record.id} page 01",
child_order: "Child of #{parent_record.id} page 01",
parent_model: parent_model,
child_model: child_model,
file_id: file.id
)
end
let(:pending_rel2) do
IiifPrint::PendingRelationship.new(
parent_id: parent_record.id,
child_title: "Child of #{parent_record.id} page 02",
child_order: "Child of #{parent_record.id} page 02",
parent_model: parent_model,
child_model: child_model,
file_id: file.id
)
end

describe '#perform' do
before do
Expand Down
92 changes: 92 additions & 0 deletions spec/iiif_print/split_pdfs/destroy_pdf_child_works_service_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# frozen_string_literal: true

require 'spec_helper'

RSpec.describe IiifPrint::SplitPdfs::DestroyPdfChildWorksService do
let(:subject) { described_class.conditionally_destroy_spawned_children_of(file_set: fileset, work: work) }

let(:work) { WorkWithIiifPrintConfig.new(title: ['required title'], id: '123') }
let(:fileset) { FileSet.new.tap { |fs| fs.save!(validate: false) } }
let(:child_work) { WorkWithIiifPrintConfig.new(title: ["Child of #{work.id} file.pdf page 01"], id: '456', is_child: true) }
let(:pending_rel1) do
IiifPrint::PendingRelationship.new(
parent_id: work.id,
child_title: "Child of #{work.id} file.pdf page 01",
child_order: "Child of #{work.id} file.pdf page 01",
parent_model: WorkWithIiifPrintConfig,
child_model: WorkWithIiifPrintConfig,
file_id: fileset.id
)
end
let(:pending_rel2) do
IiifPrint::PendingRelationship.new(
parent_id: work.id,
child_title: "Child of #{work.id} another.pdf page 01",
child_order: "Child of #{work.id} another.pdf page 01",
parent_model: WorkWithIiifPrintConfig,
child_model: WorkWithIiifPrintConfig,
file_id: 'another'
)
end
# let(:uploaded_pdf_file) { create(:uploaded_pdf_file) }
# let(:uploaded_file_ids) { [uploaded_pdf_file.id] }

before do
allow(fileset).to receive(:parent).and_return(work)
allow(fileset).to receive(:label).and_return('file.pdf')
allow(fileset).to receive(:mime_type).and_return('application/pdf')
end

describe 'class' do
subject { described_class }

it { is_expected.to respond_to(:conditionally_destroy_spawned_children_of) }
it { is_expected.not_to respond_to(:destroy_spawned_children) }
end

describe '#conditionally_destroy_spawned_children_of' do
context 'with child works by fileset id' do
before do
allow(WorkWithIiifPrintConfig).to receive(:where).with(split_from_pdf_id: fileset.id).and_return([child_work])
end

it 'destroys the child works' do
expect(child_work).to receive(:destroy)
subject
end
end

context 'with child works by title' do
before do
allow(WorkWithIiifPrintConfig).to receive(:where).with(split_from_pdf_id: fileset.id).and_return([])
allow(WorkWithIiifPrintConfig).to receive(:where).and_return([child_work])
end

it 'destroys the child works' do
expect(child_work).to receive(:destroy)
subject
end
end

context 'when fileset is not a PDF mimetype' do
before do
allow(fileset).to receive(:mime_type).and_return('not_pdf')
end

it 'returns with no changes' do
expect(IiifPrint::PendingRelationship).not_to receive(:where)
end
end

context 'when IiifPrint::PendingRelationship records exist' do
before do
pending_rel1.save
pending_rel2.save
end

it 'deletes only records associated with the specific fileset PDF file' do
expect { subject }.to change(IiifPrint::PendingRelationship, :count).by(-1)
end
end
end
end
Loading