From 39f57c49cb74f93adb0cfd759ee715cb12e0c267 Mon Sep 17 00:00:00 2001
From: Jeremy Friesen
Date: Mon, 27 Nov 2023 09:49:54 -0500
Subject: [PATCH] =?UTF-8?q?=F0=9F=90=9B=20Ease=20how=20we=20determine=20pr?=
 =?UTF-8?q?eprocessed=20location?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Prior to this commit, we assumed the ancestor must have an AARK_ID.
However that is not always the case. Which is confounding given the
data structure of the files; however such is the way of the world.

With this commit we fallback to the file_set's internal information to
attempt to find the file in the preprocessed location. Namely if the
FileSet had an import_url, we'll use that to derive where it probably
went in SpaceStone.

Note: sniffing out where this file exists in SpaceStone is a very
fragile process.

Closes #289

Related to:

- https://github.com/scientist-softserv/iiif_print/issues/289
---
 .../iiif_print/derivative_rodeo_service.rb | 27 +++++++++++++++----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/app/services/iiif_print/derivative_rodeo_service.rb b/app/services/iiif_print/derivative_rodeo_service.rb
index 093982ff..8455bb1f 100644
--- a/app/services/iiif_print/derivative_rodeo_service.rb
+++ b/app/services/iiif_print/derivative_rodeo_service.rb
@@ -141,30 +141,47 @@ def self.get_ancestor(filename: nil, file_set:)
     # @param file_set [FileSet]
     # @param filename [String]
     # @return [String] the dirname (without any "/" we hope)
+    # rubocop:disable Metrics/AbcSize
+    # rubocop:disable Metrics/MethodLength
     def self.derivative_rodeo_preprocessed_directory_for(file_set:, filename:)
+      # SpaceStone does not know about lineage; it makes assumptions based on the URL of the work.
+      # If we have an import_url, let's follow the same assumption that SpaceStone would make.
+      #
+      # NOTE: We're assuming that a page ripped from a PDF will not have an import_url. This may
+      # not be the case.
+      return file_set.import_url.split("/")[-2] if file_set&.import_url&.split("/")[-2]&.presence
+
       ancestor, ancestor_type = get_ancestor(filename: filename, file_set: file_set)
 
       # Why might we not have an ancestor? In the case of grandparent_for, we may not yet have run
       # the create relationships job. We could sneak a peak in the table to maybe glean some insight.
       # However, read further the `else` clause to see the novel approach.
+      #
+      # Why might the ancestor not respond (nor have) a configured
+      # parent_work_identifier_property_name? Because data is sloppy. And we're trying to "guess"
+      # how this data was written in SpaceStone; a non-trivial task.
+      #
+      # TODO: Perhaps we could use the original remote_url to sniff that out the space stone
+      # directory?
+      #
       # rubocop:disable Style/GuardClause
-      if ancestor
+      if ancestor && ancestor.try(parent_work_identifier_property_name).presence
         message = "#{self.class}.#{__method__} #{file_set.class} ID=#{file_set.id} and filename: #{filename.inspect}" \
                   "has #{ancestor_type} of #{ancestor.class} ID=#{ancestor.id}"
         Rails.logger.info(message)
-        ancestor.public_send(parent_work_identifier_property_name) ||
-          raise("Expected #{ancestor.class} ID=#{ancestor.id} (#{ancestor_type} of #{file_set.class} ID=#{file_set.id}) " \
-                "to have a present #{parent_work_identifier_property_name.inspect}")
+        ancestor.public_send(parent_work_identifier_property_name)
       else
         # HACK: This makes critical assumptions about how we're creating the title for the file_set;
         # but we don't have much to fall-back on. Consider making this a configurable function. Or
         # perhaps this entire method should be more configurable.
         # TODO: Revisit this implementation.
-        file_set.title.first.split(".").first ||
+        Array.wrap(file_set.title).first.split(".").first ||
           raise("#{file_set.class} ID=#{file_set.id} has title #{file_set.title.first} from which we cannot infer information.")
       end
       # rubocop:enable Style/GuardClause
     end
+    # rubocop:enable Metrics/MethodLength
+    # rubocop:enable Metrics/AbcSize
 
     def initialize(file_set)
       @file_set = file_set