🐛 Ease how we determine preprocessed location
Prior to this commit, we assumed the ancestor must have an AARK_ID.
However, that is not always the case, which is confounding given the
data structure of the files; such is the way of the world.

With this commit we fall back to the file_set's internal information to
attempt to find the file in the preprocessed location.  Namely, if the
FileSet has an import_url, we use that to derive where the file probably
went in SpaceStone.

Note: sniffing out where this file exists in SpaceStone is a very
fragile process.
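
A minimal sketch of the import_url derivation described above, in plain
Ruby. The helper name `directory_from_import_url` and the example URL are
hypothetical and not part of this commit; the sketch only assumes that the
directory is the second-to-last "/"-separated segment of the URL.

```ruby
# Hypothetical helper (not part of the commit): returns the second-to-last
# "/"-separated segment of a URL, or nil when the URL is missing or has no
# usable segment in that position.
def directory_from_import_url(import_url)
  return nil if import_url.nil?

  segment = import_url.split("/")[-2]
  segment.nil? || segment.empty? ? nil : segment
end

directory_from_import_url("https://example.org/files/abc123/page-1.tiff") # => "abc123"
directory_from_import_url(nil)                                             # => nil
```

This also shows why the approach is fragile: any change to the URL layout
(an extra path segment, for example) silently changes the derived directory.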

Closes #289

Related to:

- #289
jeremyf committed Nov 28, 2023
1 parent 50b2659 commit 39f57c4
Showing 1 changed file with 22 additions and 5 deletions.
27 changes: 22 additions & 5 deletions app/services/iiif_print/derivative_rodeo_service.rb
@@ -141,30 +141,47 @@ def self.get_ancestor(filename: nil, file_set:)
# @param file_set [FileSet]
# @param filename [String]
# @return [String] the dirname (without any "/" we hope)
# rubocop:disable Metrics/AbcSize
# rubocop:disable Metrics/MethodLength
def self.derivative_rodeo_preprocessed_directory_for(file_set:, filename:)
# SpaceStone does not know about lineage; it makes assumptions based on the URL of the work.
# If we have an import_url, let's follow the same assumption that SpaceStone would make.
#
# NOTE: We're assuming that a page ripped from a PDF will not have an import_url. This may
# not be the case.
return file_set.import_url.split("/")[-2] if file_set&.import_url&.split("/")[-2]&.presence

ancestor, ancestor_type = get_ancestor(filename: filename, file_set: file_set)

# Why might we not have an ancestor? In the case of grandparent_for, we may not yet have run
# the create relationships job. We could sneak a peek in the table to maybe glean some insight.
# However, read on to the `else` clause to see the novel approach.
#
# Why might the ancestor not respond to (nor have) a configured
# parent_work_identifier_property_name? Because data is sloppy. And we're trying to "guess"
# how this data was written in SpaceStone; a non-trivial task.
#
# TODO: Perhaps we could use the original remote_url to sniff out the SpaceStone
# directory?
#
# rubocop:disable Style/GuardClause
-if ancestor
+if ancestor && ancestor.try(parent_work_identifier_property_name).presence
message = "#{self.class}.#{__method__} #{file_set.class} ID=#{file_set.id} and filename: #{filename.inspect} " \
"has #{ancestor_type} of #{ancestor.class} ID=#{ancestor.id}"
Rails.logger.info(message)
-ancestor.public_send(parent_work_identifier_property_name) ||
-raise("Expected #{ancestor.class} ID=#{ancestor.id} (#{ancestor_type} of #{file_set.class} ID=#{file_set.id}) " \
-"to have a present #{parent_work_identifier_property_name.inspect}")
+ancestor.public_send(parent_work_identifier_property_name)
else
# HACK: This makes critical assumptions about how we're creating the title for the file_set;
# but we don't have much to fall back on. Consider making this a configurable function. Or
# perhaps this entire method should be more configurable.
# TODO: Revisit this implementation.
-file_set.title.first.split(".").first ||
+Array.wrap(file_set.title).first.split(".").first ||
raise("#{file_set.class} ID=#{file_set.id} has title #{file_set.title.first} from which we cannot infer information.")
end
# rubocop:enable Style/GuardClause
end
# rubocop:enable Metrics/MethodLength
# rubocop:enable Metrics/AbcSize

def initialize(file_set)
@file_set = file_set
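
The lookup order in `derivative_rodeo_preprocessed_directory_for` can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the service's implementation: `FileSetStub`, `AncestorStub`, `blank?`, and `preprocessed_directory_for` are hypothetical stand-ins, the ancestor is passed in rather than looked up via `get_ancestor`, and `identifier` stands in for whatever `parent_work_identifier_property_name` is configured to be. Unlike the method above, the sketch guards against a missing title so that the final `raise` is reachable.

```ruby
# Hypothetical stand-ins for the real models; only the attributes used below.
FileSetStub = Struct.new(:id, :import_url, :title, keyword_init: true)
AncestorStub = Struct.new(:id, :identifier, keyword_init: true)

# Plain-Ruby approximation of a blank check (nil, empty, or whitespace-only).
def blank?(value)
  value.nil? || value.to_s.strip.empty?
end

def preprocessed_directory_for(file_set:, ancestor: nil)
  # 1. Prefer the directory implied by the import URL (second-to-last segment).
  from_url = file_set.import_url.to_s.split("/")[-2]
  return from_url unless blank?(from_url)

  # 2. Otherwise use the ancestor's identifier, when it is present.
  return ancestor.identifier if ancestor && !blank?(ancestor.identifier)

  # 3. Last resort: the first title, up to its first ".".
  first_title = Array(file_set.title).first
  from_title = first_title.to_s.split(".").first
  return from_title unless blank?(from_title)

  raise ArgumentError,
        "FileSet ID=#{file_set.id} has title #{first_title.inspect} from which we cannot infer information."
end

file_set = FileSetStub.new(id: "fs1", import_url: nil, title: ["page-1.tiff"])
preprocessed_directory_for(file_set: file_set) # => "page-1"
```

Converting the title with `to_s` before splitting means a `nil` or empty title reaches the `raise` with a descriptive message instead of failing earlier with a `NoMethodError`.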