Ensure Mirador viewer sends DSpace Authorization header #1436

Open: 12 commits to be merged into base: main

mspalti (Member) commented Dec 3, 2021

References

Description

Adds a request preprocessor function to Mirador viewer configuration. If a DSpace auth token is present, the function adds an authentication header to the request.
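
For illustration, here is a minimal sketch of what such a preprocessor might look like. It is not the code from this PR. It assumes that the DSpace Angular UI keeps its token in a `dsAuthInfo` cookie as URL-encoded JSON with an `accessToken` field, and that Mirador calls each entry of `requests.preprocessors` with `(url, options)`. The helper names and the example base URL are hypothetical.

```javascript
// Sketch only, under the assumptions stated above.

// Extract the access token from a document.cookie-style string, or return null.
function readDSpaceToken(cookieString) {
  const entry = cookieString
    .split(';')
    .map((part) => part.trim())
    .find((part) => part.startsWith('dsAuthInfo='));
  if (!entry) return null;
  try {
    const value = decodeURIComponent(entry.slice('dsAuthInfo='.length));
    return JSON.parse(value).accessToken || null;
  } catch (e) {
    return null; // malformed cookie: behave as an anonymous user
  }
}

// Build a preprocessor that adds the Authorization header only to requests
// whose URL starts with the given (hypothetical) DSpace IIIF API base URL.
function makeAuthPreprocessor(getCookies, iiifApiBase) {
  return (url, options) => {
    const token = readDSpaceToken(getCookies());
    if (!token || !url.startsWith(iiifApiBase)) return options;
    return {
      ...options,
      headers: { ...(options && options.headers), Authorization: `Bearer ${token}` },
    };
  };
}

// Hypothetical wiring into the viewer configuration.
const miradorConfig = {
  requests: {
    preprocessors: [
      makeAuthPreprocessor(
        () => (typeof document !== 'undefined' ? document.cookie : ''),
        'https://demo.example.org/server/iiif/'
      ),
    ],
  },
};
```

When no token is present, or the URL points somewhere else, the options are returned unchanged, so anonymous requests behave as before.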

Instructions for Reviewers

List of changes in this PR:

  • Updated Mirador index.js

Include guidance for how to test or review your PR.
This is a minor change to viewer configuration only. To test, run yarn run build:mirador and open a restricted IIIF DSpace item as the authorized user.

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes TSLint validation using yarn run lint
  • My PR doesn't introduce circular dependencies
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • If my PR includes new, third-party dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.

@mspalti mspalti self-assigned this Dec 3, 2021
@tdonohue tdonohue changed the title from "Authorization header" to "Ensure Mirador viewer sends DSpace Authorization header" Dec 3, 2021
@tdonohue tdonohue added the 1 APPROVAL, authorization, bug, and integration: IIIF labels Dec 3, 2021
@tdonohue tdonohue added this to the 7.2 milestone Dec 3, 2021
@tdonohue tdonohue self-requested a review December 9, 2021 15:31
@tdonohue tdonohue mentioned this pull request Dec 15, 2021
tdonohue (Member) left a comment

@mspalti : I'm not able to get this to work correctly. Is there some other configuration or setup that I'm missing?

Here's what I've tried.

  1. First, I'm using the Docker-based backend. So, I spin up the backend + IIIF server using docker-compose -p d7 -f docker-compose.yml -f dspace/src/main/docker-compose/docker-compose-iiif.yml up -d
  2. Then I build/ran this PR for the frontend (using yarn start)
  3. Next, I set up an Item to support IIIF, adding an Image bitstream. I verified that Mirador loads fine & I can zoom on the image, etc.
  4. Then I went into that Image bitstream and restricted access to it to Administrators only.
  5. I logged in as an Administrator, and went to that Item's page. Mirador loads, but no image appears. In my DevTools Console, I see 403 errors like:
    GET http://localhost:8182/iiif/2/c7671d2f-c3a5-4d15-b570-27c7aedfe9b6/full/300,/0/default.jpg 403 (Forbidden)
    
  6. I've verified that as an Admin, I can download the access restricted image. It's just Mirador that isn't working because my IIIF server throws a 403 error.

Is there perhaps a special configuration in the IIIF server that I'm missing?

(As a sidenote, I've noticed that when trying to access this Item page anonymously, i.e. not logged in, Mirador still attempts to load. If the image is access restricted and you don't have permissions to access it, shouldn't we hide or not load Mirador?)

mspalti (Member Author) commented Dec 22, 2021

Hi @tdonohue , this is an interesting question. The preprocessing function adds the auth header to requests made from Mirador to the IIIF endpoint. I tested it by placing access restrictions on the item dso, not on individual bitstreams, and that part seems to work fine. Mirador is able to create and retrieve a complete IIIF manifest from the item, bundle and bitstream metadata.

If you add the restriction to bitstreams (and not the item) we could return an error code from REST when no bitstreams are available and hide Mirador viewer when there's nothing to show. This might be something we add in 7.3+. It would require a bit of work on the REST and angular side but should be easy to do.

I'm pretty certain the 403 error you are seeing happens when the Cantaloupe image server tries to read the bitstream content. Obviously the request from Cantaloupe doesn't have the authorization header. At the moment, the only solution I've considered is configuring DSpace authorization for IP-based access to bitstreams using a special group.

mspalti (Member Author) commented Dec 23, 2021

@tdonohue , quick question. If we send the authorization header to the image server could we then pack the token into the image server request to dspace? Not sure that's allowed. If it can be done that would solve the 403 bitstream problem...

tdonohue (Member) commented Dec 23, 2021

@mspalti : Essentially, yes. If the image server request to DSpace just forwarded (or copied) the same Authorization header, then things would likely work fine. The current problem seems to be that the Mirador viewer sends the Authorization header along (including to the image server), but the image server ignores it, so the request to the DSpace backend is unauthenticated. At least that's my best guess.

There's only one catch that I can think of. The Image server would likely need to be added to rest.cors.allowed-origins: https://github.com/DSpace/DSpace/blob/main/dspace/config/modules/rest.cfg#L11 Otherwise, it's still possible the REST backend will not trust the image server. But, it seems completely reasonable to me to require the Image server be trusted for everything to work.

mspalti (Member Author) commented Dec 23, 2021

Great! The function in this PR is not adding the header to requests to the image server but if I remove a constraint then it will be added. Cantaloupe supports a scripted strategy that we can probably use to add the authorization header to the DSpace request. I'll experiment with that after the holiday. Thanks!

@mspalti mspalti force-pushed the authorization-header branch from 53b9106 to 1623f65 Compare January 8, 2022 21:21
mspalti (Member Author) commented Jan 8, 2022

@tdonohue , this turned into a tricky problem, but I think I have some answers now.

Obviously it's easy to add the authorization header to DSpace IIIF API requests for manifests, annotationLists, etc. I think this is the important feature for folks in the digitization / cultural resources community (which will be the largest IIIF user group). For these users, I think the ability to add bitstream-level restrictions is a low-level concern but restricting access to items will be important.

For other user groups with more traditional IR needs and expectations bitstream-level access restrictions can be supported but it's a bit trickier. Here's what I've discovered so far:

  • You need to use a Cantaloupe delegate script with "pre_authorize" and "httpsource_resource_info" lifecycle methods.
  • The initial request from Mirador needs a custom header that carries the auth token.
  • In my tests, the custom header was sent only in the initial Mirador request to the server and not subsequent requests from OpenSeadragon. However, if the image server and the angular UI server share the same cookie domain then the dsAuthInfo cookie is available in subsequent requests and can be used.
  • One catch is that the Cantaloupe server needs additional CORS configuration for the initial custom header to be allowed. That's not a big change. Once we verify things a bit more it would be an easy Cantaloupe issue/pr.

This all seems doable but a heavy lift in the near term. My thought is to add DSpace IIIF API authorization for 7.2 and create a follow-up issue for the image-server bitstream work.

I added the preprocessing for Mirador bitstream authorization to this PR. Since it's fairly tentative we could take it out for now or we could leave it and note that it's a work-in-progress.

mspalti (Member Author) commented Jan 8, 2022

BTW, I actually don't understand why in my tests the dsAuthInfo cookie is not available in all requests to the image server. Or why the custom header was not added to all requests.

If there's a way to use cookies in all requests then bitstream access control just requires a bit of additional Cantaloupe configuration which could be described in documentation. If the custom header can be made to work in all cases that would remove the shared cookie domain requirement. It might be worth a closer look at how Mirador works. But at this point I don't know what we'd find.

@tdonohue tdonohue requested a review from abollini January 13, 2022 15:34
mspalti (Member Author) commented Jan 15, 2022

I tested with a production site on which dspace and the Cantaloupe image server run behind a reverse proxy. As expected, the dsAuthInfo cookie was available for all requests. I was able to use the cookie to pre_authorize requests and retrieve restricted bitstreams using a delegate file.

I'm going to remove the extra Mirador preprocessor for image server requests since it now seems both unnecessary and problematic.

mspalti (Member Author) commented Jan 17, 2022

I added a postprocessor to provide a more meaningful error message if the manifest contains no images. It would be better if this relied on an API response code. But for now, it works to check the JSON response.
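
As a rough sketch of the idea (not the code in this PR), a check of the JSON response could look like the following. It assumes a IIIF Presentation 2 manifest (`sequences[].canvases[]`) and that Mirador passes each postprocessor `(url, action)` with the manifest in `action.manifestJson`. Turning the action into a `mirador/RECEIVE_MANIFEST_FAILURE` with an `error` message is an assumption about how the message would be surfaced, and the function names are hypothetical.

```javascript
// Sketch only, under the assumptions stated above.

// Count canvases across all sequences of a IIIF Presentation 2 manifest.
function countCanvases(manifestJson) {
  const sequences = (manifestJson && manifestJson.sequences) || [];
  return sequences.reduce((n, seq) => n + (seq.canvases || []).length, 0);
}

// Postprocessor: mutates the action in place when the manifest has no canvases.
function emptyManifestPostprocessor(url, action) {
  if (action.type !== 'mirador/RECEIVE_MANIFEST') return;
  if (countCanvases(action.manifestJson) > 0) return;
  // Assumption: a failure action with an error string makes the viewer show it.
  action.type = 'mirador/RECEIVE_MANIFEST_FAILURE';
  action.error = 'No images are available for this item. You may not have permission to view them.';
}

// Hypothetical wiring into the viewer configuration.
const miradorRequests = { postprocessors: [emptyManifestPostprocessor] };
```

An API response code would remove the need to inspect the manifest body at all; this check is only a stopgap on the viewer side.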

[Two screenshots attached, Jan 17, 2022]

mspalti (Member Author) commented Jan 19, 2022

Yes, I agree that the IIIF Authentication API will be necessary if we want to fully support access restriction at the bitstream level. I'm sure there are possible authentication use cases that DSpace will want to support eventually!

But I'm still sort of asking myself the philosophical question. Do we have any use cases at present that require this level of control over individual image content? To me there's a difference between use cases that arise out of IIIF community usage and the kinds of use cases already supported in the default DSpace Item view. The IIIF integration is about the former, not necessarily the latter.

4Science might be in the best position to know what advanced IIIF use cases are in demand right now. From my perspective it may be enough to just require all bitstreams added to an IIIF manifest have anonymous read access.

mspalti (Member Author) commented Jan 19, 2022

So as noted above I discovered another issue with restricting bitstreams. We cache manifests for better performance and currently have no way to manage the cache in response to varying user permissions. That might require some research.

mspalti (Member Author) commented Jan 19, 2022

I'm away for the day. Here's a quick summary of where I think we stand on the question of bitstream restrictions.

Basically, DSpace bitstream and bundle restrictions do not work at all with IIIF-enabled items because of our current approach to caching. If the caching issue is fixed, then it will be possible for local institutions to configure their image server to use the DSpace authorization token. Institutions that do not configure their image server will need to avoid bundle and bitstream access restrictions since, as @tdonohue discovered, they produce viewer errors.

These statements are true only for the embedded Mirador viewer and not manifests that are shared with an external viewer. Full IIIF interoperability will require a dspace implementation of the IIIF Authentication API. That's beyond the current scope of our efforts but worth investigating.

Meanwhile, this PR does allow the embedded viewer to access Items that are restricted at the Item level. That's an important enhancement.

@tdonohue tdonohue requested review from tdonohue and abollini January 20, 2022 15:53
abollini (Member) commented

My personal feeling is that this PR is not yet ready to go in an official release.

Our IIIF implementation assumes that each DSpace item has at most one IIIF manifest associated with it. This allows us to cache the response regardless of who requested it. Security is still in place because authorization is verified before the cache is accessed.

Providing access to the IIIF manifest of a restricted Item has no real value if we cannot also grant the viewer access to the restricted bitstreams, since bitstreams usually have stronger restrictions than the item. We wouldn't suggest protecting the item metadata while leaving the bitstreams open as a good security practice.

@mspalti, you say that the caching issue currently prevents us from managing restricted bitstreams, but I'm not sure what your idea is here. In any case, I would discourage you from including any authorization token in the manifest document itself, since manifests are sometimes shared directly between researchers as JSON files, uploaded to other systems, or harvested by other systems.

tdonohue (Member) commented

@mspalti and @abollini : As it sounds to me like there's still a lot to be figured out / discussed regarding this feature (especially with caching, etc), I'd recommend we simply reschedule this for 7.3 at this time. That means that 7.2 will just have the same behavior as 7.1, in that IIIF items must have publicly accessible bitstreams.

@tdonohue tdonohue removed this from the 7.2 milestone Jan 24, 2022
mspalti (Member Author) commented Jan 24, 2022

Yes, I agree that we are not ready to merge this one. Not enough time for consideration.

For the sake of further analysis:

@tdonohue , @abollini , there's no suggestion here that we include an authorization token in the manifest. The PR solves only one problem and in a way that's not intended to support IIIF interoperability. Maybe a concrete example will help make the intent clear. Say I'm an archivist and I've added a new IIIF enabled item to DSpace. I do not want it to be available to everyone (or interoperable) so I restrict access to the item. When I log in as an authorized user, I can see the DSpace item and the embedded Mirador viewer. But if the embedded viewer cannot provide the authorization token in its request to fetch the manifest, then the viewer fails and I can't see my stuff. That's frustrating. Letting Mirador add the authorization token to the request solves this problem. Only this problem and not the others we've been discussing. It also doesn't work if the manifest is used in some other system.

Actually, this is the only problem I considered when I suggested the solution. I still think it's a good one for this one (important) issue.

Bitstream restrictions and IIIF interoperability are both bigger problems as we've discovered. There are several issues with bitstream restrictions that I can see:

  • Bitstreams are requested by the image server. If bitstreams are restricted, then these requests require an authorization token. This PR can't address that problem since it's not a DSpace issue. That said, in the case of Cantaloupe it's fairly easy to do if the request includes a DSpace cookie. But it's up to the local institution to get things working in a secure way. DSpace can't support it directly. Strategies for local configuration will vary with image server used.
  • Caching is an issue (I think). Here's an example. An item contains 2 images. One anonymous read and one restricted. I access the item anonymously and the manifest is returned with a single image. But I want both, so I log in. The cached manifest is returned. Again with a single image. Because the manifest is cached I won't get access to the second image. The reverse could also happen, in which case an anonymous user receives a cached manifest that contains a restricted image and sees a viewer error.
  • It's currently possible to return a manifest with zero canvases if all bitstreams are restricted. I think this is something we should disallow at the API level and return an error code that can be handled gracefully in the Mirador viewer. (I haven't checked to see what the standard says.)
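
The caching example can be reproduced with a toy model. This is only a sketch to make the behavior concrete: it does not reflect the actual DSpace cache implementation, all names are hypothetical, and the second cache (keyed on an access tier) is just one conceivable remedy, not something proposed in this PR.

```javascript
// Toy model: one item with a public image and a restricted image.
const item = {
  id: 'item-1',
  images: [
    { id: 'public.jpg', restricted: false },
    { id: 'restricted.jpg', restricted: true },
  ],
};

// Build the manifest a given user is allowed to see.
function buildManifest(source, user) {
  const visible = source.images.filter((img) => !img.restricted || user.authorized);
  return { itemId: source.id, canvases: visible.map((img) => img.id) };
}

// Cache keyed on the item id only: whoever asks first decides what everyone gets.
function makeItemKeyedCache() {
  const cache = new Map();
  return (source, user) => {
    if (!cache.has(source.id)) cache.set(source.id, buildManifest(source, user));
    return cache.get(source.id);
  };
}

// Cache keyed on item id plus a (hypothetical) access tier.
function makeTieredCache() {
  const cache = new Map();
  return (source, user) => {
    const key = `${source.id}:${user.authorized ? 'authorized' : 'anonymous'}`;
    if (!cache.has(key)) cache.set(key, buildManifest(source, user));
    return cache.get(key);
  };
}

const anonymous = { authorized: false };
const archivist = { authorized: true };

const getShared = makeItemKeyedCache();
getShared(item, anonymous); // the anonymous request populates the cache
const sharedResult = getShared(item, archivist).canvases; // only public.jpg

const getTiered = makeTieredCache();
getTiered(item, anonymous);
const tieredResult = getTiered(item, archivist).canvases; // both images

console.log(sharedResult, tieredResult);
```

If the authorized user happens to request the item first, the item-keyed cache shows the reverse problem: the anonymous user receives a manifest that lists the restricted image.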

It would be great to have a solution to the bitstream problem. But that might require something like implementing the IIIF Authentication API. Well beyond the scope of this PR.

I'm not clear how important bitstream restrictions are in the context of our IIIF integration. I'm convinced they are important! But I don't have a sense of how important or the level of priority for future work.

tdonohue (Member) commented

@mspalti : Was reminded of this work as we went through the 7.3 board in today's meeting, so I re-read your last comment here.

While I understand the goal you were trying to achieve, I think you are making an assumption that most people would place the access restrictions at the Item level rather than the Bitstream level in your described use case. If the archivist you described decided instead to create a public Item with a restricted Bitstream, then you'd see the same behaviors you describe even with this PR in place. The archivist would still be frustrated that they cannot see the Bitstream in the Mirador viewer.

So, my worry here is that we are assuming that all users will provide access restrictions only at the Item level (and only seeking to solve that smaller problem). If someone instead accidentally (or purposefully) restricts the Bitstream, they will be confused as to why the Mirador viewer suddenly doesn't work -- even if it works fine for a different restricted Item (where the restrictions are only at the Item level).

I think we all (you, @abollini and I) see that the current behavior is problematic. But, I'm hesitant to apply a fix that only works if Items are restricted, but won't work if Bitstreams are restricted. I'd rather us try and find a way to minimally get Item & Bitstream restrictions working.... and we can always follow up that work with a full implementation of IIIF Authentication API in a separate PR at a later time (if that's a much larger task)

For now, I'm going to flag this PR as a work in progress. If it helps though, I'm glad to try to find time to discuss this problem in more detail in a future meeting...or we can further brainstorm in this PR or the associated issue ticket.

mspalti (Member Author) commented Mar 22, 2022

Sorry I missed your last comment @tdonohue . I am totally fine with designating this a work in progress!

Based on my previous comments it seems we can handle bitstream access by passing the JWT token to the image service, and configuring the image service accordingly. The technical problem right now is that our cache system isn't able to return different versions of a manifest for users with different credentials. It's an all or nothing cache without a notion of tiered access. That can probably be remedied.

I agree that the Authentication API is a bigger problem and one that needs to be addressed eventually. A solution will depend in some ways on the issues we're discussing here.

@tdonohue tdonohue removed their request for review April 28, 2022 19:59
mspalti (Member Author) commented Jan 24, 2023

A quick addition to this conversation.

I needed access restrictions for a collection of licensed images and was able to test the Mirador and Cantaloupe configuration mentioned above. It works. The Cantaloupe configuration is similar to the Mirador config in this PR: it uses the dsAuthInfo cookie to set the JWT before making the request to DSpace. (If Cantaloupe is using a cache, as would be typical, it needs to be configured to always check DSpace before returning the cached image.)

As @abollini noted, this particular cookie-based approach works only for the embedded viewer and is not consistent with the IIIF protocol for authentication. Also, it requires configuration of the image server as well as the viewer. So I'm thinking of this as a configuration recipe that one can use in lieu of full support for the IIIF Authentication API (and perhaps in combination with it when/if it becomes available in DSpace). I wouldn't recommend modifying our default Mirador configuration file index.js for all the reasons discussed earlier.

There's still the problem of Item and Bitstreams with different permissions (say a public item with a public low-res image and a restricted high-res image) because we can have only one version of the Manifest in the backend cache, but that seems less common and not the primary use case anyway.

@tdonohue tdonohue removed the 1 APPROVAL label Feb 27, 2023
pnbecker (Member) commented

We are investigating whether we can implement the IIIF Authentication API in DSpace. This seems to be the desired solution for the underlying issue, for which this PR is a partial workaround. I'll create an issue about the IIIF Authentication API as soon as we have first results to share.

Labels
authorization, bug, integration: IIIF, merge conflict, needs discussion, work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mirador cannot retrieve iiif data for restricted item.
4 participants