Ensure Mirador viewer sends DSpace Authorization header #1436
@mspalti : I'm not able to get this to work correctly. Is there some other configuration or setup that I'm missing?
Here's what I've tried.
- First, I'm using the Docker-based backend. So, I spin up the backend + IIIF server using
docker-compose -p d7 -f docker-compose.yml -f dspace/src/main/docker-compose/docker-compose-iiif.yml up -d
- Then I built and ran this PR for the frontend (using yarn start).
- Next, I set up an Item to support IIIF, adding an Image bitstream. I verified that Mirador loads fine & I can zoom on the image, etc.
- Then I went into that Image bitstream and access restricted it to Administrators only.
- I logged in as an Administrator, and went to that Item's page. Mirador loads, but no image appears. In my DevTools Console, I see 403 errors like:
GET http://localhost:8182/iiif/2/c7671d2f-c3a5-4d15-b570-27c7aedfe9b6/full/300,/0/default.jpg 403 (Forbidden)
- I've verified that as an Admin, I can download the access restricted image. It's just Mirador that isn't working because my IIIF server throws a 403 error.
Is there perhaps a special configuration in the IIIF server that I'm missing?
(As a sidenote, I've noticed that when trying to access this Item page anonymously, i.e. not logged in, Mirador still attempts to load. If the image is access restricted and you don't have permissions to access it, shouldn't we hide or not load Mirador?)
Hi @tdonohue, this is an interesting question. The preprocessing function adds the auth header to requests made from Mirador to the IIIF endpoint. I tested it by placing access restrictions on the item DSO, not on individual bitstreams, and that part seems to work fine. Mirador is able to create and retrieve a complete IIIF manifest from the item, bundle and bitstream metadata. If you add the restriction to bitstreams (and not the item), we could return an error code from REST when no bitstreams are available and hide the Mirador viewer when there's nothing to show. This might be something we add in 7.3+. It would require a bit of work on the REST and Angular side but should be easy to do. I'm pretty certain the 403 error you are seeing happens when the Cantaloupe image server tries to read the bitstream content. The request from Cantaloupe to DSpace doesn't carry the authorization header. At the moment, the only solution I've considered is configuring DSpace authorization for IP-based access to bitstreams using a special group.
@tdonohue, quick question. If we send the authorization header to the image server, could we then pack the token into the image server request to DSpace? Not sure that's allowed. If it can be done, that would solve the 403 bitstream problem...
@mspalti : Essentially, yes. If the image server request to DSpace just forwarded (or copied) the same Authorization header, then things would likely work fine. The current problem seems to be that the Mirador viewer sends the Authorization header along (including to the image server), but the image server ignores it and the request to the DSpace backend is therefore unauthenticated. At least that's my best guess. There's only one catch that I can think of. The Image server would likely need to be added to
Great! The function in this PR is not adding the header to requests to the image server, but if I remove a constraint then it will be added. Cantaloupe supports a scripted strategy that we can probably use to add the authorization header to the DSpace request. I'll experiment with that after the holiday. Thanks!
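The header forwarding discussed in the two comments above could look roughly like the following. This is a minimal sketch of the idea only: it is written in JavaScript for consistency with the rest of this PR and is not Cantaloupe's delegate script API. The function name `buildUpstreamRequest` and its parameters are hypothetical.

```javascript
// Illustrative only: builds the upstream request an image server might send to
// DSpace, copying the viewer's Authorization header when one is present.
// `incomingHeaders` and `bitstreamUrl` are hypothetical inputs.
function buildUpstreamRequest(incomingHeaders, bitstreamUrl) {
  // HTTP header names are case-insensitive, so normalise before the lookup.
  const normalised = {};
  for (const [name, value] of Object.entries(incomingHeaders)) {
    normalised[name.toLowerCase()] = value;
  }

  const headers = {};
  if (normalised.authorization) {
    // Forward the token unchanged; without it the request stays anonymous.
    headers.Authorization = normalised.authorization;
  }
  return { uri: bitstreamUrl, headers };
}
```

When no Authorization header arrives, the upstream request is sent without one, which matches the unauthenticated 403 case described earlier in the thread.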
@tdonohue , this turned into a tricky problem, but I think I have some answers now. Obviously it's easy to add the authorization header to DSpace IIIF API requests for manifests, annotationLists, etc. I think this is the important feature for folks in the digitization / cultural resources community (which will be the largest IIIF user group). For these users, I think the ability to add bitstream-level restrictions is a low-level concern but restricting access to items will be important. For other user groups with more traditional IR needs and expectations bitstream-level access restrictions can be supported but it's a bit trickier. Here's what I've discovered so far:
This all seems doable but a heavy lift in the near term. My thought is to add DSpace IIIF API authorization for 7.2 and create a follow-up issue for the image-server bitstream work. I added the preprocessing for Mirador bitstream authorization to this PR. Since it's fairly tentative, we could take it out for now or we could leave it and note that it's a work in progress.
BTW, I actually don't understand why in my tests the dsAuthInfo cookie is not available in all requests to the image server, or why the custom header was not added to all requests. If there's a way to use cookies in all requests, then bitstream access control just requires a bit of additional Cantaloupe configuration, which could be described in documentation. If the custom header can be made to work in all cases, that would remove the shared cookie domain requirement. It might be worth a closer look at how Mirador works. But at this point I don't know what we'd find.
I tested with a production site on which DSpace and the Cantaloupe image server run behind a reverse proxy. As expected, the I'm going to remove the extra Mirador preprocessor for image server requests since it now seems both unnecessary and problematic.
Yes, I agree that the IIIF Authentication API will be necessary if we want to fully support access restriction at the bitstream level. I'm sure there are possible authentication use cases that DSpace will want to support eventually! But I'm still sort of asking myself the philosophical question: do we have any use cases at present that require this level of control over individual image content? To me there's a difference between use cases that arise out of IIIF community usage and the kinds of use cases already supported in the default DSpace Item view. The IIIF integration is about the former, not necessarily the latter. 4Science might be in the best position to know what advanced IIIF use cases are in demand right now. From my perspective it may be enough to just require that all bitstreams added to an IIIF manifest have anonymous read access.
So, as noted above, I discovered another issue with restricting bitstreams. We cache manifests for better performance and currently have no way to manage the cache in response to varying user permissions. That might require some research.
I'm away for the day. Here's a quick summary of where I think we stand on the question of bitstream restrictions. Basically, DSpace bitstream and bundle restrictions do not work at all with IIIF-enabled items because of our current approach to caching. If the caching issue is fixed, then it will be possible for local institutions to configure their image server to use the DSpace authorization token. Institutions that do not configure their image server will need to avoid bundle and bitstream access restrictions since, as @tdonohue discovered, they produce viewer errors. These statements are true only for the embedded Mirador viewer and not for manifests that are shared with an external viewer. Full IIIF interoperability will require a DSpace implementation of the IIIF Authentication API. That's beyond the current scope of our efforts but worth investigating. Meanwhile, this PR does allow the embedded viewer to access Items that are restricted at the Item level. That's an important enhancement.
My personal feeling is that this PR is not yet ready to go into an official release. Our IIIF implementation assumes that each DSpace item has at most one IIIF manifest associated with it, which allows us to cache the response regardless of who requested it. Security is still in place, as it is verified before the cache is accessed. Providing access to the IIIF manifest of a restricted Item has no real value if we cannot also grant the viewer access to the restricted bitstreams, as the bitstreams usually have stronger restrictions than the item. We wouldn't suggest protecting the item metadata while leaving the bitstreams open as a good security practice. @mspalti, you say that the caching issue is currently preventing us from managing restricted bitstreams; I'm not sure what your idea is here. In any case, I would discourage you from including any authorization token in the manifest document itself, as the manifest document is sometimes shared as a JSON file directly between researchers, uploaded to other systems, or harvested by other systems as well.
@mspalti and @abollini : As it sounds to me like there's still a lot to be figured out / discussed regarding this feature (especially with caching, etc.), I'd recommend we simply reschedule this for 7.3 at this time. That means that 7.2 will just have the same behavior as 7.1, in that IIIF items must have publicly accessible bitstreams.
Yes, I agree that we are not ready to merge this one. Not enough time for consideration. For the sake of further analysis: @tdonohue , @abollini , there's no suggestion here that we include an authorization token in the manifest. The PR solves only one problem and in a way that's not intended to support IIIF interoperability. Maybe a concrete example will help make the intent clear. Say I'm an archivist and I've added a new IIIF enabled item to DSpace. I do not want it to be available to everyone (or interoperable) so I restrict access to the item. When I log in as an authorized user, I can see the DSpace item and the embedded Mirador viewer. But if the embedded viewer cannot provide the authorization token in its request to fetch the manifest, then the viewer fails and I can't see my stuff. That's frustrating. Letting Mirador add the authorization token to the request solves this problem. Only this problem and not the others we've been discussing. It also doesn't work if the manifest is used in some other system. Actually, this is the only problem I considered when I suggested the solution. I still think it's a good one for this one (important) issue. Bitstream restrictions and IIIF interoperability are both bigger problems as we've discovered. There are several issues with bitstream restrictions that I can see:
It would be great to have a solution to the bitstream problem. But that might require something like implementing the IIIF Authentication API, which is well beyond the scope of this PR. I'm not clear how important bitstream restrictions are in the context of our IIIF integration. I'm convinced they are important! But I don't have a sense of how important, or of the level of priority for future work.
@mspalti : Was reminded of this work as we went through the 7.3 board in today's meeting, so I re-read your last comment here. While I understand the goal you were trying to achieve, I think you are making an assumption that most people would place the access restrictions at the Item level rather than the Bitstream level in your described use case. If the archivist you described decided instead to create a public Item with a restricted Bitstream, then you'd see the same behaviors you describe even with this PR in place. The archivist would still be frustrated that they cannot see the Bitstream in the Mirador viewer. So, my worry here is that we are assuming that all users will provide access restrictions only at the Item level (and only seeking to solve that smaller problem). If someone instead accidentally (or purposefully) restricts the Bitstream, they will be confused as to why the Mirador viewer suddenly doesn't work -- even if it works fine for a different restricted Item (where the restrictions are only at the Item level). I think we all (you, @abollini and I) see that the current behavior is problematic. But I'm hesitant to apply a fix that only works if Items are restricted, but won't work if Bitstreams are restricted. I'd rather we try to find a way to minimally get Item & Bitstream restrictions working... and we can always follow up that work with a full implementation of the IIIF Authentication API in a separate PR at a later time (if that's a much larger task). For now, I'm going to flag this PR as a
Sorry I missed your last comment, @tdonohue. I am totally fine with designating this a work in progress! Based on my previous comments, it seems we can handle bitstream access by passing the JWT token to the image service and configuring the image service accordingly. The technical problem right now is that our cache system isn't able to return different versions of a manifest for users with different credentials. It's an all-or-nothing cache without a notion of tiered access. That can probably be remedied. I agree that the Authentication API is a bigger problem and one that needs to be addressed eventually. A solution will depend in some ways on the issues we're discussing here.
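One way to picture a cache with tiered access is to key each manifest on the item plus the set of bitstreams the requester may read. The sketch below is hypothetical and written in JavaScript for illustration; it is not how the DSpace backend cache is implemented, and `ManifestCache`, `keyFor` and `buildManifest` are invented names.

```javascript
// Hypothetical sketch: a manifest cache keyed by item ID plus the set of
// bitstream IDs the requester is allowed to read.
class ManifestCache {
  constructor() {
    this.entries = new Map();
  }

  // Sort the IDs so the same permission set always yields the same key.
  static keyFor(itemId, readableBitstreamIds) {
    const sorted = [...readableBitstreamIds].sort();
    return `${itemId}|${sorted.join(',')}`;
  }

  // Returns the cached manifest for this permission set, building it once.
  getOrBuild(itemId, readableBitstreamIds, buildManifest) {
    const key = ManifestCache.keyFor(itemId, readableBitstreamIds);
    if (!this.entries.has(key)) {
      this.entries.set(key, buildManifest(itemId, readableBitstreamIds));
    }
    return this.entries.get(key);
  }
}
```

Users with identical read permissions share one cached manifest, while a user who can read fewer bitstreams gets a separate entry. The trade-off is more cache entries per item.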
A quick addition to this conversation. I needed access restrictions for a collection of licensed images and was able to test the Mirador and Cantaloupe configuration mentioned above. It works. The Cantaloupe configuration is similar to the Mirador config in this PR: it uses the dsAuthInfo cookie to set the JWT before making the request to DSpace. (If Cantaloupe is using a cache, as would be typical, it needs to be configured to always check DSpace before returning the cached image.) As @abollini noted, this particular cookie-based approach works only for the embedded viewer and is not consistent with the IIIF protocol for authentication. Also, it requires configuration of the image server as well as the viewer. So I'm thinking of this as a configuration recipe that one can use in lieu of full support for the IIIF Authentication API (and perhaps in combination with it when/if it becomes available in DSpace). I wouldn't recommend modifying our default Mirador configuration file. There's still the problem of Items and Bitstreams with different permissions (say, a public item with a public low-res image and a restricted high-res image), because we can have only one version of the Manifest in the backend cache, but that seems less common and not the primary use case anyway.
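The "always check DSpace before returning the cached image" requirement can be sketched as follows. This is an illustration of the ordering only, not Cantaloupe configuration; `serveImage`, `isAuthorized` and `renderImage` are hypothetical names, and a real implementation would make these calls asynchronously.

```javascript
// Hypothetical sketch: run the authorization check on every request, and only
// then fall back to the derivative cache.
function serveImage(request, cache, isAuthorized, renderImage) {
  // The check runs even when the image is already cached.
  if (!isAuthorized(request.identifier, request.token)) {
    return { status: 403, body: null };
  }
  if (!cache.has(request.url)) {
    cache.set(request.url, renderImage(request.url));
  }
  return { status: 200, body: cache.get(request.url) };
}
```

Because the check comes first, a cached derivative is never returned to a requester whose token DSpace would reject.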
We are investigating whether we can implement the IIIF Authentication API in DSpace. This seems to be the desired solution for the underlying issue, for which this PR is a partial workaround. I'll create an issue about the IIIF Authentication API as soon as we have first results to share.
References
Description
Adds a request preprocessor function to Mirador viewer configuration. If a DSpace auth token is present, the function adds an authentication header to the request.
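A minimal sketch of such a preprocessor is shown below. It is not the code in this PR. It assumes Mirador 3's `requests.preprocessors` hook, which calls each function with `(url, options)`, and it assumes the dsAuthInfo cookie holds URL-encoded JSON with an `accessToken` field that is sent as a Bearer token. The helper names `readAccessToken` and `makeAuthPreprocessor` are hypothetical.

```javascript
// Hypothetical helper: read the DSpace access token from the dsAuthInfo cookie.
// Assumption: the cookie value is URL-encoded JSON with an `accessToken` field.
function readAccessToken(cookieString) {
  const prefix = 'dsAuthInfo=';
  const entry = cookieString
    .split(';')
    .map((part) => part.trim())
    .find((part) => part.startsWith(prefix));
  if (!entry) {
    return null;
  }
  try {
    const parsed = JSON.parse(decodeURIComponent(entry.slice(prefix.length)));
    return parsed.accessToken || null;
  } catch (err) {
    return null; // malformed cookie: treat the user as anonymous
  }
}

// Builds a preprocessor of the shape (url, options) => options.
// Only requests whose URL starts with `apiBaseUrl` receive the header.
function makeAuthPreprocessor(getCookieString, apiBaseUrl) {
  return (url, options) => {
    const token = readAccessToken(getCookieString());
    if (!token || !url.startsWith(apiBaseUrl)) {
      return options; // leave the request untouched
    }
    return {
      ...options,
      headers: { ...(options && options.headers), Authorization: `Bearer ${token}` },
    };
  };
}
```

In a browser, this might be wired up as `requests: { preprocessors: [makeAuthPreprocessor(() => document.cookie, iiifApiUrl)] }`, where `iiifApiUrl` is a placeholder for the DSpace IIIF endpoint. Restricting the header to that base URL keeps the token from being sent to other hosts.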
Instructions for Reviewers
List of changes in this PR:
index.js
Include guidance for how to test or review your PR.
This is a minor change to viewer configuration only. To test, run
yarn run build:mirador
and open a restricted IIIF DSpace item as the authorized user.
Checklist
This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!
yarn run lint
package.json
), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.