Bitstream download URLs based on the handle of the item and the bitstream filename don't work with accented characters #2727
Comments
Atmire would like to claim this issue. Currently only these 2 legacy urls are supported:
But we found out that some of our clients also used this format |
@alexandrevryghem : I'm confused by your comment & this ticket description. This ticket description seems to say that the bug is with accented characters, but your comment implies it's not an issue with accented characters, but rather an issue of a missing legacy URL. I'm OK with Atmire claiming this & will assign to you. But, please update the ticket description/title if the bug is actually a missing legacy URL. That way testers will understand what your (eventual) PR is trying to fix. |
@tdonohue: I didn't create this ticket so I can't update it but like Alan mentioned it's not related to the accented characters. When you look at the example url that was given by @Marwa1988-S and the format he used |
I think this is the correct pattern for my URL: As already described in my description: I'm not sure whether this is an issue with the legacy URL or with some configuration. |
Thanks for the additional info, this is indeed a separate issue. The correct URL for downloading that bitstream is actually: Because the sequenceId of that bitstream is 3 (see here), but that also throws a 504 error |
What is the meaning of |
It's a number that ensured that the URLs were unique for each bitstream because it is possible for two bitstreams on a certain item to have the same name. It indeed seems to be more of a configuration problem (but I didn't check it myself). |
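To make the role of the sequence number concrete, here is a minimal sketch that splits the two path styles quoted in this ticket (`/bitstream/handle/[prefix]/[suffix]/[filename]` and the `/bitstream/[prefix]/[suffix]/[sequenceId]/[filename]` shape of the demo link) into their parts. The interface and function names are hypothetical and are not taken from the DSpace code base; it also assumes the filename contains no `/`.

```typescript
// Hypothetical shape for a parsed legacy bitstream path (illustrative only).
interface LegacyBitstreamPath {
  handle: string;      // "[prefix]/[suffix]"
  sequenceId?: number; // only present in the sequence-based form
  filename: string;    // percent-decoded filename
}

// Parse the two path styles that appear in this ticket.
// Returns null for anything that matches neither style.
// Note: decodeURIComponent throws a URIError on malformed escapes.
function parseLegacyBitstreamPath(path: string): LegacyBitstreamPath | null {
  const segments = path.split("/").filter((s) => s.length > 0);
  if (segments.length !== 5 || segments[0] !== "bitstream") {
    return null;
  }
  if (segments[1] === "handle") {
    // /bitstream/handle/[prefix]/[suffix]/[filename]
    return {
      handle: `${segments[2]}/${segments[3]}`,
      filename: decodeURIComponent(segments[4]),
    };
  }
  if (/^\d+$/.test(segments[3])) {
    // /bitstream/[prefix]/[suffix]/[sequenceId]/[filename]
    return {
      handle: `${segments[1]}/${segments[2]}`,
      sequenceId: Number(segments[3]),
      filename: decodeURIComponent(segments[4]),
    };
  }
  return null;
}
```

In the handle-based form the filename alone has to identify the bitstream, which is why two files with the same name on one item are ambiguous there; the sequence-based form avoids that.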
All, I dug into this a bit more today with help from the team hosting demo.dspace.org. It looks (to me) like it might be a DSpace UI bug with the redirect from a legacy URL to a new one. For instance, I notice that downloading files with non-English characters works perfectly on the demo site if you visit the Item page first and click on the filename.
However, if instead you use a direct link using the legacy URL style (from DSpace 6.x), then you will encounter this 504 error. My current guess is this ticket may be related to #1242, which added automatic redirects for these legacy (DSpace 6.x) URLs. It's possible that redirect isn't working properly when it encounters non-English characters. @alexandrevryghem : This analysis does mean that it might be possible that this ticket is loosely related to the legacy redirect issues you noticed with your clients. It's just that this ticket points out an additional problem...that the legacy redirects don't handle special characters well. Let me know if you'd like to claim this ticket or not. |
@tdonohue: I was able to reproduce this bug locally with the demo backend. This error is only reproducible when you run Angular in production mode and the fix is simply to encode the filename you send to the backend. I can create a small PR for this, but maybe it would be cleaner to fix these encoding issues higher up (in |
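For readers following along, "encode the filename you send to the backend" can be sketched roughly as below. This is only an illustration of percent-encoding with the standard `encodeURIComponent`; the function name and the `handle` / `filename` query parameter names are assumptions made for the example, not the actual DSpace request.

```typescript
// Hypothetical helper: build a query string in which both values are
// percent-encoded. The parameter names are illustrative assumptions.
function buildBitstreamLookupQuery(handle: string, filename: string): string {
  // encodeURIComponent escapes reserved characters ("/", "&", spaces, ...)
  // and emits non-ASCII characters as their percent-encoded UTF-8 bytes.
  return `handle=${encodeURIComponent(handle)}&filename=${encodeURIComponent(filename)}`;
}

// "\u0107" is the single code point for "ć" (U+0107).
const exampleQuery = buildBitstreamLookupQuery("10673/1131", "1test_\u0107.pdf");
console.log(exampleQuery);
// prints "handle=10673%2F1131&filename=1test_%C4%87.pdf"
```

Encoding each value separately, rather than the whole URL at once, keeps a `&` or `/` inside a filename from being read as a delimiter.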
@alexandrevryghem : I have to admit, I don't have a strong preference here. Either fix is fine. So, I'd recommend using your best judgement....or you could ping @artlowel to see if he has a strong preference. |
@alexandrevryghem : Are you still working on this ticket? (Somehow it is no longer assigned to anyone) Just noting this bug came up in a discussion on dspace-tech: https://groups.google.com/g/dspace-tech/c/FO2cPiQWhlA |
@tdonohue: I can create a fix for this if you want, I think that fixing it here for all the |
@alexandrevryghem : If you have time, that'd be great. I would however like to backport this to 7.x if possible. Even if it requires removing I'll assign this to you officially. |
@alexandrevryghem |
After #2725 was merged, this bug still exists. However, the behavior has changed slightly. This is what I'm seeing when I run in production mode (
So, after #2725, special characters in bitstreams no longer return an error. But, they still don't work properly if you attempt to access them via the It's possible however that fixing #2963 will also now fix this bug, as the new behavior almost seems like a redirect failure? Pinging @artlowel and @alexandrevryghem to let them know these two issues might now be related (unverified though). |
@tdonohue: I retested this myself today, and for me the redirect now works correctly on sandbox. I think there might have been some confusion because there are 2 characters that look like a "c" with an accent but have different Unicode code points, so it may not look like it but: So currently:
With #2963:
|
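As an illustration of how two visually identical accented characters can differ, the sketch below compares a precomposed "ć" with a "c" followed by a combining acute accent. Which two characters were actually involved in the test filenames is not stated in this thread, so this pairing is an assumption, and the helper name is hypothetical.

```typescript
// Assumed pair for illustration: both render as "ć".
const precomposed = "\u0107"; // single code point U+0107
const decomposed = "c\u0301"; // "c" + U+0301 COMBINING ACUTE ACCENT

// List the code points of a string as "U+XXXX" labels.
function codePoints(s: string): string[] {
  return Array.from(s).map((ch) => {
    const hex = ch.codePointAt(0)!.toString(16).toUpperCase();
    return "U+" + ("0000" + hex).slice(-4);
  });
}

console.log(codePoints(precomposed)); // [ 'U+0107' ]
console.log(codePoints(decomposed));  // [ 'U+0063', 'U+0301' ]

// The strings are not equal, and they percent-encode differently.
console.log(precomposed === decomposed);     // false
console.log(encodeURIComponent(precomposed)); // %C4%87
console.log(encodeURIComponent(decomposed));  // c%CC%81

// Normalizing both to NFC makes them compare equal.
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true
```

If a stored filename uses one form and the requested URL uses the other, an exact string match on the filename would fail even though both look the same on screen.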
@alexandrevryghem : I tried this again today, and it's still not working for me on Sandbox. I created a new test Item to test with: https://sandbox.dspace.org/handle/10673/1131 It has three files, one with no special characters and the other two having the names you specified above. Here are my results for
So, as you can see, on my end the |
@alexandrevryghem : A follow-up. I did notice it works perfectly in my browser now. If I copy either of these URLs into my browser window, then I briefly see the download page in the UI & I'm properly redirected to download the file. https://sandbox.dspace.org/bitstream/handle/10673/1131/1test_ć.pdf It's odd to me though that the So, this bug ticket appears to be partially fixed. These I'll try to test the fixes in #3062 later today to see if they have an impact on this & stop the 200 OK from being returned. |
@tdonohue: I just retested it too, and for me it still returns 302, so that's weird 😅
→ curl --head https://sandbox.dspace.org/bitstream/handle/10673/1131/test_pdf.pdf
HTTP/2 302
date: Mon, 20 May 2024 15:42:21 GMT
content-type: text/plain; charset=utf-8
content-length: 120
location: https://sandbox.dspace.org/server/api/core/bitstreams/83ebc316-dc6b-45f6-b667-b76ad99a818c/content
x-powered-by: Express
x-ratelimit-limit: 500
x-ratelimit-remaining: 499
x-ratelimit-reset: 1716219780
cache-control: max-age=604800
vary: Accept
→ curl --head https://sandbox.dspace.org/bitstream/handle/10673/1131/1test_ć.pdf
HTTP/2 302
date: Mon, 20 May 2024 15:42:32 GMT
content-type: text/plain; charset=utf-8
content-length: 120
location: https://sandbox.dspace.org/server/api/core/bitstreams/55021e94-455e-4eab-b1b8-eff9f552b882/content
x-powered-by: Express
x-ratelimit-limit: 500
x-ratelimit-remaining: 498
x-ratelimit-reset: 1716219780
cache-control: max-age=604800
link:
vary: Accept
→ curl --head https://sandbox.dspace.org/bitstream/handle/10673/1131/2test_ć.pdf
HTTP/2 302
date: Mon, 20 May 2024 15:42:41 GMT
content-type: text/plain; charset=utf-8
content-length: 120
location: https://sandbox.dspace.org/server/api/core/bitstreams/de373b4a-9959-4c3e-82f5-e855926050d9/content
x-powered-by: Express
x-ratelimit-limit: 500
x-ratelimit-remaining: 499
x-ratelimit-reset: 1716219780
cache-control: max-age=604800
link:
vary: Accept
Maybe it has to do with our curl versions; mine is 8.6.0:
→ curl --version
curl 8.6.0 (x86_64-apple-darwin23.0) libcurl/8.6.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.61.0
Release-Date: 2024-01-31
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz MultiSSL NTLM NTLM_WB SPNEGO SSL threadsafe UnixSockets |
@alexandrevryghem You may be correct that it's something with
In any case, I'm seeing your point that it appears this ticket might already be "solved". I've got some time in my afternoon to do some more testing, so I'll see if #3062 helps. Either way, if this is working for later versions of |
When trying to download a bitstream using the URL:
/bitstream/handle/[prefix]/[suffix]/[filename]
if the filename in the URL contains non-English characters, it redirects to a 504 error.
To Reproduce
Expected behavior
The bitstream file should open.
Try another file with only English characters in the filename, for example https://demo.dspace.org/bitstream/123456789/263/1/An_Interview_with_Sarah_Asebedo.pdf, and the download works.