-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance UTF-8 support for filename handling in downloads #19161
Conversation
…upport UTF-8 encoding
🔍 Existing Issues For ReviewYour pull request is modifying functions with the following pre-existing issues: 📄 File: lib/galaxy/datatypes/data.py
Did you find this useful? React with a 👍 or 👎 |
lib/galaxy/datatypes/data.py
Outdated
dataset, | ||
to_ext, | ||
hdca=kwd.get("hdca"), | ||
element_identifier=kwd.get("element_identifier"), | ||
filename_pattern=kwd.get("filename_pattern"), | ||
) | ||
headers["Content-Disposition"] = f'attachment; filename="{filename}"' | ||
headers["Content-Disposition"] = ( | ||
f"attachment; filename=\"{filename}\"; filename*=UTF-8''{utf8_encoded_filename}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this to a utility function to reduce duplication?
Something like def to_content_disposition(filename: str)
or something similar. The utf8 encoding should be inside too, so _download_filename
just returns filename
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion!
Should I remove the escape
from name
in _download_filename
?
Is it even necessary to escape the name anymore? Can we just use the quote function to escape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not relevant anymore I guess so? it seems to be used in hdca_name
so check carefully if you can really drop it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the escape
function and also removed it from hdca_name
too, as it didn’t seem necessary since the quote function handles it effectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The escape function limits input to 150 characters and excludes characters that aren't file system safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can limit the characters and I think that the quote
function already excludes these kinds of characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it does not, it just encodes them and the user-agent then knows how to turn them back into the original form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#filename, which unfortunately doesn't go into detail on restricted characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have to keep this but I will move this part into the new function.
…entDisposition function for improved UTF-8 support
…haracters are used
lib/galaxy/util/__init__.py
Outdated
|
||
|
||
def toContentDisposition(filename: str) -> str: | ||
sanitized_filename = "".join(c in FILENAME_VALID_CHARS and c or "_" for c in filename)[0:150] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will also need to apply the file length limit and exclude certain characters inutf8_encoded_filename
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the name should be to_content_disposition
or similar, sorry my brain was in frontend mode when I suggested the change 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, also the limit is applied to 150 characters shouldn't it be 255?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And by exclude certain characters
you mean which characters?
Should this be excluded? /\?%*:|"<>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any specific limitation on characters from the link but I found out that for a typical filename, the characters limitation is 256 characters and /\?%*:|"<>
are not allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I bet most browsers are putting proper restrictions in place anyway.
apply 255 characters limit to filename exclude `/\?%*:|"<>` characters from encoded filename
Thanks @arash77! |
Fixes step 3 of #18584
Improve filename handling in data downloads and zipstream headers to support UTF-8 encoding, ensuring proper display of filenames with special characters.
How to test the changes?
(Select all options that apply)
License