Non-English characters #18584

Closed
3 tasks done
arash77 opened this issue Jul 22, 2024 · 7 comments

@arash77
Collaborator

arash77 commented Jul 22, 2024

Describe the bug
Galaxy does not handle non-English characters well, which leads to the issues listed in the reproduction steps below.

Galaxy Version and/or server at which you observed the bug
Galaxy Version: 24.1.2.dev0
Commit: (run git rev-parse HEAD if you run this Galaxy server)

Browser and Operating System
Operating System: Linux
Browser: Firefox

To Reproduce
Steps to reproduce the behavior:

  1. Type non-English characters into the history name and the dataset name
  2. Go to the dataset details
  3. See that the dataset name in request_json is wrong
  4. Downloading the dataset also returns the wrong filename
  5. For a history, Export history to file also writes the wrong dataset name to history_attrs.txt
  6. For a history, Share or Publish does not show the URL when the history name contains no English characters
  7. For collections, the downloaded file name does not match the collection name

Expected behavior
All characters should be displayed and preserved correctly.

@hexylena
Member

Type non-English characters into the history name and the dataset name

do you have a test case for this that can easily be copy/pasted to test?

@arash77
Collaborator Author

arash77 commented Jul 22, 2024

Type non-English characters into the history name and the dataset name

do you have a test case for this that can easily be copy/pasted to test?

For example:

  • ä, ö, ü, ß
  • تست
  • 测试
  • テスト

@hexylena
Member

Ah wow, not even RTL, oof 😢 (should test with emoji too 😅)

@jdavcs
Member

jdavcs commented Nov 5, 2024

@arash77 can you, please, add a checkbox list to the issue description so we can keep track of which items have been fixed (e.g. #18986 fixes slug generation)?

@arash77
Collaborator Author

arash77 commented Nov 6, 2024

@arash77 can you, please, add a checkbox list to the issue description so we can keep track of which items have been fixed (e.g. #18986 fixes slug generation)?

Thank you for mentioning that. I have changed the description.

@arash77
Collaborator Author

arash77 commented Nov 19, 2024

When downloading files such as datasets or collections, we send Content-Disposition: attachment; filename=file in the response header. However, as discussed in #18583 (comment), the filename parameter cannot include UTF-8 characters because it follows older standards like RFC 2616, which only allowed Latin-1 (ISO-8859-1) encoding.

This limitation seems outdated since newer standards like RFC 6266 and RFC 5987 allow UTF-8 characters in filenames using the filename* parameter. Modern browsers support this feature, enabling proper handling of filenames with special characters or non-Latin alphabets.

Given this, should we update our implementation to support filename* with UTF-8 encoding, or is the current approach sufficient for our needs? Supporting UTF-8 would improve usability for international filenames but might require additional testing and updates to our API.
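
To make the question concrete, here is a minimal sketch of a header that carries both parameters: a plain `filename` fallback for older clients and a percent-encoded `filename*` as described in RFC 5987 / RFC 6266. The helper name `content_disposition` and the underscore fallback policy are assumptions for illustration, not Galaxy's actual implementation.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Hypothetical helper: build an attachment header with both filename forms."""
    # Legacy `filename`: keep printable ASCII only; replace everything else,
    # plus the quoted-string specials `"` and `\`, with an underscore.
    fallback = "".join(
        c if c.isascii() and c.isprintable() and c not in '"\\' else "_"
        for c in filename
    )
    # Extended `filename*`: percent-encode the UTF-8 bytes of the real name.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


header = content_disposition("测试.txt")
# The header value contains only ASCII, so encoding it as latin-1 succeeds.
header.encode("latin-1")
print(header)
```

Clients that understand `filename*` should prefer it over `filename`, so the fallback only matters for older clients.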

@mvdbeek
Member

mvdbeek commented Nov 19, 2024

The limitation is not outdated: filename*=utf-8''%e2%82%ac%20rates is still encodable in latin-1. As such, you can definitely add the additional filename* instruction.
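
The point can be checked with a short sketch: the percent-encoded value consists of ASCII characters only, so the header still fits in latin-1, while a client that understands `filename*` recovers the original UTF-8 name. The parsing below is deliberately simplified and assumes the header contains a single `filename*` parameter at the end; the variable names are illustrative.

```python
from urllib.parse import unquote

header_value = "attachment; filename*=utf-8''%e2%82%ac%20rates"

# Percent-encoding leaves only ASCII characters, so latin-1 encoding succeeds.
wire_bytes = header_value.encode("latin-1")

# Simplified client-side decoding: split charset'language'value, then
# reverse the percent-encoding using the declared charset.
ext_value = header_value.split("filename*=", 1)[1]
charset, _language, encoded = ext_value.split("'", 2)
decoded = unquote(encoded, encoding=charset)
print(decoded)
```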
