support Unicode 16 via utf8proc 2.10.0 #56925

Merged
inkydragon merged 3 commits into master from unicode16 on Jan 2, 2025

Conversation

stevengj
Member

@stevengj stevengj commented Dec 31, 2024

Similar to #51799, support Unicode 16 by bumping utf8proc to 2.10.0 (thanks to JuliaStrings/utf8proc#277 by @eschnett).

This allows us to use 7 exciting new emoji characters as identifiers, including "face with bags under eyes", "\U1fae9" (but still no superscript "q").

Closes #56035.
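
A minimal sketch of how one might check the effect of the bump on a given Julia build. `Unicode.isassigned` comes from the Unicode standard library; `Base.isidentifier` is unexported, so relying on it here is an assumption. The results for U+1FAE9 depend on which utf8proc version the running Julia was built against, so no output is shown.

```julia
using Unicode

# "face with bags under eyes", added in Unicode 16
c = '\U1fae9'

# Depends on the Unicode tables in the utf8proc linked into this Julia build.
println("assigned:         ", Unicode.isassigned(c))

# Whether a name consisting only of this character is accepted as an identifier.
println("valid identifier: ", Base.isidentifier(string(c)))

# An emoji from a much older Unicode version, for comparison.
println("older emoji:      ", Base.isidentifier("😃"))
```

On a build that predates this change, the first two lines would be expected to report `false`, since the code point was unassigned before Unicode 16.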

@stevengj added the upstream and unicode labels on Dec 31, 2024
Member

@inkydragon inkydragon left a comment

LGTM.
Close #56035

@eschnett
Contributor

eschnett commented Jan 1, 2025

Before closing #56035, should some of the tables in stdlib/REPL be updated? There are emoji and latex tables there.

@inkydragon
Member

inkydragon commented Jan 2, 2025

should some of the tables in stdlib/REPL be updated?

Yes.

# We combine multiple versions as the data changes, and not only by growing.
result = mapfoldr(emoji_data, merge, [
    # Newer versions must be added to the bottom list as we want the newer versions to
    # overwrite the old with names that changed but still keep old ones that were removed
    "https://raw.githubusercontent.com/iamcal/emoji-data/0f0cf4ea8845eb52d26df2a48c3c31c3b8cad14e/emoji_pretty.json",
    "https://raw.githubusercontent.com/iamcal/emoji-data/e512953312c012f6bd00e3f2ef6bf152ca3710f8/emoji_pretty.json",
    "https://raw.githubusercontent.com/iamcal/emoji-data/a8174c74675355c8c6a9564516b2e961fe7257ef/emoji_pretty.json",
    ];

But upstream still uses Unicode 15.
So maybe
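
For illustration, here is a small sketch of the merge order that the comments in the REPL table generator describe. `fake_emoji_data` and its contents are hypothetical stand-ins for the real `emoji_data(url)` loader, which is not shown in this thread; only `mapfoldr` and `merge` are used as in the quoted code.

```julia
# Hypothetical stand-in for the generator's `emoji_data(url)`: returns the
# name => emoji table for one upstream snapshot. The data is made up.
function fake_emoji_data(version::Symbol)
    tables = Dict(
        :v1 => Dict(":old_alias:" => "A", ":face:" => "B"),
        :v2 => Dict(":face:" => "B2", ":hat:" => "C"),
        :v3 => Dict(":hat:" => "C3"),
    )
    return tables[version]
end

# mapfoldr folds from the right: merge(v1, merge(v2, v3)).
# merge lets later arguments win on duplicate keys, so entries further down
# the list override earlier ones, while keys that exist only in older
# snapshots are kept.
result = mapfoldr(fake_emoji_data, merge, [:v1, :v2, :v3])

println(result)
```

In this toy data, `":face:"` and `":hat:"` take their values from the newest snapshot that defines them, and `":old_alias:"` survives even though later snapshots dropped it.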

@stevengj
Member Author

stevengj commented Jan 2, 2025

I don't think the tables in stdlib/REPL should block this PR. Those tables are for tab completion, which is optional — most Unicode characters do not have tab completions.

Just because we process/parse new Unicode emoji doesn't mean we have to have tab completions for them — those can wait.

(And updating Unicode versions is about a lot more than allowing new emoji in identifiers! I mostly advertised the 7 new emoji as a joke.)
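
The distinction between parsing a character and having a tab completion for it can be sketched as below. This assumes the completion tables live in `REPL.REPLCompletions.emoji_symbols` and `REPL.REPLCompletions.latex_symbols`, which are internal names rather than public API and may change; `has_completion` is a hypothetical helper written for this example.

```julia
using REPL

# Internal tables backing REPL tab completion (not public API; names may change).
emoji_table = REPL.REPLCompletions.emoji_symbols
latex_table = REPL.REPLCompletions.latex_symbols

# Hypothetical helper: does any completion name expand to exactly `s`?
has_completion(s::AbstractString) =
    any(==(s), values(emoji_table)) || any(==(s), values(latex_table))

# The two answers are independent: a character can be a valid identifier
# without having any tab completion.
println("completion for U+1FAE9: ", has_completion("\U1fae9"))
println("usable as identifier:   ", Base.isidentifier("\U1fae9"))
```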

@inkydragon
Member

Let's merge this and update the REPL tables in a later PR.

@inkydragon inkydragon merged commit 0741f9b into master Jan 2, 2025
10 checks passed
@inkydragon inkydragon deleted the unicode16 branch January 2, 2025 14:30
Labels
😃🍕 and other emoji external dependencies Involves LLVM, OpenBLAS, or other linked libraries unicode Related to unicode characters and encodings upstream The issue is with an upstream dependency, e.g. LLVM
Development

Successfully merging this pull request may close these issues.

Support latest Unicode 16.0