keysyms: Add sharp S upper case mapping exception

The lower case mapping `U1E9E` ẞ → `ssharp` ß was added in 13b30f4 and then confirmed when we implemented the complete Unicode simple case mappings in e83d08d. However, the upper case mapping `ssharp` → `U1E9E` was not added in either commit, because ẞ is a relatively recent addition to Unicode (2008) and had no official recommendation until recently.

Since 2017 the Council for German Orthography (Rat für deutsche Rechtschreibung) has recommended[^1] ẞ as the capitalization of ß. Due to its stability policies, the Unicode Character Database (UCD) that we use to generate our keysym case mappings (via ICU) cannot update the simple case mapping of ß.

Discussions are currently ongoing on the Unicode mailing list[^2] and in CLDR[^3] about how to deal with the new recommended case mapping. However, these discussions focus on text processing and compatibility mappings, while libxkbcommon operates at a lower level.

It seems that the slow adoption of ẞ is partly due to the difficulty of typing it. Since ẞ is used only for ALL CAPS casing, the expectation is to type it using CapsLock. While our detection of alphabetic key types works well for the pair (ß, ẞ) since the implementation of the complete Unicode case mappings, the *internal capitalization* currently does not work and is fixed by this commit.

Added the ß → ẞ upper mapping:
- Added an exception in the generation script
- Fixed tests
- Added documentation of the exceptions in `xkbcommon.h`

[^1]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
[^2]: https://corp.unicode.org/pipermail/unicode/2024-November/011162.html
[^3]: https://unicode-org.atlassian.net/browse/CLDR-17624
xkbcommon · Dec 9, 2024 · c53e0a6
1 parent e0130e3
commit c53e0a6
Showing 7 changed files with 355 additions and 324 deletions.
diff --git a/changes/api/+großes-ẞ.breaking.md b/changes/api/+großes-ẞ.breaking.md
@@ -0,0 +1,2 @@
+Added the upper case mapping ß → ẞ (`ssharp` → `U1E9E`). This enables typing
+ẞ using CapsLock thanks to the internal capitalization rules.
diff --git a/changes/api/+unicode-16.breaking.md b/changes/api/+unicode-16.breaking.md
@@ -5,13 +5,16 @@ the following:
 - `xkb_keysym_to_lower()` and `xkb_keysym_to_upper()` give different output
   for keysyms not covered previously and handle *title*-cased keysyms.
 
-  Example of title-cased keysym: `0x10001f2` (`U+01F2` “ǲ”):
-  - `xkb_keysym_to_lower(0x10001f2) == 0x10001f3` (`U+01F3` “ǳ”)
-  - `xkb_keysym_to_upper(0x10001f2) == 0x10001f1` (`U+01F1` “Ǳ”)
+  Example of title-cased keysym: `U01F2` “ǲ”:
+  - `xkb_keysym_to_lower(U01F2) == U01F3` “ǲ” → “ǳ”
+  - `xkb_keysym_to_upper(U01F2) == U01F1` “ǲ” → “Ǳ”
 - *Implicit* alphabetic key types are better detected, because they use the
   latest Unicode case mappings and now handle the *title*-cased keysyms the
   same way as upper-case ones.
 
+Note: There is a single *exception* that does not follow the Unicode mappings:
+- `xkb_keysym_to_upper(ssharp) == U1E9E` “ß” → “ẞ”
+
 Note: As before, only *simple* case mappings (i.e. one-to-one) are supported.
 For example, the full upper case of `U+01F0` “ǰ” is “J̌” (2 characters: `U+004A`
 and `U+030C`), which would require 2 keysyms, which is not supported by the

diff --git a/data/keysyms.yaml b/data/keysyms.yaml
@@ -560,6 +560,7 @@
 0x00df:
   name: ssharp
   code point: 0x00DF
+  upper: 0x1001e9e # U1E9E
 0x00e0:
   name: agrave
   code point: 0x00E0
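
The values in this hunk mix two keysym ranges: `ssharp` keeps its Latin-1 value `0x00df`, while `U1E9E` is a direct Unicode keysym, `0x1001e9e`. The following is a minimal sketch of that numbering, not the library's implementation. The names `unicode_keysym` and `keysym_label` are hypothetical, and the sketch deliberately ignores the legacy named keysyms that exist outside Latin-1.

```python
# Hypothetical helpers illustrating the keysym values seen in the diff.
UNICODE_KEYSYM_OFFSET = 0x01000000  # base of the direct Unicode keysym range


def unicode_keysym(code_point: int) -> int:
    """Return a keysym for a code point (simplified: no legacy named keysyms)."""
    # Printable Latin-1 code points are used directly as keysym values.
    if 0x20 <= code_point <= 0x7E or 0xA0 <= code_point <= 0xFF:
        return code_point
    # Everything else is assumed to live in the direct Unicode range.
    return UNICODE_KEYSYM_OFFSET + code_point


def keysym_label(keysym: int) -> str:
    """Format a direct Unicode keysym in the `Uxxxx` style used in the diff."""
    return f"U{keysym - UNICODE_KEYSYM_OFFSET:04X}"


print(hex(unicode_keysym(ord("ß"))))  # ssharp, Latin-1 range
print(hex(unicode_keysym(ord("ẞ"))))  # U1E9E, direct Unicode range
print(keysym_label(0x1001E9E))
```

Under these assumptions, the `upper:` entry added for `0x00df` points from the Latin-1 range into the Unicode range, which is why the comment `# U1E9E` accompanies the raw value.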

diff --git a/include/xkbcommon/xkbcommon.h b/include/xkbcommon/xkbcommon.h
@@ -552,9 +552,18 @@ xkb_utf32_to_keysym(uint32_t ucs);
  * If there is no such form, the keysym is returned unchanged.
  *
  * The conversion rules are the *simple* (i.e. one-to-one) Unicode case
- * mappings and do not depend on the locale. If you need the special
- * case mappings (i.e. not one-to-one or locale-dependent), prefer to
- * work with the Unicode representation instead, when possible.
+ * mappings (with some exceptions, see below) and do not depend
+ * on the locale. If you need the special case mappings (i.e. not
+ * one-to-one or locale-dependent), prefer to work with the Unicode
+ * representation instead, when possible.
+ *
+ * Exceptions to the Unicode mappings:
+ *
+ * | Lower keysym | Lower letter | Upper keysym | Upper letter | Comment |
+ * | ------------ | ------------ | ------------ | ------------ | ------- |
+ * | `ssharp`     | `U+00DF`: ß  | `U1E9E`      | `U+1E9E`: ẞ  | [Council for German Orthography] |
+ *
+ * [Council for German Orthography]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
  *
  * @since 0.8.0: Initial implementation, based on `libX11`.
  * @since 1.8.0: Use Unicode 16.0 mappings for complete Unicode coverage.

diff --git a/scripts/update-unicode.py b/scripts/update-unicode.py
@@ -90,6 +90,7 @@
 from pathlib import Path
 from typing import (
     Any,
+    ClassVar,
     Generator,
     Generic,
     Iterable,
@@ -294,6 +295,9 @@ class Entry:
     upper: int
     is_lower: bool
     is_upper: bool
+    # [NOTE] Exceptions must be documented in `xkbcommon.h`.
+    to_upper_exceptions: ClassVar[dict[str, str]] = {"ß": "ẞ"}
+    "Upper mappings exceptions"
 
     @classmethod
     def zeros(cls) -> Self:
@@ -326,16 +330,20 @@ def lower_delta(cls, cp: CodePoint) -> int:
     def upper_delta(cls, cp: CodePoint) -> int:
         return cp - cls.to_upper_cp(cp)
 
-    @staticmethod
-    def to_upper_cp(cp: CodePoint) -> CodePoint:
+    @classmethod
+    def to_upper_cp(cls, cp: CodePoint) -> CodePoint:
+        if upper := cls.to_upper_exceptions.get(chr(cp)):
+            return ord(upper)
         return icu.Char.toupper(cp)
 
     @staticmethod
     def to_lower_cp(cp: CodePoint) -> CodePoint:
         return icu.Char.tolower(cp)
 
-    @staticmethod
-    def to_upper_char(char: str) -> str:
+    @classmethod
+    def to_upper_char(cls, char: str) -> str:
+        if upper := cls.to_upper_exceptions.get(char):
+            return upper
         return icu.Char.toupper(char)
 
     @staticmethod
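
The override pattern in the hunk above can be tried without ICU. This is a rough sketch under stated assumptions: `simple_upper` is a hypothetical stand-in that approximates ICU's simple upper mapping with the standard library by discarding any `str.upper()` result that is not a single character; it is not guaranteed to match ICU for every code point. The exception table is the one from the diff.

```python
# Exception table copied from the diff; everything else is illustrative.
TO_UPPER_EXCEPTIONS = {"ß": "ẞ"}


def simple_upper(char: str) -> str:
    """Approximate a one-to-one upper mapping using only the stdlib (assumption)."""
    upper = char.upper()
    # str.upper() applies full case mappings (e.g. "ß" -> "SS");
    # keep the result only when it is still a single character.
    return upper if len(upper) == 1 else char


def to_upper_char(char: str) -> str:
    """Consult the exceptions first, then fall back to the simple mapping."""
    if upper := TO_UPPER_EXCEPTIONS.get(char):
        return upper
    return simple_upper(char)


print(simple_upper("ß"), to_upper_char("ß"), to_upper_char("a"))
```

Checking the exception before the fallback keeps the override in one place, so both `to_upper_cp` and `to_upper_char` in the script can share the same table.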