Correct the remark about the REPLACEMENT CHARACTER in properties.rs
hsivonen committed Dec 11, 2024
1 parent 849c1c9 commit 1d9d656
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion components/normalizer/trie-value-format.md
@@ -18,7 +18,7 @@ Bit 30: 1 iff applying NFC to the decomposition does not result in the character

The character is a starter (CCC == 0) that decomposes to itself: The 31 lower bits set to zero. (Bit 31 may be set to 1, but bit 30 cannot.)

-REPLACEMENT CHARACTER: Bit 31 set to 1 and all others set to zero. This is an exception to the above item in order to allow catching UTF-8 errors as a side effect of a passthrough check. (This requires a special case in `properties.rs` to report U+FFFD as having the default decomposition rather than a singleton decomposition.)
+REPLACEMENT CHARACTER: Bit 31 set to 1 and all others set to zero. This is an exception to the above item in order to allow catching UTF-8 errors as a side effect of a passthrough check. (This requires masking these bits off in `properties.rs` to report U+FFFD as having the default decomposition.)

The character is a non-starter (CCC != 0) that decomposes to itself: The highest bit is set to 1, the rest of the high half is set to zeros, the second-least-significant byte is 0xD8, and the least-significant byte is the CCC value.
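
The three self-decomposing cases in this hunk can be read as bit tests on a 32-bit trie value. The following is a minimal sketch of that reading, not the code in `properties.rs`: the names `BIT_31`, `REPLACEMENT_CHARACTER_VALUE`, `SelfDecomposing`, and `classify` are hypothetical, and the sketch assumes that masking off bit 31 alone is enough to make the U+FFFD value look like the default (self) decomposition.

```rust
/// Bit 31, the highest bit of a 32-bit trie value.
const BIT_31: u32 = 1 << 31;

/// Value described for U+FFFD: bit 31 set to 1 and all others set to zero.
const REPLACEMENT_CHARACTER_VALUE: u32 = BIT_31;

/// Hypothetical classification of the cases described in this hunk.
#[derive(Debug, PartialEq, Eq)]
enum SelfDecomposing {
    /// Starter (CCC == 0) that decomposes to itself.
    Starter,
    /// Non-starter (CCC != 0) that decomposes to itself.
    NonStarter { ccc: u8 },
    /// Any value not covered by the cases in this hunk.
    Other,
}

fn classify(trie_value: u32) -> SelfDecomposing {
    // With bit 31 masked off, both 0 and the U+FFFD value have the
    // 31 lower bits set to zero, so both report the default decomposition.
    if (trie_value & !BIT_31) == 0 {
        return SelfDecomposing::Starter;
    }
    // High half 0x8000, second-least-significant byte 0xD8,
    // least-significant byte is the CCC value (non-zero for non-starters).
    let ccc = trie_value as u8;
    if (trie_value & 0xFFFF_FF00) == 0x8000_D800 && ccc != 0 {
        return SelfDecomposing::NonStarter { ccc };
    }
    SelfDecomposing::Other
}
```

Under these assumptions, `classify(REPLACEMENT_CHARACTER_VALUE)` and `classify(0)` both yield `SelfDecomposing::Starter`, while a value such as `0x8000_D8E6` yields a non-starter with CCC 230.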
