Some unicode ranges incorrect #4

lojjic · 2023-09-21T18:18:04Z

We mostly trust the `unicode-range` values defined in the Google Fonts CSS, but some of those ranges appear to be incorrect: they include codepoints that have no actual coverage in the font.

An example: Noto Sans SC subset 21 declares these ranges:

U+9f3d-9f3e, U+9f41, U+9f4a-9f4b, U+9f51-9f52, U+9f61-9f63, U+9f66-9f67, U+9f80-9f81, U+9f83, U+9f85-9f8d, U+9f90-9f91, U+9f94-9f96, U+9f98, U+9f9b-9f9c, U+9f9e, U+9fa0, U+9fa2, U+9ff4, U+a001, U+a007, U+a025, U+a046-a047, U+a057, U+a072, U+a078-a079, U+a083, U+a085, U+a100, U+a118, U+a132, U+a134, U+a1f4, U+a242, U+a4a6, U+a4aa, U+a4b0-a4b1, U+a4b3, U+a9c1-a9c2, U+ac00-ac01, U+ac04, U+ac08, U+ac10-ac11, U+ac13-ac16, U+ac19, U+ac1c-ac1d, U+ac24, U+ac70-ac71, U+ac74, U+ac77-ac78, U+ac80-ac81, U+ac83, U+ac8c, U+ac90, U+ac9f-aca0, U+aca8-aca9, U+acac, U+acb0, U+acbd, U+acc1, U+acc4, U+ace0-ace1, U+ace4, U+ace8, U+acf3, U+acf5, U+acfc-acfd, U+ad00, U+ad0c, U+ad11, U+ad1c, U+ad34, U+ad50, U+ad64, U+ad6c, U+ad70, U+ad74, U+ad7f, U+ad81, U+ad8c, U+adc0, U+adc8, U+addc, U+ade0, U+adf8-adf9, U+adfc, U+ae00, U+ae08-ae09, U+ae0b, U+ae30, U+ae34, U+ae38, U+ae40, U+ae4a, U+ae4c, U+ae54, U+ae68, U+aebc, U+aed8, U+af2c-af2d, U+af34

However, the font only contains glyphs for the U+9xxx ranges. All of the U+axxx ranges listed above (mostly Hangul syllables, plus a few Yi and Javanese codepoints) appear to be incorrect.

We may need to modify the data build script to parse the real codepoints out of the woff files rather than trusting what GFonts gives us.
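
To make the comparison concrete, here is a minimal sketch in Python of checking a declared `unicode-range` against a font's real coverage. It is not the project's build script: the function names `parse_unicode_range` and `unsupported_codepoints` are hypothetical, and the `actual` set in the example is illustrative data, not read from a real font file.

```python
import re

# Matches one unicode-range token: U+9f41, U+9f3d-9f3e, or the wildcard form U+4??
_TOKEN = re.compile(r"[Uu]\+([0-9A-Fa-f?]{1,6})(?:-([0-9A-Fa-f]{1,6}))?")


def parse_unicode_range(css_value: str) -> set[int]:
    """Expand a CSS unicode-range value into a set of integer codepoints."""
    codepoints: set[int] = set()
    for token in css_value.split(","):
        token = token.strip()
        if not token:
            continue
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized unicode-range token: {token!r}")
        start_text, end_text = match.groups()
        if "?" in start_text:
            # Wildcard form: U+4?? covers U+400 through U+4FF.
            start = int(start_text.replace("?", "0"), 16)
            end = int(start_text.replace("?", "F"), 16)
        else:
            start = int(start_text, 16)
            end = int(end_text, 16) if end_text else start
        codepoints.update(range(start, end + 1))
    return codepoints


def unsupported_codepoints(declared: str, actual: set[int]) -> list[int]:
    """Return declared codepoints that the font does not actually cover."""
    return sorted(parse_unicode_range(declared) - actual)


# Illustrative data only: a shortened declaration and a made-up coverage set.
declared = "U+9f3d-9f3e, U+9f41, U+ac00-ac01, U+ac04"
actual = {0x9F3D, 0x9F3E, 0x9F41}
print([f"U+{cp:04X}" for cp in unsupported_codepoints(declared, actual)])
```

The `actual` set would have to come from the font's cmap table. In Python, for instance, fontTools' `TTFont(path).getBestCmap()` returns a mapping whose keys are the covered codepoints, but how the data build script would read the woff files is left open here.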

lojjic · 2023-10-02T18:47:51Z

Partial fix specifically for CJK which excludes Korean/Japanese ranges from Chinese fonts: b776496

This isn't comprehensive; there still seem to be plenty of incorrect ranges. In addition, some fonts seem to vary in glyph coverage between their sans and serif versions, which may mean we need to split the codepoint index by category.
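
For readers unfamiliar with the idea behind the CJK exclusion, here is a rough Python sketch of removing Japanese and Korean blocks from a Chinese font's declared ranges. This is an illustration only, not the contents of b776496: the names `EXCLUDED_BLOCKS` and `subtract_blocks` are hypothetical, and the choice of which blocks to exclude is an assumption.

```python
# Inclusive Unicode block boundaries (assumed exclusion list for Chinese fonts).
EXCLUDED_BLOCKS = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def subtract_blocks(ranges, blocks):
    """Remove every excluded block from a list of inclusive (start, end) ranges."""
    result = []
    for start, end in ranges:
        pieces = [(start, end)]
        for b_start, b_end in blocks:
            next_pieces = []
            for p_start, p_end in pieces:
                if p_end < b_start or p_start > b_end:
                    # No overlap with this block: keep the piece unchanged.
                    next_pieces.append((p_start, p_end))
                    continue
                if p_start < b_start:
                    # Keep the part that lies before the excluded block.
                    next_pieces.append((p_start, b_start - 1))
                if p_end > b_end:
                    # Keep the part that lies after the excluded block.
                    next_pieces.append((b_end + 1, p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result


declared = [(0x9F3D, 0x9F3E), (0xAC00, 0xAC01), (0x3000, 0x3050)]
filtered = subtract_blocks(declared, EXCLUDED_BLOCKS)
print([(hex(a), hex(b)) for a, b in filtered])
```

A block-based filter like this cannot catch incorrect codepoints that fall outside the listed blocks, which matches the point above that the fix is only partial.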
