Some unicode ranges incorrect #4

Open
lojjic opened this issue Sep 21, 2023 · 1 comment

lojjic commented Sep 21, 2023

We mostly trust the unicode-range values declared in the Google Fonts CSS. However, some of those ranges appear to be incorrect: they include codepoints that the font does not actually cover.

An example: Noto Sans SC subset 21 declares these ranges:

U+9f3d-9f3e, U+9f41, U+9f4a-9f4b, U+9f51-9f52, U+9f61-9f63, U+9f66-9f67, U+9f80-9f81, U+9f83, U+9f85-9f8d, U+9f90-9f91, U+9f94-9f96, U+9f98, U+9f9b-9f9c, U+9f9e, U+9fa0, U+9fa2, U+9ff4, U+a001, U+a007, U+a025, U+a046-a047, U+a057, U+a072, U+a078-a079, U+a083, U+a085, U+a100, U+a118, U+a132, U+a134, U+a1f4, U+a242, U+a4a6, U+a4aa, U+a4b0-a4b1, U+a4b3, U+a9c1-a9c2, U+ac00-ac01, U+ac04, U+ac08, U+ac10-ac11, U+ac13-ac16, U+ac19, U+ac1c-ac1d, U+ac24, U+ac70-ac71, U+ac74, U+ac77-ac78, U+ac80-ac81, U+ac83, U+ac8c, U+ac90, U+ac9f-aca0, U+aca8-aca9, U+acac, U+acb0, U+acbd, U+acc1, U+acc4, U+ace0-ace1, U+ace4, U+ace8, U+acf3, U+acf5, U+acfc-acfd, U+ad00, U+ad0c, U+ad11, U+ad1c, U+ad34, U+ad50, U+ad64, U+ad6c, U+ad70, U+ad74, U+ad7f, U+ad81, U+ad8c, U+adc0, U+adc8, U+addc, U+ade0, U+adf8-adf9, U+adfc, U+ae00, U+ae08-ae09, U+ae0b, U+ae30, U+ae34, U+ae38, U+ae40, U+ae4a, U+ae4c, U+ae54, U+ae68, U+aebc, U+aed8, U+af2c-af2d, U+af34

However, the font only contains glyphs for the U+9xxx ranges; all of the U+axxx ranges listed above (mostly Hangul characters, U+ac00 and up) appear to be incorrect.

We may need to modify the data build script to parse the real codepoints out of the WOFF files rather than trusting what Google Fonts gives us.
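For illustration, here is a minimal Python sketch of that check. It is not the project's build script; the function names are made up for this example. It expands a CSS unicode-range value into codepoints and separates the ones a font really covers from the ones it does not. The optional `covered_codepoints` helper assumes the third-party fontTools package is installed (reading WOFF2 also needs the brotli module); the example at the bottom uses a stand-in coverage set instead of a real font file.

```python
import re

# Matches "U+9f3d" or "U+9f3d-9f3e". Wildcard forms such as "U+4??" are
# deliberately not handled in this sketch.
_RANGE_RE = re.compile(r"^U\+([0-9A-Fa-f]{1,6})(?:-([0-9A-Fa-f]{1,6}))?$")


def parse_unicode_range(value):
    """Expand a CSS unicode-range value into a set of integer codepoints."""
    codepoints = set()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        match = _RANGE_RE.match(token)
        if match is None:
            raise ValueError(f"unsupported unicode-range token: {token!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        codepoints.update(range(start, end + 1))
    return codepoints


def split_by_coverage(declared, covered):
    """Return (declared and covered, declared but missing from the font)."""
    return declared & covered, declared - covered


def covered_codepoints(font_path):
    """Read the codepoints mapped in a font's cmap table.

    Assumption: fontTools is installed. It is imported lazily so the rest
    of this sketch runs without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    return set(font.getBestCmap() or {})


# Stand-in coverage set for illustration; a real run would use
# covered_codepoints("path/to/subset.woff2") instead.
declared = parse_unicode_range("U+9f3d-9f3e, U+9f41, U+ac00-ac01, U+ac04")
valid, bogus = split_by_coverage(declared, {0x9F3D, 0x9F3E, 0x9F41})
print(sorted(f"U+{cp:04x}" for cp in bogus))
```

Working with plain sets keeps the comparison simple; the verified codepoints could then be collapsed back into ranges when the data file is written.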

lojjic commented Oct 2, 2023

Partial fix, specific to CJK, that excludes Korean and Japanese ranges from Chinese fonts: b776496
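A rough Python sketch of this kind of block-level filter follows. It is not the code from that commit. The choice of excluded blocks is an assumption made for illustration, and the helper names are hypothetical; only the block boundaries come from the Unicode standard.

```python
# Unicode blocks treated as "not Chinese" in this sketch. Which blocks to
# exclude is an assumption; the actual fix may use a different list.
EXCLUDED_BLOCKS = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def subtract_block(ranges, block):
    """Remove an inclusive (lo, hi) block from inclusive (start, end) ranges."""
    lo, hi = block
    result = []
    for start, end in ranges:
        if end < lo or start > hi:
            # No overlap with the block: keep the range unchanged.
            result.append((start, end))
            continue
        if start < lo:
            # Keep the part that lies below the block.
            result.append((start, lo - 1))
        if end > hi:
            # Keep the part that lies above the block.
            result.append((hi + 1, end))
    return result


def strip_excluded(ranges, blocks=EXCLUDED_BLOCKS):
    """Apply subtract_block for every excluded block in turn."""
    for block in blocks:
        ranges = subtract_block(ranges, block)
    return ranges


# A few of the declared ranges from the example above, as integer pairs.
declared = [(0x9F3D, 0x9F3E), (0xAC00, 0xAC01), (0xAC04, 0xAC04)]
print(strip_excluded(declared))
```

A filter like this only removes whole blocks that are known to be wrong for a script, so codepoints outside the listed blocks are left as declared.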

This isn't comprehensive; there still seem to be plenty of incorrect ranges. In addition, some fonts seem to vary in glyph coverage between their sans and serif variants, which may mean we need to split the codepoint index by category.
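One possible shape for a category-split index, sketched in Python. The real index format is not shown in this thread, so the structure, the function names, and the font names below are all hypothetical.

```python
from collections import defaultdict


def build_index(coverage):
    """Build {category: {codepoint: [font names]}}.

    `coverage` maps (category, font_name) to an iterable of codepoints
    that the font actually covers.
    """
    index = defaultdict(lambda: defaultdict(list))
    for (category, font_name), codepoints in coverage.items():
        for cp in codepoints:
            index[category][cp].append(font_name)
    return index


def lookup(index, category, codepoint):
    """Return the fonts covering `codepoint` within one category."""
    # .get() avoids creating empty entries in the defaultdicts on a miss.
    return index.get(category, {}).get(codepoint, [])


# Made-up coverage data: the serif font lacks U+4E01 in this example.
coverage = {
    ("sans-serif", "ExampleSans"): [0x4E00, 0x4E01],
    ("serif", "ExampleSerif"): [0x4E00],
}
index = build_index(coverage)
print(lookup(index, "sans-serif", 0x4E01))
print(lookup(index, "serif", 0x4E01))
```

Keying on category first means a lookup for a serif font never returns a sans-only codepoint, at the cost of storing shared codepoints once per category.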
