Myanmar character tables

This document lists the per-character shaping information needed to shape Myanmar text.

Myanmar character table

Myanmar characters should be classified as in the following table. Codepoints in the Myanmar block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this group includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
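
The [Mn] and [Mc] tags correspond to Unicode General Category values, so they can be read from a Unicode database rather than hard-coded. The following is a minimal sketch using Python's standard `unicodedata` module; the function name `mark_kind` is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter, which may not agree with every row of the tables in this document.

```python
import unicodedata
from typing import Optional


def mark_kind(ch: str) -> Optional[str]:
    """Return how a mark is categorized, or None for non-mark codepoints.

    Hypothetical helper: it only reads the General Category and does not
    consult the Mark-placement subclass column.
    """
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None


# U+102D (Sign I) is tagged [Mn]; U+102B (Sign Tall Aa) is tagged [Mc].
print(mark_kind("\u102D"), mark_kind("\u102B"), mark_kind("\u1000"))
```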

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+1000 Letter CONSONANT null က Ka
U+1001 Letter CONSONANT null ခ Kha
U+1002 Letter CONSONANT null ဂ Ga
U+1003 Letter CONSONANT null ဃ Gha
U+1004 Letter CONSONANT null င Nga
U+1005 Letter CONSONANT null စ Ca
U+1006 Letter CONSONANT null ဆ Cha
U+1007 Letter CONSONANT null ဇ Ja
U+1008 Letter CONSONANT null ဈ Jha
U+1009 Letter CONSONANT null ဉ Nya
U+100A Letter CONSONANT null ည Nnya
U+100B Letter CONSONANT null ဋ Tta
U+100C Letter CONSONANT null ဌ Ttha
U+100D Letter CONSONANT null ဍ Dda
U+100E Letter CONSONANT null ဎ Ddha
U+100F Letter CONSONANT null ဏ Nna
U+1010 Letter CONSONANT null တ Ta
U+1011 Letter CONSONANT null ထ Tha
U+1012 Letter CONSONANT null ဒ Da
U+1013 Letter CONSONANT null ဓ Dha
U+1014 Letter CONSONANT null န Na
U+1015 Letter CONSONANT null ပ Pa
U+1016 Letter CONSONANT null ဖ Pha
U+1017 Letter CONSONANT null ဗ Ba
U+1018 Letter CONSONANT null ဘ Bha
U+1019 Letter CONSONANT null မ Ma
U+101A Letter CONSONANT null ယ Ya
U+101B Letter CONSONANT null ရ Ra
U+101C Letter CONSONANT null လ La
U+101D Letter CONSONANT null ဝ Wa
U+101E Letter CONSONANT null သ Sa
U+101F Letter CONSONANT null ဟ Ha
U+1020 Letter CONSONANT null ဠ Lla
U+1021 Letter VOWEL_INDEPENDENT null အ A
U+1022 Letter VOWEL_INDEPENDENT null ဢ Shan A
U+1023 Letter VOWEL_INDEPENDENT null ဣ I
U+1024 Letter VOWEL_INDEPENDENT null ဤ Ii
U+1025 Letter VOWEL_INDEPENDENT null ဥ U
U+1026 Letter VOWEL_INDEPENDENT null ဦ Uu
U+1027 Letter VOWEL_INDEPENDENT null ဧ E
U+1028 Letter VOWEL_INDEPENDENT null ဨ Mon E
U+1029 Letter VOWEL_INDEPENDENT null ဩ O
U+102A Letter VOWEL_INDEPENDENT null ဪ Au
U+102B Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ါ Sign Tall Aa
U+102C Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ာ Sign Aa
U+102D Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ိ Sign I
U+102E Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ီ Sign Ii
U+102F Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ု Sign U
U+1030 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ူ Sign Uu
U+1031 Mark [Mc] VOWEL_DEPENDENT LEFT_POSITION ေ Sign E
U+1032 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ဲ Sign Ai
U+1033 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ဳ Sign Mon Ii
U+1034 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ဴ Sign Mon O
U+1035 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ဵ Sign E Above
U+1036 Mark [Mn] BINDU TOP_POSITION ံ Anusvara
U+1037 Mark [Mn] TONE_MARKER BOTTOM_POSITION ့ Dot Below
U+1038 Mark [Mc] VISARGA RIGHT_POSITION း Visarga
U+1039 Mark [Mn] INVISIBLE_STACKER null ္ Virama
U+103A Mark [Mn] PURE_KILLER TOP_POSITION ် Asat
U+103B Mark [Mc] CONSONANT_MEDIAL RIGHT_POSITION ျ Sign Medial Ya
U+103C Mark [Mc] CONSONANT_MEDIAL TOP_LEFT_AND_BOTTOM_POSITION ြ Sign Medial Ra
U+103D Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION ွ Sign Medial Wa
U+103E Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION ှ Sign Medial Ha
U+103F Letter CONSONANT null ဿ Great Sa
U+1040 Number NUMBER null ၀ Digit Zero
U+1041 Number NUMBER null ၁ Digit One
U+1042 Number NUMBER null ၂ Digit Two
U+1043 Number NUMBER null ၃ Digit Three
U+1044 Number NUMBER null ၄ Digit Four
U+1045 Number NUMBER null ၅ Digit Five
U+1046 Number NUMBER null ၆ Digit Six
U+1047 Number NUMBER null ၇ Digit Seven
U+1048 Number NUMBER null ၈ Digit Eight
U+1049 Number NUMBER null ၉ Digit Nine
U+104A Punctuation null null ၊ Little Section
U+104B Punctuation null null ။ Section
U+104C Punctuation null null ၌ Locative
U+104D Punctuation null null ၍ Completed
U+104E Punctuation CONSONANT_PLACEHOLDER null ၎ Aforementioned
U+104F Punctuation null null ၏ Genitive
U+1050 Letter CONSONANT null ၐ Sha
U+1051 Letter CONSONANT null ၑ Ssa
U+1052 Letter VOWEL_INDEPENDENT null ၒ Vocalic R
U+1053 Letter VOWEL_INDEPENDENT null ၓ Vocalic Rr
U+1054 Letter VOWEL_INDEPENDENT null ၔ Vocalic L
U+1055 Letter VOWEL_INDEPENDENT null ၕ Vocalic Ll
U+1056 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ၖ Sign Vocalic R
U+1057 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ၗ Sign Vocalic Rr
U+1058 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ၘ Sign Vocalic L
U+1059 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ၙ Sign Vocalic Ll
U+105A Letter CONSONANT null ၚ Mon Nga
U+105B Letter CONSONANT null ၛ Mon Jha
U+105C Letter CONSONANT null ၜ Mon Bba
U+105D Letter CONSONANT null ၝ Mon Bbe
U+105E Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION ၞ Sign Mon Medial Na
U+105F Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION ၟ Sign Mon Medial Ma
U+1060 Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION ၠ Sign Mon Medial La
U+1061 Letter CONSONANT null ၡ Sgaw Karen Sha
U+1062 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ၢ Sign Sgaw Karen Eu
U+1063 Mark [Mc] TONE_MARKER RIGHT_POSITION ၣ Tone Sgaw Karen Hathi
U+1064 Mark [Mc] TONE_MARKER RIGHT_POSITION ၤ Tone Sgaw Karen Ke Pho
U+1065 Letter CONSONANT null ၥ Western Pwo Karen Tha
U+1066 Letter CONSONANT null ၦ Western Pwo Karen Pwa
U+1067 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ၧ Sign Western Pwo Karen Eu
U+1068 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ၨ Sign Western Pwo Karen Ue
U+1069 Mark [Mc] TONE_MARKER RIGHT_POSITION ၩ Sign Western Pwo Karen Tone 1
U+106A Mark [Mc] TONE_MARKER RIGHT_POSITION ၪ Sign Western Pwo Karen Tone 2
U+106B Mark [Mc] TONE_MARKER RIGHT_POSITION ၫ Sign Western Pwo Karen Tone 3
U+106C Mark [Mc] TONE_MARKER RIGHT_POSITION ၬ Sign Western Pwo Karen Tone 4
U+106D Mark [Mc] TONE_MARKER RIGHT_POSITION ၭ Sign Western Pwo Karen Tone 5
U+106E Letter CONSONANT null ၮ Eastern Pwo Karen Nna
U+106F Letter CONSONANT null ၯ Eastern Pwo Karen Ywa
U+1070 Letter CONSONANT null ၰ Eastern Pwo Karen Ghwa
U+1071 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ၱ Sign Geba Karen I
U+1072 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ၲ Sign Kayah Oe
U+1073 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ၳ Sign Kayah U
U+1074 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ၴ Sign Kayah Ee
U+1075 Letter CONSONANT null ၵ Shan Ka
U+1076 Letter CONSONANT null ၶ Shan Kha
U+1077 Letter CONSONANT null ၷ Shan Ga
U+1078 Letter CONSONANT null ၸ Shan Ca
U+1079 Letter CONSONANT null ၹ Shan Za
U+107A Letter CONSONANT null ၺ Shan Nya
U+107B Letter CONSONANT null ၻ Shan Da
U+107C Letter CONSONANT null ၼ Shan Na
U+107D Letter CONSONANT null ၽ Shan Pha
U+107E Letter CONSONANT null ၾ Shan Fa
U+107F Letter CONSONANT null ၿ Shan Ba
U+1080 Letter CONSONANT null ႀ Shan Tha
U+1081 Letter CONSONANT null ႁ Shan Ha
U+1082 Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION ႂ Sign Shan Medial Wa
U+1083 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ႃ Sign Shan Aa
U+1084 Mark [Mc] VOWEL_DEPENDENT LEFT_POSITION ႄ Sign Shan E
U+1085 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ႅ Sign Shan E Above
U+1086 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ႆ Sign Shan Final Y
U+1087 Mark [Mc] TONE_MARKER RIGHT_POSITION ႇ Sign Shan Tone 2
U+1088 Mark [Mc] TONE_MARKER RIGHT_POSITION ႈ Sign Shan Tone 3
U+1089 Mark [Mc] TONE_MARKER RIGHT_POSITION ႉ Sign Shan Tone 5
U+108A Mark [Mc] TONE_MARKER RIGHT_POSITION ႊ Sign Shan Tone 6
U+108B Mark [Mc] TONE_MARKER RIGHT_POSITION ႋ Sign Shan Council Tone 2
U+108C Mark [Mc] TONE_MARKER RIGHT_POSITION ႌ Sign Shan Council Tone 3
U+108D Mark [Mn] TONE_MARKER BOTTOM_POSITION ႍ Sign Shan Council Emphatic Tone
U+108E Letter CONSONANT null ႎ Rumai Palaung Fa
U+108F Mark [Mc] TONE_MARKER RIGHT_POSITION ႏ Sign Rumai Palaung Tone 5
U+1090 Number NUMBER null ႐ Shan Digit Zero
U+1091 Number NUMBER null ႑ Shan Digit One
U+1092 Number NUMBER null ႒ Shan Digit Two
U+1093 Number NUMBER null ႓ Shan Digit Three
U+1094 Number NUMBER null ႔ Shan Digit Four
U+1095 Number NUMBER null ႕ Shan Digit Five
U+1096 Number NUMBER null ႖ Shan Digit Six
U+1097 Number NUMBER null ႗ Shan Digit Seven
U+1098 Number NUMBER null ႘ Shan Digit Eight
U+1099 Number NUMBER null ႙ Shan Digit Nine
U+109A Mark [Mc] TONE_MARKER RIGHT_POSITION ႚ Sign Khamti Tone 1
U+109B Mark [Mc] TONE_MARKER RIGHT_POSITION ႛ Sign Khamti Tone 3
U+109C Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ႜ Sign Aiton A
U+109D Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ႝ Sign Aiton Ai
U+109E Symbol SYMBOL null ႞ Shan One
U+109F Symbol SYMBOL null ႟ Shan Exclamation
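
As an illustration of how the table above might be consulted, the sketch below encodes a small excerpt of it as codepoint ranges and reports the Shaping class alongside the Unicode General Category. The names `SHAPING_RANGES` and `shaping_class` are hypothetical, and the excerpt is deliberately partial: a return value of `None` means either a null in the table or a codepoint this sketch does not cover.

```python
import unicodedata
from typing import Optional

# Partial excerpt transcribed from the table above: (first, last, class).
# This is NOT the full table; uncovered codepoints fall through to None.
SHAPING_RANGES = [
    (0x1000, 0x1020, "CONSONANT"),
    (0x1021, 0x102A, "VOWEL_INDEPENDENT"),
    (0x1036, 0x1036, "BINDU"),
    (0x1039, 0x1039, "INVISIBLE_STACKER"),
    (0x103A, 0x103A, "PURE_KILLER"),
    (0x103B, 0x103E, "CONSONANT_MEDIAL"),
    (0x1040, 0x1049, "NUMBER"),
    (0x104E, 0x104E, "CONSONANT_PLACEHOLDER"),
]


def shaping_class(ch: str) -> Optional[str]:
    """Look up the Shaping class for a single character in the excerpt."""
    cp = ord(ch)
    for first, last, cls in SHAPING_RANGES:
        if first <= cp <= last:
            return cls
    return None


# U+104E is punctuation by General Category, but the Shaping class
# takes precedence during shaping.
ch = "\u104E"
print(unicodedata.category(ch), shaping_class(ch))
```

A production implementation would more likely use a complete, generated table with a binary search or a direct array index; the linear scan here is only for readability.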

Myanmar Extended character tables

Myanmar Extended A character table

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+AA60 Letter CONSONANT null ꩠ Khamti Ga
U+AA61 Letter CONSONANT null ꩡ Khamti Ca
U+AA62 Letter CONSONANT null ꩢ Khamti Cha
U+AA63 Letter CONSONANT null ꩣ Khamti Ja
U+AA64 Letter CONSONANT null ꩤ Khamti Jha
U+AA65 Letter CONSONANT null ꩥ Khamti Nya
U+AA66 Letter CONSONANT null ꩦ Khamti Tta
U+AA67 Letter CONSONANT null ꩧ Khamti Ttha
U+AA68 Letter CONSONANT null ꩨ Khamti Dda
U+AA69 Letter CONSONANT null ꩩ Khamti Ddha
U+AA6A Letter CONSONANT null ꩪ Khamti Dha
U+AA6B Letter CONSONANT null ꩫ Khamti Na
U+AA6C Letter CONSONANT null ꩬ Khamti Sa
U+AA6D Letter CONSONANT null ꩭ Khamti Ha
U+AA6E Letter CONSONANT null ꩮ Khamti Hha
U+AA6F Letter CONSONANT null ꩯ Khamti Fa
U+AA70 Letter null null ꩰ Khamti Reduplication
U+AA71 Letter CONSONANT null ꩱ Khamti Xa
U+AA72 Letter CONSONANT null ꩲ Khamti Za
U+AA73 Letter CONSONANT null ꩳ Khamti Ra
U+AA74 Letter CONSONANT_PLACEHOLDER null ꩴ Khamti Oay
U+AA75 Letter CONSONANT_PLACEHOLDER null ꩵ Khamti Qn
U+AA76 Letter CONSONANT_PLACEHOLDER null ꩶ Khamti Hm
U+AA77 Symbol SYMBOL null ꩷ Khamti Aiton Exclamation
U+AA78 Symbol SYMBOL null ꩸ Khamti Aiton One
U+AA79 Symbol SYMBOL null ꩹ Khamti Aiton Two
U+AA7A Letter CONSONANT null ꩺ Khamti Aiton Ra
U+AA7B Mark [Mc] TONE_MARKER RIGHT_POSITION ꩻ Sign Pao Karen Tone
U+AA7C Mark [Mn] TONE_MARKER TOP_POSITION ꩼ Sign Tai Laing Tone 2
U+AA7D Mark [Mc] TONE_MARKER RIGHT_POSITION ꩽ Sign Tai Laing Tone 5
U+AA7E Letter CONSONANT null ꩾ Shwe Palaung Cha
U+AA7F Letter CONSONANT null ꩿ Shwe Palaung Sha

Myanmar Extended B character table

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+A9E0 Letter CONSONANT null ꧠ Shan Gha
U+A9E1 Letter CONSONANT null ꧡ Shan Cha
U+A9E2 Letter CONSONANT null ꧢ Shan Jha
U+A9E3 Letter CONSONANT null ꧣ Shan Nna
U+A9E4 Letter CONSONANT null ꧤ Shan Bha
U+A9E5 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ꧥ Sign Shan Saw
U+A9E6 Letter null null ꧦ Shan Reduplication
U+A9E7 Letter CONSONANT null ꧧ Tai Laing Nya
U+A9E8 Letter CONSONANT null ꧨ Tai Laing Fa
U+A9E9 Letter CONSONANT null ꧩ Tai Laing Ga
U+A9EA Letter CONSONANT null ꧪ Tai Laing Gha
U+A9EB Letter CONSONANT null ꧫ Tai Laing Ja
U+A9EC Letter CONSONANT null ꧬ Tai Laing Jha
U+A9ED Letter CONSONANT null ꧭ Tai Laing Dda
U+A9EE Letter CONSONANT null ꧮ Tai Laing Ddha
U+A9EF Letter CONSONANT null ꧯ Tai Laing Nna
U+A9F0 Number NUMBER null ꧰ Tai Laing Digit Zero
U+A9F1 Number NUMBER null ꧱ Tai Laing Digit One
U+A9F2 Number NUMBER null ꧲ Tai Laing Digit Two
U+A9F3 Number NUMBER null ꧳ Tai Laing Digit Three
U+A9F4 Number NUMBER null ꧴ Tai Laing Digit Four
U+A9F5 Number NUMBER null ꧵ Tai Laing Digit Five
U+A9F6 Number NUMBER null ꧶ Tai Laing Digit Six
U+A9F7 Number NUMBER null ꧷ Tai Laing Digit Seven
U+A9F8 Number NUMBER null ꧸ Tai Laing Digit Eight
U+A9F9 Number NUMBER null ꧹ Tai Laing Digit Nine
U+A9FA Letter CONSONANT null ꧺ Tai Laing Lla
U+A9FB Letter CONSONANT null ꧻ Tai Laing Da
U+A9FC Letter CONSONANT null ꧼ Tai Laing Dha
U+A9FD Letter CONSONANT null ꧽ Tai Laing Ba
U+A9FE Letter CONSONANT null ꧾ Tai Laing Bha
U+A9FF unassigned

Vedic Extensions character table

Sanskrit runs written in the Myanmar script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+1CD0 Mark [Mn] CANTILLATION TOP_POSITION ᳐ Tone Karshana
U+1CD1 Mark [Mn] CANTILLATION TOP_POSITION ᳑ Tone Shara
U+1CD2 Mark [Mn] CANTILLATION TOP_POSITION ᳒ Tone Prenkha
U+1CD3 Punctuation null null ᳓ Sign Nihshvasa
U+1CD4 Mark [Mn] CANTILLATION OVERSTRUCK ᳔ Tone Midline Svarita
U+1CD5 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳕ Tone Aggravated Independent Svarita
U+1CD6 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳖ Tone Independent Svarita
U+1CD7 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳗ Tone Kathaka Independent Svarita
U+1CD8 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳘ Tone Candra Below
U+1CD9 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳙ Tone Kathaka Independent Svarita Schroeder
U+1CDA Mark [Mn] CANTILLATION TOP_POSITION ᳚ Tone Double Svarita
U+1CDB Mark [Mn] CANTILLATION TOP_POSITION ᳛ Tone Triple Svarita
U+1CDC Mark [Mn] CANTILLATION BOTTOM_POSITION ᳜ Tone Kathaka Anudatta
U+1CDD Mark [Mn] CANTILLATION BOTTOM_POSITION ᳝ Tone Dot Below
U+1CDE Mark [Mn] CANTILLATION BOTTOM_POSITION ᳞ Tone Two Dots Below
U+1CDF Mark [Mn] CANTILLATION BOTTOM_POSITION ᳟ Tone Three Dots Below
U+1CE0 Mark [Mn] CANTILLATION TOP_POSITION ᳠ Tone Rigvedic Kashmiri Independent Svarita
U+1CE1 Mark [Mc] CANTILLATION RIGHT_POSITION ᳡ Tone Atharavedic Independent Svarita
U+1CE2 Mark [Mn] AVAGRAHA OVERSTRUCK ᳢ Sign Visarga Svarita
U+1CE3 Mark [Mn] null OVERSTRUCK ᳣ Sign Visarga Udatta
U+1CE4 Mark [Mn] null OVERSTRUCK ᳤ Sign Reversed Visarga Udatta
U+1CE5 Mark [Mn] null OVERSTRUCK ᳥ Sign Visarga Anudatta
U+1CE6 Mark [Mn] null OVERSTRUCK ᳦ Sign Reversed Visarga Anudatta
U+1CE7 Mark [Mn] null OVERSTRUCK ᳧ Sign Visarga Udatta With Tail
U+1CE8 Mark [Mn] AVAGRAHA OVERSTRUCK ᳨ Sign Visarga Anudatta With Tail
U+1CE9 Letter SYMBOL null ᳩ Sign Anusvara Antargomukha
U+1CEA Letter null null ᳪ Sign Anusvara Bahirgomukha
U+1CEB Letter null null ᳫ Sign Anusvara Vamagomukha
U+1CEC Letter SYMBOL null ᳬ Sign Anusvara Vamagomukha With Tail
U+1CED Mark [Mn] AVAGRAHA BOTTOM_POSITION ᳭ Sign Tiryak
U+1CEE Letter SYMBOL null ᳮ Sign Hexiform Long Anusvara
U+1CEF Letter null null ᳯ Sign Long Anusvara
U+1CF0 Letter null null ᳰ Sign Rthang Long Anusvara
U+1CF2 Letter CONSONANT_DEAD null ᳲ Sign Ardhavisarga
U+1CF3 Letter CONSONANT_DEAD null ᳳ Sign Rotated Ardhavisarga
U+1CF3 Mark [Mc] VISARGA null ᳳ Sign Rotated Ardhavisarga
U+1CF4 Mark [Mn] CANTILLATION TOP_POSITION ᳴ Tone Candra Above
U+1CF5 Letter CONSONANT_WITH_STACKER null ᳵ Sign Jihvamuliya
U+1CF6 Letter CONSONANT_WITH_STACKER null ᳶ Sign Upadhmaniya
U+1CF7 Mark [Mc] null null ᳷ Sign Atikrama
U+1CF8 Mark [Mn] CANTILLATION null ᳸ Tone Ring Above
U+1CF9 Mark [Mn] CANTILLATION null ᳹ Tone Double Ring Above
U+1CFA Letter PLACEHOLDER null ᳺ Sign Double Anusvara Antargomukha
U+1CFB unassigned
U+1CFC unassigned
U+1CFD unassigned
U+1CFE unassigned
U+1CFF unassigned

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Myanmar text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
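
The display convention described above can be sketched as follows. This is only an assumption about how an application might prepare an isolated mark for display; it is not a description of what any particular shaping engine does, and the function name `with_placeholder` is hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def with_placeholder(text: str) -> str:
    """Prepend U+25CC when the text starts with a combining mark.

    Any General Category beginning with "M" (Mn, Mc, Me) is treated as a
    mark here; the Shaping class column is not consulted.
    """
    if text and unicodedata.category(text[0]).startswith("M"):
        return DOTTED_CIRCLE + text
    return text


# An isolated Sign I (U+102D) gains a dotted-circle base; a consonant
# followed by the same sign is left unchanged.
print(with_placeholder("\u102D"))
print(with_placeholder("\u1000\u102D"))
```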

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+00A0 Separator PLACEHOLDER null   No-break space
U+200C Other NON_JOINER null ‌ Zero-width non-joiner
U+200D Other JOINER null ‍ Zero-width joiner
U+2010 Punctuation PLACEHOLDER null ‐ Hyphen
U+2011 Punctuation PLACEHOLDER null ‑ No-break hyphen
U+2012 Punctuation PLACEHOLDER null ‒ Figure dash
U+2013 Punctuation PLACEHOLDER null – En dash
U+2014 Punctuation PLACEHOLDER null — Em dash
U+25CC Symbol DOTTED_CIRCLE null ◌ Dotted circle

The zero-width joiner is primarily used to prevent the formation of a conjunct from a "Consonant,Halant,Consonant" sequence. The sequence "Consonant,Halant,ZWJ,Consonant" blocks the formation of a conjunct between the two consonants.

Note, however, that the "Consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner must be used instead. The sequence "Consonant,Halant,ZWNJ,Consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
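
The difference between the two joiners in a "Consonant,Halant,joiner,Consonant" sequence can be summarized as a small lookup. The sketch below only restates the two paragraphs above; the function name `blocked_features` and the feature labels are hypothetical and do not correspond to OpenType feature tags.

```python
ZWNJ = "\u200C"  # zero-width non-joiner
ZWJ = "\u200D"   # zero-width joiner


def blocked_features(joiner: str) -> set:
    """Return what the joiner blocks when placed after "Consonant,Halant".

    ZWJ blocks the conjunct only, so a half-form may still apply.
    ZWNJ blocks both, leaving the consonant in its standard form
    followed by an explicit "Halant".
    """
    if joiner == ZWJ:
        return {"conjunct"}
    if joiner == ZWNJ:
        return {"conjunct", "half-form"}
    raise ValueError("expected ZWJ (U+200D) or ZWNJ (U+200C)")


print(sorted(blocked_features(ZWJ)))
print(sorted(blocked_features(ZWNJ)))
```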

A secondary usage of the zero-width joiner is to prevent the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.

The no-break space is primarily used to display codepoints that are defined as non-spacing (marks, dependent vowels or matras, below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
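
The three sequence patterns named above can be checked with a simple matcher once each character has been mapped to a class. The sketch below assumes a hypothetical one-letter encoding of those classes (N for NBSP, J for ZWJ, H for "Halant", C for consonant, M for mark, V for matra); neither the encoding nor the function name comes from the shaping model itself.

```python
import re

# Hypothetical one-letter class codes:
#   N = NBSP, J = ZWJ, H = Halant, C = Consonant, M = mark, V = matra
ISOLATED_PATTERN = re.compile(r"N(?:JHC|M|V)")


def is_isolated_display(classes: str) -> bool:
    """True if the class string is exactly one of the three NBSP patterns."""
    return ISOLATED_PATTERN.fullmatch(classes) is not None


print(is_isolated_display("NJHC"))  # "NBSP,ZWJ,Halant,Consonant"
print(is_isolated_display("NM"))    # "NBSP,mark"
print(is_isolated_display("NC"))    # not one of the listed patterns
```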