This document lists the per-character shaping information needed to shape Myanmar text.
Table of Contents
- Myanmar character table
- Myanmar Extended character tables
- Vedic Extensions character table
- Miscellaneous character table
Myanmar glyphs should be classified as in the following table. Codepoints in the Myanmar block with no assigned meaning are designated as unassigned in the Unicode category column.
Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.
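A minimal sketch of how an engine might consume these tables, assuming a plain dictionary keyed by codepoint (the document does not prescribe an implementation). Only a handful of rows from the Myanmar table are included. A codepoint without an entry is treated exactly like one whose Shaping class is null. The names `SHAPING_CLASS`, `shaping_class`, and `evokes_special_behavior` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical excerpt of the Myanmar table: codepoint -> Shaping class.
# None stands for "null" in the Shaping class column.
SHAPING_CLASS: Dict[int, Optional[str]] = {
    0x1000: "CONSONANT",              # Ka
    0x102C: "VOWEL_DEPENDENT",        # Sign Aa
    0x1039: "INVISIBLE_STACKER",      # Virama
    0x1040: "NUMBER",                 # Digit Zero
    0x104A: None,                     # Little Section (punctuation, null class)
    0x104E: "CONSONANT_PLACEHOLDER",  # Aforementioned
}


def shaping_class(cp: int) -> Optional[str]:
    """Return the Shaping class for cp, or None ("null") if it has none."""
    # Codepoints missing from the table behave the same as an explicit null.
    return SHAPING_CLASS.get(cp)


def evokes_special_behavior(cp: int) -> bool:
    """A null Shaping class means the engine applies no special behavior."""
    return shaping_class(cp) is not None
```

Storing explicit `None` entries for punctuation such as U+104A is optional in this design. It only documents that the null class is deliberate rather than a missing row.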
Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.
The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
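The Unicode category column can be cross-checked against Python's standard `unicodedata` module. The sketch below maps a codepoint's two-letter General Category to the labels used in these tables, including the [Mn]/[Mc] tags and "unassigned" for category `Cn`. The helper name `category_label` is hypothetical. The result depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so an older Python may report a recently added codepoint as unassigned.

```python
import unicodedata

# Labels used in the "Unicode category" column, keyed on the first letter
# of the two-letter General_Category value (e.g. "Lo" -> "Letter").
_MAJOR_LABELS = {
    "L": "Letter",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def category_label(cp: int) -> str:
    """Return the table's Unicode-category label for a codepoint."""
    gc = unicodedata.category(chr(cp))
    if gc == "Cn":
        return "unassigned"      # no assigned meaning in this Unicode version
    if gc in ("Mn", "Mc", "Me"):
        return f"Mark [{gc}]"    # [Mn] = non-spacing, [Mc] = spacing-combining
    return _MAJOR_LABELS[gc[0]]
```

For example, `category_label(0x102D)` (Sign I) yields `"Mark [Mn]"`, while `category_label(0x1031)` (Sign E) yields `"Mark [Mc]"`.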
Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
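To illustrate the precedence rule, the following sketch pairs each codepoint's General Category (from Python's `unicodedata`) with its Shaping class. It then lists punctuation codepoints that an engine should nonetheless treat as placeholders. The rows are copied from the tables in this document. The names `SHAPING_CLASS`, `classify`, and `punctuation_with_placeholder_class` are hypothetical.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical excerpt of rows from the tables in this document.
SHAPING_CLASS: Dict[int, str] = {
    0x1000: "CONSONANT",              # Ka (Lo)
    0x104E: "CONSONANT_PLACEHOLDER",  # Aforementioned (Po)
    0xAA74: "CONSONANT_PLACEHOLDER",  # Khamti Oay (Lo)
    0x00A0: "PLACEHOLDER",            # No-break space (Zs)
    0x2010: "PLACEHOLDER",            # Hyphen (Pd)
}


def classify(cp: int) -> Tuple[str, str]:
    """Return (general_category, shaping_class); the engine acts on the second."""
    return unicodedata.category(chr(cp)), SHAPING_CLASS[cp]


def punctuation_with_placeholder_class() -> List[int]:
    """Codepoints whose General Category is punctuation (P*) but whose Shaping class is a placeholder."""
    return sorted(
        cp
        for cp, cls in SHAPING_CLASS.items()
        if unicodedata.category(chr(cp)).startswith("P") and cls.endswith("PLACEHOLDER")
    )
```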
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+1000 | Letter | CONSONANT | null | က Ka |
U+1001 | Letter | CONSONANT | null | ခ Kha |
U+1002 | Letter | CONSONANT | null | ဂ Ga |
U+1003 | Letter | CONSONANT | null | ဃ Gha |
U+1004 | Letter | CONSONANT | null | င Nga |
U+1005 | Letter | CONSONANT | null | စ Ca |
U+1006 | Letter | CONSONANT | null | ဆ Cha |
U+1007 | Letter | CONSONANT | null | ဇ Ja |
U+1008 | Letter | CONSONANT | null | ဈ Jha |
U+1009 | Letter | CONSONANT | null | ဉ Nya |
U+100A | Letter | CONSONANT | null | ည Nnya |
U+100B | Letter | CONSONANT | null | ဋ Tta |
U+100C | Letter | CONSONANT | null | ဌ Ttha |
U+100D | Letter | CONSONANT | null | ဍ Dda |
U+100E | Letter | CONSONANT | null | ဎ Ddha |
U+100F | Letter | CONSONANT | null | ဏ Nna |
U+1010 | Letter | CONSONANT | null | တ Ta |
U+1011 | Letter | CONSONANT | null | ထ Tha |
U+1012 | Letter | CONSONANT | null | ဒ Da |
U+1013 | Letter | CONSONANT | null | ဓ Dha |
U+1014 | Letter | CONSONANT | null | န Na |
U+1015 | Letter | CONSONANT | null | ပ Pa |
U+1016 | Letter | CONSONANT | null | ဖ Pha |
U+1017 | Letter | CONSONANT | null | ဗ Ba |
U+1018 | Letter | CONSONANT | null | ဘ Bha |
U+1019 | Letter | CONSONANT | null | မ Ma |
U+101A | Letter | CONSONANT | null | ယ Ya |
U+101B | Letter | CONSONANT | null | ရ Ra |
U+101C | Letter | CONSONANT | null | လ La |
U+101D | Letter | CONSONANT | null | ဝ Wa |
U+101E | Letter | CONSONANT | null | သ Sa |
U+101F | Letter | CONSONANT | null | ဟ Ha |
U+1020 | Letter | CONSONANT | null | ဠ Lla |
U+1021 | Letter | VOWEL_INDEPENDENT | null | အ A |
U+1022 | Letter | VOWEL_INDEPENDENT | null | ဢ Shan A |
U+1023 | Letter | VOWEL_INDEPENDENT | null | ဣ I |
U+1024 | Letter | VOWEL_INDEPENDENT | null | ဤ Ii |
U+1025 | Letter | VOWEL_INDEPENDENT | null | ဥ U |
U+1026 | Letter | VOWEL_INDEPENDENT | null | ဦ Uu |
U+1027 | Letter | VOWEL_INDEPENDENT | null | ဧ E |
U+1028 | Letter | VOWEL_INDEPENDENT | null | ဨ Mon E |
U+1029 | Letter | VOWEL_INDEPENDENT | null | ဩ O |
U+102A | Letter | VOWEL_INDEPENDENT | null | ဪ Au |
U+102B | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ါ Sign Tall Aa |
U+102C | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ာ Sign Aa |
U+102D | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ိ Sign I |
U+102E | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ီ Sign Ii |
U+102F | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ု Sign U |
U+1030 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ူ Sign Uu |
U+1031 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ေ Sign E |
U+1032 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဲ Sign Ai |
U+1033 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဳ Sign Mon Ii |
U+1034 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဴ Sign Mon O |
U+1035 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဵ Sign E Above |
U+1036 | Mark [Mn] | BINDU | TOP_POSITION | ံ Anusvara |
U+1037 | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ့ Dot Below |
U+1038 | Mark [Mc] | VISARGA | RIGHT_POSITION | း Visarga |
U+1039 | Mark [Mn] | INVISIBLE_STACKER | null | ္ Virama |
U+103A | Mark [Mn] | PURE_KILLER | TOP_POSITION | ် Asat |
U+103B | Mark [Mc] | CONSONANT_MEDIAL | RIGHT_POSITION | ျ Sign Medial Ya |
U+103C | Mark [Mc] | CONSONANT_MEDIAL | TOP_LEFT_AND_BOTTOM_POSITION | ြ Sign Medial Ra |
U+103D | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ွ Sign Medial Wa |
U+103E | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ှ Sign Medial Ha |
U+103F | Letter | CONSONANT | null | ဿ Great Sa |
U+1040 | Number | NUMBER | null | ၀ Digit Zero |
U+1041 | Number | NUMBER | null | ၁ Digit One |
U+1042 | Number | NUMBER | null | ၂ Digit Two |
U+1043 | Number | NUMBER | null | ၃ Digit Three |
U+1044 | Number | NUMBER | null | ၄ Digit Four |
U+1045 | Number | NUMBER | null | ၅ Digit Five |
U+1046 | Number | NUMBER | null | ၆ Digit Six |
U+1047 | Number | NUMBER | null | ၇ Digit Seven |
U+1048 | Number | NUMBER | null | ၈ Digit Eight |
U+1049 | Number | NUMBER | null | ၉ Digit Nine |
U+104A | Punctuation | null | null | ၊ Little Section |
U+104B | Punctuation | null | null | ။ Section |
U+104C | Punctuation | null | null | ၌ Locative |
U+104D | Punctuation | null | null | ၍ Completed |
U+104E | Punctuation | CONSONANT_PLACEHOLDER | null | ၎ Aforementioned |
U+104F | Punctuation | null | null | ၏ Genitive |
U+1050 | Letter | CONSONANT | null | ၐ Sha |
U+1051 | Letter | CONSONANT | null | ၑ Ssa |
U+1052 | Letter | VOWEL_INDEPENDENT | null | ၒ Vocalic R |
U+1053 | Letter | VOWEL_INDEPENDENT | null | ၓ Vocalic Rr |
U+1054 | Letter | VOWEL_INDEPENDENT | null | ၔ Vocalic L |
U+1055 | Letter | VOWEL_INDEPENDENT | null | ၕ Vocalic Ll |
U+1056 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၖ Sign Vocalic R |
U+1057 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၗ Sign Vocalic Rr |
U+1058 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ၘ Sign Vocalic L |
U+1059 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ၙ Sign Vocalic Ll |
U+105A | Letter | CONSONANT | null | ၚ Mon Nga |
U+105B | Letter | CONSONANT | null | ၛ Mon Jha |
U+105C | Letter | CONSONANT | null | ၜ Mon Bba |
U+105D | Letter | CONSONANT | null | ၝ Mon Bbe |
U+105E | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ၞ Sign Mon Medial Na |
U+105F | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ၟ Sign Mon Medial Ma |
U+1060 | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ၠ Sign Mon Medial La |
U+1061 | Letter | CONSONANT | null | ၡ Sgaw Karen Sha |
U+1062 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၢ Sign Sgaw Karen Eu |
U+1063 | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၣ Tone Sgaw Karen Hathi |
U+1064 | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၤ Tone Sgaw Karen Ke Pho |
U+1065 | Letter | CONSONANT | null | ၥ Western Pwo Karen Tha |
U+1066 | Letter | CONSONANT | null | ၦ Western Pwo Karen Pwa |
U+1067 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၧ Sign Western Pwo Karen Eu |
U+1068 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၨ Sign Western Pwo Karen Ue |
U+1069 | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၩ Sign Western Pwo Karen Tone 1 |
U+106A | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၪ Sign Western Pwo Karen Tone 2 |
U+106B | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၫ Sign Western Pwo Karen Tone 3 |
U+106C | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၬ Sign Western Pwo Karen Tone 4 |
U+106D | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၭ Sign Western Pwo Karen Tone 5 |
U+106E | Letter | CONSONANT | null | ၮ Eastern Pwo Karen Nna |
U+106F | Letter | CONSONANT | null | ၯ Eastern Pwo Karen Ywa |
U+1070 | Letter | CONSONANT | null | ၰ Eastern Pwo Karen Ghwa |
U+1071 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၱ Sign Geba Karen I |
U+1072 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၲ Sign Kayah Oe |
U+1073 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၳ Sign Kayah U |
U+1074 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၴ Sign Kayah Ee |
U+1075 | Letter | CONSONANT | null | ၵ Shan Ka |
U+1076 | Letter | CONSONANT | null | ၶ Shan Kha |
U+1077 | Letter | CONSONANT | null | ၷ Shan Ga |
U+1078 | Letter | CONSONANT | null | ၸ Shan Ca |
U+1079 | Letter | CONSONANT | null | ၹ Shan Za |
U+107A | Letter | CONSONANT | null | ၺ Shan Nya |
U+107B | Letter | CONSONANT | null | ၻ Shan Da |
U+107C | Letter | CONSONANT | null | ၼ Shan Na |
U+107D | Letter | CONSONANT | null | ၽ Shan Pha |
U+107E | Letter | CONSONANT | null | ၾ Shan Fa |
U+107F | Letter | CONSONANT | null | ၿ Shan Ba |
U+1080 | Letter | CONSONANT | null | ႀ Shan Tha |
U+1081 | Letter | CONSONANT | null | ႁ Shan Ha |
U+1082 | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ႂ Sign Shan Medial Wa |
U+1083 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ႃ Sign Shan Aa |
U+1084 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ႄ Sign Shan E |
U+1085 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ႅ Sign Shan E Above |
U+1086 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ႆ Sign Shan Final Y |
U+1087 | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႇ Sign Shan Tone 2 |
U+1088 | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႈ Sign Shan Tone 3 |
U+1089 | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႉ Sign Shan Tone 5 |
U+108A | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႊ Sign Shan Tone 6 |
U+108B | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႋ Sign Shan Council Tone 2 |
U+108C | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႌ Sign Shan Council Tone 3 |
U+108D | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ႍ Sign Shan Council Emphatic Tone |
U+108E | Letter | CONSONANT | null | ႎ Rumai Palaung Fa |
U+108F | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႏ Sign Rumai Palaung Tone 5 |
U+1090 | Number | NUMBER | null | ႐ Shan Digit Zero |
U+1091 | Number | NUMBER | null | ႑ Shan Digit One |
U+1092 | Number | NUMBER | null | ႒ Shan Digit Two |
U+1093 | Number | NUMBER | null | ႓ Shan Digit Three |
U+1094 | Number | NUMBER | null | ႔ Shan Digit Four |
U+1095 | Number | NUMBER | null | ႕ Shan Digit Five |
U+1096 | Number | NUMBER | null | ႖ Shan Digit Six |
U+1097 | Number | NUMBER | null | ႗ Shan Digit Seven |
U+1098 | Number | NUMBER | null | ႘ Shan Digit Eight |
U+1099 | Number | NUMBER | null | ႙ Shan Digit Nine |
U+109A | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႚ Sign Khamti Tone 1 |
U+109B | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႛ Sign Khamti Tone 3 |
U+109C | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ႜ Sign Aiton A |
U+109D | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ႝ Sign Aiton Ai |
U+109E | Symbol | SYMBOL | null | ႞ Shan One |
U+109F | Symbol | SYMBOL | null | ႟ Shan Exclamation |
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+AA60 | Letter | CONSONANT | null | ꩠ Khamti Ga |
U+AA61 | Letter | CONSONANT | null | ꩡ Khamti Ca |
U+AA62 | Letter | CONSONANT | null | ꩢ Khamti Cha |
U+AA63 | Letter | CONSONANT | null | ꩣ Khamti Ja |
U+AA64 | Letter | CONSONANT | null | ꩤ Khamti Jha |
U+AA65 | Letter | CONSONANT | null | ꩥ Khamti Nya |
U+AA66 | Letter | CONSONANT | null | ꩦ Khamti Tta |
U+AA67 | Letter | CONSONANT | null | ꩧ Khamti Ttha |
U+AA68 | Letter | CONSONANT | null | ꩨ Khamti Dda |
U+AA69 | Letter | CONSONANT | null | ꩩ Khamti Ddha |
U+AA6A | Letter | CONSONANT | null | ꩪ Khamti Dha |
U+AA6B | Letter | CONSONANT | null | ꩫ Khamti Na |
U+AA6C | Letter | CONSONANT | null | ꩬ Khamti Sa |
U+AA6D | Letter | CONSONANT | null | ꩭ Khamti Ha |
U+AA6E | Letter | CONSONANT | null | ꩮ Khamti Hha |
U+AA6F | Letter | CONSONANT | null | ꩯ Khamti Fa |
U+AA70 | Letter | null | null | ꩰ Khamti Reduplication |
U+AA71 | Letter | CONSONANT | null | ꩱ Khamti Xa |
U+AA72 | Letter | CONSONANT | null | ꩲ Khamti Za |
U+AA73 | Letter | CONSONANT | null | ꩳ Khamti Ra |
U+AA74 | Letter | CONSONANT_PLACEHOLDER | null | ꩴ Khamti Oay |
U+AA75 | Letter | CONSONANT_PLACEHOLDER | null | ꩵ Khamti Qn |
U+AA76 | Letter | CONSONANT_PLACEHOLDER | null | ꩶ Khamti Hm |
U+AA77 | Symbol | SYMBOL | null | ꩷ Aiton Exclamation |
U+AA78 | Symbol | SYMBOL | null | ꩸ Aiton One |
U+AA79 | Symbol | SYMBOL | null | ꩹ Aiton Two |
U+AA7A | Letter | CONSONANT | null | ꩺ Aiton Ra |
U+AA7B | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ꩻ Sign Pao Karen Tone |
U+AA7C | Mark [Mn] | TONE_MARKER | TOP_POSITION | ꩼ Sign Tai Laing Tone 2 |
U+AA7D | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ꩽ Sign Tai Laing Tone 5 |
U+AA7E | Letter | CONSONANT | null | ꩾ Shwe Palaung Cha |
U+AA7F | Letter | CONSONANT | null | ꩿ Shwe Palaung Sha |
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+A9E0 | Letter | CONSONANT | null | ꧠ Shan Gha |
U+A9E1 | Letter | CONSONANT | null | ꧡ Shan Cha |
U+A9E2 | Letter | CONSONANT | null | ꧢ Shan Jha |
U+A9E3 | Letter | CONSONANT | null | ꧣ Shan Nna |
U+A9E4 | Letter | CONSONANT | null | ꧤ Shan Bha |
U+A9E5 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ꧥ Sign Shan Saw |
U+A9E6 | Letter | null | null | ꧦ Shan Reduplication |
U+A9E7 | Letter | CONSONANT | null | ꧧ Tai Laing Nya |
U+A9E8 | Letter | CONSONANT | null | ꧨ Tai Laing Fa |
U+A9E9 | Letter | CONSONANT | null | ꧩ Tai Laing Ga |
U+A9EA | Letter | CONSONANT | null | ꧪ Tai Laing Gha |
U+A9EB | Letter | CONSONANT | null | ꧫ Tai Laing Ja |
U+A9EC | Letter | CONSONANT | null | ꧬ Tai Laing Jha |
U+A9ED | Letter | CONSONANT | null | ꧭ Tai Laing Dda |
U+A9EE | Letter | CONSONANT | null | ꧮ Tai Laing Ddha |
U+A9EF | Letter | CONSONANT | null | ꧯ Tai Laing Nna |
U+A9F0 | Number | NUMBER | null | ꧰ Tai Laing Digit Zero |
U+A9F1 | Number | NUMBER | null | ꧱ Tai Laing Digit One |
U+A9F2 | Number | NUMBER | null | ꧲ Tai Laing Digit Two |
U+A9F3 | Number | NUMBER | null | ꧳ Tai Laing Digit Three |
U+A9F4 | Number | NUMBER | null | ꧴ Tai Laing Digit Four |
U+A9F5 | Number | NUMBER | null | ꧵ Tai Laing Digit Five |
U+A9F6 | Number | NUMBER | null | ꧶ Tai Laing Digit Six |
U+A9F7 | Number | NUMBER | null | ꧷ Tai Laing Digit Seven |
U+A9F8 | Number | NUMBER | null | ꧸ Tai Laing Digit Eight |
U+A9F9 | Number | NUMBER | null | ꧹ Tai Laing Digit Nine |
U+A9FA | Letter | CONSONANT | null | ꧺ Tai Laing Lla |
U+A9FB | Letter | CONSONANT | null | ꧻ Tai Laing Da |
U+A9FC | Letter | CONSONANT | null | ꧼ Tai Laing Dha |
U+A9FD | Letter | CONSONANT | null | ꧽ Tai Laing Ba |
U+A9FE | Letter | CONSONANT | null | ꧾ Tai Laing Bha |
U+A9FF | unassigned | | | |
Sanskrit runs written in the Myanmar script may also include characters from the Vedic Extensions block. These characters should be classified as follows.
Note: See the Vedic Extensions document for additional information.
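Because a Myanmar run may mix characters from several blocks, an engine needs to know which of the tables in this document applies to each character. The sketch below checks the block ranges covered here: Myanmar U+1000..U+109F, Myanmar Extended-A U+AA60..U+AA7F, Myanmar Extended-B U+A9E0..U+A9FF, and Vedic Extensions U+1CD0..U+1CFF. The names `BLOCKS`, `block_of`, and `has_vedic` are hypothetical, and the sketch is not drawn from any particular engine.

```python
from typing import List, Optional, Tuple

# Block ranges covered by the tables in this document (inclusive bounds).
BLOCKS: List[Tuple[str, int, int]] = [
    ("Myanmar", 0x1000, 0x109F),
    ("Myanmar Extended-A", 0xAA60, 0xAA7F),
    ("Myanmar Extended-B", 0xA9E0, 0xA9FF),
    ("Vedic Extensions", 0x1CD0, 0x1CFF),
]


def block_of(cp: int) -> Optional[str]:
    """Return the name of the table's block containing cp, or None."""
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def has_vedic(text: str) -> bool:
    """True if a run contains any character from the Vedic Extensions block."""
    return any(block_of(ord(ch)) == "Vedic Extensions" for ch in text)
```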
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
U+1CE9 | Letter | SYMBOL | null | ᳩ Sign Anusvara Antargomukha |
U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
U+1CEC | Letter | SYMBOL | null | ᳬ Sign Anusvara Vamagomukha With Tail |
U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
U+1CEE | Letter | SYMBOL | null | ᳮ Sign Hexiform Long Anusvara |
U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
U+1CFB | unassigned | | | |
U+1CFC | unassigned | | | |
U+1CFD | unassigned | | | |
U+1CFE | unassigned | | | |
U+1CFF | unassigned | | | |
Other important characters that may be encountered when shaping runs of Myanmar text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
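A minimal sketch of displaying a combining mark in isolation, assuming a simplified rule. If a string begins with a mark (General Category Mn, Mc, or Me), the sketch prefixes U+25CC. It also recognizes a leading no-break space, hyphen, or dash from the Miscellaneous table as a placeholder base. The sketch inspects only the first one or two characters and does not reproduce any particular engine's broken-cluster handling. The names `display_isolated`, `uses_placeholder`, and `PLACEHOLDERS` are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

# Characters the tables in this document classify as PLACEHOLDER or
# DOTTED_CIRCLE: no-break space, hyphens and dashes, and U+25CC itself.
PLACEHOLDERS = frozenset("\u00A0\u2010\u2011\u2012\u2013\u2014" + DOTTED_CIRCLE)


def is_mark(ch: str) -> bool:
    """True for any combining mark (General Category Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


def display_isolated(text: str) -> str:
    """Prefix a dotted circle if the string starts with a mark that has no base.

    Simplified: only the first character is inspected.
    """
    if text and is_mark(text[0]):
        return DOTTED_CIRCLE + text
    return text


def uses_placeholder(text: str) -> bool:
    """True if a placeholder character is serving as the base for a following mark."""
    return len(text) >= 2 and text[0] in PLACEHOLDERS and is_mark(text[1])
```

For example, `display_isolated("\u102D")` returns the Sign I preceded by a dotted circle, while a string such as `"\u2010\u102D"` already has a hyphen acting as its base and is returned unchanged.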
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+00A0 | Separator | PLACEHOLDER | null | No-break space |
U+200C | Other | NON_JOINER | null | Zero-width non-joiner |
U+200D | Other | JOINER | null | Zero-width joiner |
U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
The zero-width joiner is primarily used to prevent the formation of a conjunct from a "Consonant,Halant,Consonant" sequence. The sequence "Consonant,Halant,ZWJ,Consonant" blocks the formation of a conjunct between the two consonants.
Note, however, that the "Consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner must be used instead. The sequence "Consonant,Halant,ZWNJ,Consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
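The three ZWJ/ZWNJ cases described above can be restated as a small lookup. This is only a transcription of the rules into code, assuming a hypothetical symbolic encoding: "C" is a consonant, "H" a halant, and "ZWJ"/"ZWNJ" the joiners. The sketch does not model how any engine actually applies features. The names `_RULES` and `expected_rendering` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical symbols: "C" consonant, "H" halant, "ZWJ", "ZWNJ".
_RULES: Dict[Tuple[str, ...], str] = {
    ("C", "H", "C"): "conjunct may form",
    ("C", "H", "ZWJ", "C"): "no conjunct; half form may still apply",
    ("C", "H", "ZWNJ", "C"): "no conjunct; standard form plus explicit halant",
}


def expected_rendering(seq: List[str]) -> str:
    """Return the expected outcome for one of the three sequences described above."""
    try:
        return _RULES[tuple(seq)]
    except KeyError:
        raise ValueError(f"sequence not covered by these rules: {seq}") from None
```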
A secondary use of the zero-width joiner is to prevent the formation of a "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", whereas an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.
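The Reph rule above reduces to a check on the start of a syllable. The sketch below assumes the same kind of hypothetical symbolic encoding as a list of class names ("Ra", "H" for halant, "ZWJ", "C" for any other consonant). It encodes only the rule as stated, not a full reordering implementation. The function name `forms_reph` is hypothetical.

```python
from typing import List, Sequence


def forms_reph(seq: Sequence[str]) -> bool:
    """True if a syllable-initial "Ra,Halant" should produce a Reph.

    A ZWJ immediately after the initial "Ra,Halant" blocks the Reph.
    """
    items: List[str] = list(seq)
    return items[:2] == ["Ra", "H"] and items[2:3] != ["ZWJ"]
```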
The no-break space is primarily used to display codepoints that are defined as non-spacing (marks, dependent vowels or matras, below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. Such sequences take the form "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
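The three NBSP patterns can be recognized with a regular expression once each character's class is encoded as a single letter. The sketch below assumes such an illustrative encoding (`CODES`), which is not a format defined by this document. The names `CODES`, `ISOLATED_DISPLAY`, and `is_isolated_display` are hypothetical.

```python
import re
from typing import Sequence

# One-letter codes for the classes named in the text (an illustrative encoding).
CODES = {"NBSP": "N", "ZWJ": "J", "Halant": "H", "Consonant": "C", "mark": "M", "matra": "V"}

# Matches "NBSP,ZWJ,Halant,Consonant" | "NBSP,mark" | "NBSP,matra".
ISOLATED_DISPLAY = re.compile(r"N(?:JHC|M|V)")


def is_isolated_display(classes: Sequence[str]) -> bool:
    """True if the whole class sequence is one of the NBSP display patterns."""
    encoded = "".join(CODES[c] for c in classes)
    return ISOLATED_DISPLAY.fullmatch(encoded) is not None
```

Using `fullmatch` rather than `search` keeps the check anchored to the complete sequence, so a longer run that merely contains one of the patterns is not accepted.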