This document lists the per-character shaping information needed to shape Mongolian text.
Mongolian glyphs should be classified as in the following table. Codepoints in the Mongolian block with no assigned meaning are designated as unassigned in the Unicode category column.
The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.
Assigned codepoints with a null in the Joining group column evoke no special behavior from the shaping engine during the join-computation stage.
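To make the joining types more concrete, here is a minimal Python sketch of how a run of joining types could be resolved into positional forms. It is not taken from any particular shaping engine. It assumes the usual cursive-joining model: TRANSPARENT characters are skipped when looking for neighbors, and a DUAL character takes its initial, medial, final, or isolated form depending on whether the neighbors on each side (in logical order) can connect to it. The type names mirror the Joining type column. The form names `isol`, `init`, `medi`, and `fina` are the corresponding OpenType feature tags. The helper names are hypothetical. Only the joining types that actually occur in the tables below are modeled.

```python
from typing import List, Optional

# Joining types as they appear in the "Joining type" column.
DUAL = "DUAL"
NON_JOINING = "NON_JOINING"
TRANSPARENT = "TRANSPARENT"
JOIN_CAUSING = "JOIN_CAUSING"

# Types that can connect to an adjacent DUAL character.
_CONNECTING = {DUAL, JOIN_CAUSING}


def _neighbor(types: List[str], i: int, step: int) -> Optional[str]:
    """Return the nearest non-TRANSPARENT joining type next to position i.

    step is -1 to look at preceding characters, +1 for following ones.
    Returns None if the edge of the run is reached first.
    """
    j = i + step
    while 0 <= j < len(types):
        if types[j] != TRANSPARENT:
            return types[j]
        j += step
    return None


def resolve_forms(types: List[str]) -> List[Optional[str]]:
    """Assign 'isol', 'init', 'medi', or 'fina' to each DUAL character.

    Non-DUAL positions (marks, punctuation, join causers) get None.
    """
    forms: List[Optional[str]] = []
    for i, t in enumerate(types):
        if t != DUAL:
            forms.append(None)
            continue
        joins_before = _neighbor(types, i, -1) in _CONNECTING
        joins_after = _neighbor(types, i, +1) in _CONNECTING
        if joins_before and joins_after:
            forms.append("medi")
        elif joins_after:
            forms.append("init")
        elif joins_before:
            forms.append("fina")
        else:
            forms.append("isol")
    return forms
```

For example, `resolve_forms([DUAL, TRANSPARENT, DUAL, DUAL])` yields `['init', None, 'medi', 'fina']`. The free-variation-selector-like TRANSPARENT entry does not break the join between the first two letters.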
The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography.
For Mongolian, a subset of marks in the 220 and 230 classes are also designated Modifier Combining Marks (MCM). These are denoted with 220_MCM and 230_MCM in the Mark class column. The MCM marks are treated differently during the mark-reordering stage.
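Reordering by Mark class can be sketched as the Unicode canonical ordering step. Within each run of marks that have non-zero combining classes, the marks are sorted *stably* by class, and class-0 characters act as boundaries. The following Python sketch does not implement the separate MCM treatment described above. It relies on Python's standard `unicodedata.combining`, which reflects the Unicode version bundled with the interpreter. The function name is hypothetical.

```python
import unicodedata


def reorder_marks(text: str) -> str:
    """Stably sort each run of non-zero-class marks by combining class.

    Characters with combining class 0 stay in place and end the current
    run. Marks of equal class keep their original relative order because
    Python's sorted() is stable.
    """
    out = []
    run = []  # pending marks with a non-zero combining class
    for ch in text:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)
```

As a generic illustration outside the Mongolian block, `reorder_marks("a\u0301\u0323")` moves U+0323 (class 220) ahead of U+0301 (class 230). In the Mongolian block itself, only U+18A9 Ali Gali Dagalga has a non-zero class (228) in the table below.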
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+1800 | Punctuation | NON_JOINING | null | 0 | ᠀ Mongolian Birga |
U+1801 | Punctuation | NON_JOINING | null | 0 | ᠁ Mongolian Ellipsis |
U+1802 | Punctuation | NON_JOINING | null | 0 | ᠂ Mongolian Comma |
U+1803 | Punctuation | NON_JOINING | null | 0 | ᠃ Mongolian Full Stop |
U+1804 | Punctuation | NON_JOINING | null | 0 | ᠄ Mongolian Colon |
U+1805 | Punctuation | NON_JOINING | null | 0 | ᠅ Mongolian Four Dots |
U+1806 | Punctuation [Pd] | NON_JOINING | null | 0 | ᠆ Todo Soft Hyphen |
U+1807 | Punctuation | DUAL | null | 0 | ᠇ Sibe Syllable Boundary Mark |
U+1808 | Punctuation | NON_JOINING | null | 0 | ᠈ Manchu Comma |
U+1809 | Punctuation | NON_JOINING | null | 0 | ᠉ Manchu Full Stop |
U+180A | Punctuation | JOIN_CAUSING | null | 0 | ᠊ Mongolian Nirugu |
U+180B | Mark [Mn] | TRANSPARENT | null | 0 | ᠋ Free Variation Selector One |
U+180C | Mark [Mn] | TRANSPARENT | null | 0 | ᠌ Free Variation Selector Two |
U+180D | Mark [Mn] | TRANSPARENT | null | 0 | ᠍ Free Variation Selector Three |
U+180E | Formatting | NON_JOINING | null | 0 | Mongolian Vowel Separator |
U+180F | unassigned | | | | |
U+1810 | Number | NON_JOINING | null | 0 | ᠐ Digit Zero |
U+1811 | Number | NON_JOINING | null | 0 | ᠑ Digit One |
U+1812 | Number | NON_JOINING | null | 0 | ᠒ Digit Two |
U+1813 | Number | NON_JOINING | null | 0 | ᠓ Digit Three |
U+1814 | Number | NON_JOINING | null | 0 | ᠔ Digit Four |
U+1815 | Number | NON_JOINING | null | 0 | ᠕ Digit Five |
U+1816 | Number | NON_JOINING | null | 0 | ᠖ Digit Six |
U+1817 | Number | NON_JOINING | null | 0 | ᠗ Digit Seven |
U+1818 | Number | NON_JOINING | null | 0 | ᠘ Digit Eight |
U+1819 | Number | NON_JOINING | null | 0 | ᠙ Digit Nine |
U+181A | unassigned | | | | |
U+181B | unassigned | | | | |
U+181C | unassigned | | | | |
U+181D | unassigned | | | | |
U+181E | unassigned | | | | |
U+181F | unassigned | | | | |
U+1820 | Letter | DUAL | null | 0 | ᠠ A |
U+1821 | Letter | DUAL | null | 0 | ᠡ E |
U+1822 | Letter | DUAL | null | 0 | ᠢ I |
U+1823 | Letter | DUAL | null | 0 | ᠣ O |
U+1824 | Letter | DUAL | null | 0 | ᠤ U |
U+1825 | Letter | DUAL | null | 0 | ᠥ Oe |
U+1826 | Letter | DUAL | null | 0 | ᠦ Ue |
U+1827 | Letter | DUAL | null | 0 | ᠧ Ee |
U+1828 | Letter | DUAL | null | 0 | ᠨ Na |
U+1829 | Letter | DUAL | null | 0 | ᠩ Ang |
U+182A | Letter | DUAL | null | 0 | ᠪ Ba |
U+182B | Letter | DUAL | null | 0 | ᠫ Pa |
U+182C | Letter | DUAL | null | 0 | ᠬ Qa |
U+182D | Letter | DUAL | null | 0 | ᠭ Ga |
U+182E | Letter | DUAL | null | 0 | ᠮ Ma |
U+182F | Letter | DUAL | null | 0 | ᠯ La |
U+1830 | Letter | DUAL | null | 0 | ᠰ Sa |
U+1831 | Letter | DUAL | null | 0 | ᠱ Sha |
U+1832 | Letter | DUAL | null | 0 | ᠲ Ta |
U+1833 | Letter | DUAL | null | 0 | ᠳ Da |
U+1834 | Letter | DUAL | null | 0 | ᠴ Cha |
U+1835 | Letter | DUAL | null | 0 | ᠵ Ja |
U+1836 | Letter | DUAL | null | 0 | ᠶ Ya |
U+1837 | Letter | DUAL | null | 0 | ᠷ Ra |
U+1838 | Letter | DUAL | null | 0 | ᠸ Wa |
U+1839 | Letter | DUAL | null | 0 | ᠹ Fa |
U+183A | Letter | DUAL | null | 0 | ᠺ Ka |
U+183B | Letter | DUAL | null | 0 | ᠻ Kha |
U+183C | Letter | DUAL | null | 0 | ᠼ Tsa |
U+183D | Letter | DUAL | null | 0 | ᠽ Za |
U+183E | Letter | DUAL | null | 0 | ᠾ Haa |
U+183F | Letter | DUAL | null | 0 | ᠿ Zra |
U+1840 | Letter | DUAL | null | 0 | ᡀ Lha |
U+1841 | Letter | DUAL | null | 0 | ᡁ Zhi |
U+1842 | Letter | DUAL | null | 0 | ᡂ Chi |
U+1843 | Letter | DUAL | null | 0 | ᡃ Todo Long Vowel Sign |
U+1844 | Letter | DUAL | null | 0 | ᡄ Todo E |
U+1845 | Letter | DUAL | null | 0 | ᡅ Todo I |
U+1846 | Letter | DUAL | null | 0 | ᡆ Todo O |
U+1847 | Letter | DUAL | null | 0 | ᡇ Todo U |
U+1848 | Letter | DUAL | null | 0 | ᡈ Todo Oe |
U+1849 | Letter | DUAL | null | 0 | ᡉ Todo Ue |
U+184A | Letter | DUAL | null | 0 | ᡊ Todo Ang |
U+184B | Letter | DUAL | null | 0 | ᡋ Todo Ba |
U+184C | Letter | DUAL | null | 0 | ᡌ Todo Pa |
U+184D | Letter | DUAL | null | 0 | ᡍ Todo Qa |
U+184E | Letter | DUAL | null | 0 | ᡎ Todo Ga |
U+184F | Letter | DUAL | null | 0 | ᡏ Todo Ma |
U+1850 | Letter | DUAL | null | 0 | ᡐ Todo Ta |
U+1851 | Letter | DUAL | null | 0 | ᡑ Todo Da |
U+1852 | Letter | DUAL | null | 0 | ᡒ Todo Cha |
U+1853 | Letter | DUAL | null | 0 | ᡓ Todo Ja |
U+1854 | Letter | DUAL | null | 0 | ᡔ Todo Tsa |
U+1855 | Letter | DUAL | null | 0 | ᡕ Todo Ya |
U+1856 | Letter | DUAL | null | 0 | ᡖ Todo Wa |
U+1857 | Letter | DUAL | null | 0 | ᡗ Todo Ka |
U+1858 | Letter | DUAL | null | 0 | ᡘ Todo Gaa |
U+1859 | Letter | DUAL | null | 0 | ᡙ Todo Haa |
U+185A | Letter | DUAL | null | 0 | ᡚ Todo Jia |
U+185B | Letter | DUAL | null | 0 | ᡛ Todo Nia |
U+185C | Letter | DUAL | null | 0 | ᡜ Todo Dza |
U+185D | Letter | DUAL | null | 0 | ᡝ Sibe E |
U+185E | Letter | DUAL | null | 0 | ᡞ Sibe I |
U+185F | Letter | DUAL | null | 0 | ᡟ Sibe Iy |
U+1860 | Letter | DUAL | null | 0 | ᡠ Sibe Ue |
U+1861 | Letter | DUAL | null | 0 | ᡡ Sibe U |
U+1862 | Letter | DUAL | null | 0 | ᡢ Sibe Ang |
U+1863 | Letter | DUAL | null | 0 | ᡣ Sibe Ka |
U+1864 | Letter | DUAL | null | 0 | ᡤ Sibe Ga |
U+1865 | Letter | DUAL | null | 0 | ᡥ Sibe Ha |
U+1866 | Letter | DUAL | null | 0 | ᡦ Sibe Pa |
U+1867 | Letter | DUAL | null | 0 | ᡧ Sibe Sha |
U+1868 | Letter | DUAL | null | 0 | ᡨ Sibe Ta |
U+1869 | Letter | DUAL | null | 0 | ᡩ Sibe Da |
U+186A | Letter | DUAL | null | 0 | ᡪ Sibe Ja |
U+186B | Letter | DUAL | null | 0 | ᡫ Sibe Fa |
U+186C | Letter | DUAL | null | 0 | ᡬ Sibe Gaa |
U+186D | Letter | DUAL | null | 0 | ᡭ Sibe Haa |
U+186E | Letter | DUAL | null | 0 | ᡮ Sibe Tsa |
U+186F | Letter | DUAL | null | 0 | ᡯ Sibe Za |
U+1870 | Letter | DUAL | null | 0 | ᡰ Sibe Raa |
U+1871 | Letter | DUAL | null | 0 | ᡱ Sibe Cha |
U+1872 | Letter | DUAL | null | 0 | ᡲ Sibe Zha |
U+1873 | Letter | DUAL | null | 0 | ᡳ Manchu I |
U+1874 | Letter | DUAL | null | 0 | ᡴ Manchu Ka |
U+1875 | Letter | DUAL | null | 0 | ᡵ Manchu Ra |
U+1876 | Letter | DUAL | null | 0 | ᡶ Manchu Fa |
U+1877 | Letter | DUAL | null | 0 | ᡷ Manchu Zha |
U+1878 | Letter | DUAL | null | 0 | ᡸ Cha With Two Dots |
U+1879 | unassigned | | | | |
U+187A | unassigned | | | | |
U+187B | unassigned | | | | |
U+187C | unassigned | | | | |
U+187D | unassigned | | | | |
U+187E | unassigned | | | | |
U+187F | unassigned | | | | |
U+1880 | Letter | NON_JOINING | null | 0 | ᢀ Ali Gali Anusvara One |
U+1881 | Letter | NON_JOINING | null | 0 | ᢁ Ali Gali Visarga One |
U+1882 | Letter | NON_JOINING | null | 0 | ᢂ Ali Gali Damaru |
U+1883 | Letter | NON_JOINING | null | 0 | ᢃ Ali Gali Ubadama |
U+1884 | Letter | NON_JOINING | null | 0 | ᢄ Ali Gali Inverted Ubadama |
U+1885 | Mark [Mn] | TRANSPARENT | null | 0 | ᢅ Ali Gali Baluda |
U+1886 | Mark [Mn] | TRANSPARENT | null | 0 | ᢆ Ali Gali Three Baluda |
U+1887 | Letter | DUAL | null | 0 | ᢇ Ali Gali A |
U+1888 | Letter | DUAL | null | 0 | ᢈ Ali Gali I |
U+1889 | Letter | DUAL | null | 0 | ᢉ Ali Gali Ka |
U+188A | Letter | DUAL | null | 0 | ᢊ Ali Gali Nga |
U+188B | Letter | DUAL | null | 0 | ᢋ Ali Gali Ca |
U+188C | Letter | DUAL | null | 0 | ᢌ Ali Gali Tta |
U+188D | Letter | DUAL | null | 0 | ᢍ Ali Gali Ttha |
U+188E | Letter | DUAL | null | 0 | ᢎ Ali Gali Dda |
U+188F | Letter | DUAL | null | 0 | ᢏ Ali Gali Nna |
U+1890 | Letter | DUAL | null | 0 | ᢐ Ali Gali Ta |
U+1891 | Letter | DUAL | null | 0 | ᢑ Ali Gali Da |
U+1892 | Letter | DUAL | null | 0 | ᢒ Ali Gali Pa |
U+1893 | Letter | DUAL | null | 0 | ᢓ Ali Gali Pha |
U+1894 | Letter | DUAL | null | 0 | ᢔ Ali Gali Ssa |
U+1895 | Letter | DUAL | null | 0 | ᢕ Ali Gali Zha |
U+1896 | Letter | DUAL | null | 0 | ᢖ Ali Gali Za |
U+1897 | Letter | DUAL | null | 0 | ᢗ Ali Gali Ah |
U+1898 | Letter | DUAL | null | 0 | ᢘ Todo Ali Gali Ta |
U+1899 | Letter | DUAL | null | 0 | ᢙ Todo Ali Gali Zha |
U+189A | Letter | DUAL | null | 0 | ᢚ Manchu Ali Gali Gha |
U+189B | Letter | DUAL | null | 0 | ᢛ Manchu Ali Gali Nga |
U+189C | Letter | DUAL | null | 0 | ᢜ Manchu Ali Gali Ca |
U+189D | Letter | DUAL | null | 0 | ᢝ Manchu Ali Gali Jha |
U+189E | Letter | DUAL | null | 0 | ᢞ Manchu Ali Gali Tta |
U+189F | Letter | DUAL | null | 0 | ᢟ Manchu Ali Gali Ddha |
U+18A0 | Letter | DUAL | null | 0 | ᢠ Manchu Ali Gali Ta |
U+18A1 | Letter | DUAL | null | 0 | ᢡ Manchu Ali Gali Dha |
U+18A2 | Letter | DUAL | null | 0 | ᢢ Manchu Ali Gali Ssa |
U+18A3 | Letter | DUAL | null | 0 | ᢣ Manchu Ali Gali Cya |
U+18A4 | Letter | DUAL | null | 0 | ᢤ Manchu Ali Gali Zha |
U+18A5 | Letter | DUAL | null | 0 | ᢥ Manchu Ali Gali Za |
U+18A6 | Letter | DUAL | null | 0 | ᢦ Ali Gali Half U |
U+18A7 | Letter | DUAL | null | 0 | ᢧ Ali Gali Half Ya |
U+18A8 | Letter | DUAL | null | 0 | ᢨ Manchu Ali Gali Bha |
U+18A9 | Mark [Mn] | TRANSPARENT | null | 228 | ᢩ Ali Gali Dagalga |
U+18AA | Letter | DUAL | null | 0 | ᢪ Manchu Ali Gali Lha |
U+18AB | unassigned | | | | |
U+18AC | unassigned | | | | |
U+18AD | unassigned | | | | |
U+18AE | unassigned | | | | |
U+18AF | unassigned | | | | |
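Because the Mongolian block is organized into long runs of codepoints that share a joining type, the Joining type column can be encoded compactly as ranges. The following Python sketch transcribes the table above. The function and constant names are hypothetical. Codepoints that the table marks as unassigned (including U+180F) simply return `None`. A linear scan is adequate for thirteen ranges. A larger table would more likely use `bisect` on the range starts.

```python
from typing import Optional

# (first, last, joining type) ranges transcribed from the Mongolian block table.
# Codepoints not covered by any range are unassigned in that table.
_MONGOLIAN_JOINING = [
    (0x1800, 0x1806, "NON_JOINING"),   # Birga .. Todo Soft Hyphen
    (0x1807, 0x1807, "DUAL"),          # Sibe Syllable Boundary Mark
    (0x1808, 0x1809, "NON_JOINING"),   # Manchu Comma, Manchu Full Stop
    (0x180A, 0x180A, "JOIN_CAUSING"),  # Mongolian Nirugu
    (0x180B, 0x180D, "TRANSPARENT"),   # Free Variation Selectors One-Three
    (0x180E, 0x180E, "NON_JOINING"),   # Mongolian Vowel Separator
    (0x1810, 0x1819, "NON_JOINING"),   # Digits Zero-Nine
    (0x1820, 0x1878, "DUAL"),          # A .. Cha With Two Dots
    (0x1880, 0x1884, "NON_JOINING"),   # Ali Gali Anusvara One .. Inverted Ubadama
    (0x1885, 0x1886, "TRANSPARENT"),   # Ali Gali Baluda, Three Baluda
    (0x1887, 0x18A8, "DUAL"),          # Ali Gali A .. Manchu Ali Gali Bha
    (0x18A9, 0x18A9, "TRANSPARENT"),   # Ali Gali Dagalga
    (0x18AA, 0x18AA, "DUAL"),          # Manchu Ali Gali Lha
]


def mongolian_joining_type(cp: int) -> Optional[str]:
    """Return the joining type listed for cp, or None if it is not listed."""
    for first, last, joining_type in _MONGOLIAN_JOINING:
        if first <= cp <= last:
            return joining_type
    return None
```

For example, `mongolian_joining_type(0x180A)` returns `"JOIN_CAUSING"` and `mongolian_joining_type(0x180F)` returns `None`.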
The Mongolian Supplement block includes variants of the birga mark used to denote the beginning of a text.
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+11660 | Punctuation | NON_JOINING | null | 0 | 𑙠 Birga with Ornament |
U+11661 | Punctuation | NON_JOINING | null | 0 | 𑙡 Rotated Birga |
U+11662 | Punctuation | NON_JOINING | null | 0 | 𑙢 Double Birga with Ornament |
U+11663 | Punctuation | NON_JOINING | null | 0 | 𑙣 Triple Birga with Ornament |
U+11664 | Punctuation | NON_JOINING | null | 0 | 𑙤 Birga with Double Ornament |
U+11665 | Punctuation | NON_JOINING | null | 0 | 𑙥 Rotated Birga with Ornament |
U+11666 | Punctuation | NON_JOINING | null | 0 | 𑙦 Rotated Birga with Double Ornament |
U+11667 | Punctuation | NON_JOINING | null | 0 | 𑙧 Inverted Birga |
U+11668 | Punctuation | NON_JOINING | null | 0 | 𑙨 Inverted Birga with Double Ornament |
U+11669 | Punctuation | NON_JOINING | null | 0 | 𑙩 Swirl Birga |
U+1166A | Punctuation | NON_JOINING | null | 0 | 𑙪 Swirl Birga with Ornament |
U+1166B | Punctuation | NON_JOINING | null | 0 | 𑙫 Swirl Birga with Double Ornament |
U+1166C | Punctuation | NON_JOINING | null | 0 | 𑙬 Turned Swirl Birga with Double Ornament |
U+1166D | unassigned | | | | |
U+1166E | unassigned | | | | |
U+1166F | unassigned | | | | |
U+11670 | unassigned | | | | |
U+11671 | unassigned | | | | |
U+11672 | unassigned | | | | |
U+11673 | unassigned | | | | |
U+11674 | unassigned | | | | |
U+11675 | unassigned | | | | |
U+11676 | unassigned | | | | |
U+11677 | unassigned | | | | |
U+11678 | unassigned | | | | |
U+11679 | unassigned | | | | |
U+1167A | unassigned | | | | |
U+1167B | unassigned | | | | |
U+1167C | unassigned | | | | |
U+1167D | unassigned | | | | |
U+1167E | unassigned | | | | |
U+1167F | unassigned | | | | |
Other important characters that may be encountered when shaping runs of Mongolian text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right text marker (U+200E) and right-to-left text marker (U+200F), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
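As a minimal sketch of the placeholder convention, the following Python function prepends U+25CC only when a string begins with a non-spacing mark (general category `Mn`, as reported by Python's standard `unicodedata.category`). A string that already starts with some other base, such as a hyphen used as an improvised placeholder, is left alone so that it does not receive a second base. The function name is hypothetical, and a real shaping engine makes this decision on its own internal cluster data rather than on the raw string.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def display_isolated_mark(text: str) -> str:
    """Prepend a dotted circle if text begins with a non-spacing mark.

    Text that already begins with a base character, including a hyphen or
    dash used as a placeholder, is returned unchanged.
    """
    if text and unicodedata.category(text[0]) == "Mn":
        return DOTTED_CIRCLE + text
    return text
```

For example, `display_isolated_mark("\u18A9")` returns `"\u25CC\u18A9"` (Ali Gali Dagalga on a dotted circle), while `display_isolated_mark("\u2010\u18A9")` keeps the hyphen as the base.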
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+00A0 | Separator | NON_JOINING | null | 0 | No-break space |
U+200C | Other | NON_JOINING | null | 0 | Zero-width non-joiner |
U+200D | Other | JOIN_CAUSING | null | 0 | Zero-width joiner |
U+2010 | Punctuation | NON_JOINING | null | 0 | ‐ Hyphen |
U+2011 | Punctuation | NON_JOINING | null | 0 | ‑ No-break hyphen |
U+2012 | Punctuation | NON_JOINING | null | 0 | ‒ Figure dash |
U+2013 | Punctuation | NON_JOINING | null | 0 | – En dash |
U+2014 | Punctuation | NON_JOINING | null | 0 | — Em dash |
U+202F | Separator | NON_JOINING | null | 0 | Narrow no-break space |
U+25CC | Symbol | NON_JOINING | null | 0 | ◌ Dotted circle |
The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such as when displaying it in a table of forms), the sequence "Letter,ZWJ" would be used. To show the medial form of a letter in isolation, the sequence "ZWJ,Letter,ZWJ" would be used.
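The ZWJ sequences above can be generated mechanically. This Python sketch builds the sequence for a requested form. The initial and medial sequences follow the text directly. The final-form sequence "ZWJ,Letter" is an assumption made by symmetry and is not stated above. The function name is hypothetical.

```python
ZWJ = "\u200D"  # zero-width joiner (JOIN_CAUSING)


def positional_sequence(letter: str, form: str) -> str:
    """Return a sequence that displays letter in the requested form.

    form is 'isol', 'init', 'medi', or 'fina'. The 'fina' sequence
    (ZWJ before the letter) is an assumption by symmetry with 'init'.
    """
    sequences = {
        "isol": letter,
        "init": letter + ZWJ,
        "medi": ZWJ + letter + ZWJ,
        "fina": ZWJ + letter,
    }
    return sequences[form]
```

For example, `positional_sequence("\u1820", "medi")` returns `"\u200D\u1820\u200D"`, which asks the font for the medial form of U+1820 A.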
The no-break space is primarily used to display those codepoints that are defined as non-spacing (such as vowel or diacritical marks) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.
The narrow no-break space is used in Mongolian to insert a small gap between a word and its suffix.
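Both uses of the no-break spaces amount to simple string construction. In the Python sketch below, the function names are hypothetical and the inputs are arbitrary codepoints rather than real Mongolian words.

```python
NBSP = "\u00A0"   # no-break space
NNBSP = "\u202F"  # narrow no-break space


def mark_on_nbsp(mark: str) -> str:
    """Display a non-spacing mark on a no-break space instead of U+25CC."""
    return NBSP + mark


def attach_suffix(word: str, suffix: str) -> str:
    """Separate a word from its suffix with a narrow no-break space."""
    return word + NNBSP + suffix
```

For example, `attach_suffix("\u1820\u182E", "\u1824")` places U+202F between the stem and the suffix, and `mark_on_nbsp("\u18A9")` returns `"\u00A0\u18A9"`.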