Mongolian character tables

This document lists the per-character shaping information needed to shape Mongolian text.

Mongolian character table

Mongolian codepoints should be classified as in the following table. Codepoints in the Mongolian block with no assigned meaning are designated as unassigned in the Unicode category column.

The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
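The interaction of the four joining types can be illustrated with a simplified sketch. It assumes the joining types have already been looked up and are given in logical order. The function name `positional_forms` and the form labels (`isol`, `init`, `medi`, `fina`) are illustrative choices, not part of this document. A real shaping engine applies further rules that this sketch ignores.

```python
def positional_forms(types):
    """Compute a positional form for each DUAL entry in a run.

    `types` is a list of joining-type strings in logical order.
    Entries that are not DUAL get None, because they take no
    positional form in this simplified model.
    """
    joiners = {"DUAL", "JOIN_CAUSING"}

    def neighbor(i, step):
        # Walk past TRANSPARENT entries; they do not affect joining.
        j = i + step
        while 0 <= j < len(types) and types[j] == "TRANSPARENT":
            j += step
        return types[j] if 0 <= j < len(types) else None

    forms = []
    for i, t in enumerate(types):
        if t != "DUAL":
            forms.append(None)
            continue
        joins_prev = neighbor(i, -1) in joiners
        joins_next = neighbor(i, +1) in joiners
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

In this model a TRANSPARENT entry between two DUAL letters leaves them joined, while a NON_JOINING entry breaks the connection on both sides.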

The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.

Assigned codepoints with a null in the Joining group column evoke no special behavior from the shaping engine during the join-computation stage.

The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography.

For Mongolian, a subset of marks in the 220 and 230 classes are also designated Modifier Combining Marks (MCM). These are denoted with 220_MCM and 230_MCM in the Mark class column. The MCM marks are treated differently during the mark-reordering stage.
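As a rough illustration of how combining classes drive reordering, the sketch below reads the Canonical Combining Class from Python's standard `unicodedata` module and stable-sorts each run of adjacent marks by that value. The helper names `ccc` and `reorder_marks` are hypothetical, and the special handling of MCM marks is deliberately left out. The combining acute accent (U+0301, class 230) is used only to have a second mark with a different class.

```python
import unicodedata


def ccc(ch):
    """Return the Canonical Combining Class of a single character."""
    return unicodedata.combining(ch)


def reorder_marks(text):
    """Stable-sort each run of non-zero-class marks by combining class.

    Characters with class 0 act as barriers: marks never move
    across them. MCM handling is not modeled here.
    """
    out = []
    run = []
    for ch in text:
        if ccc(ch) != 0:
            run.append(ch)
        else:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


# U+18A9 (class 228) sorts before U+0301 (class 230).
print(reorder_marks("\u1820\u0301\u18A9") == "\u1820\u18A9\u0301")
```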

Codepoint Unicode category Joining type Joining group Mark class Glyph
U+1800 Punctuation NON_JOINING null 0 ᠀ Mongolian Birga
U+1801 Punctuation NON_JOINING null 0 ᠁ Mongolian Ellipsis
U+1802 Punctuation NON_JOINING null 0 ᠂ Mongolian Comma
U+1803 Punctuation NON_JOINING null 0 ᠃ Mongolian Full Stop
U+1804 Punctuation NON_JOINING null 0 ᠄ Mongolian Colon
U+1805 Punctuation NON_JOINING null 0 ᠅ Mongolian Four Dots
U+1806 Punctuation [Pd] NON_JOINING null 0 ᠆ Todo Soft Hyphen
U+1807 Punctuation DUAL null 0 ᠇ Sibe Syllable Boundary Mark
U+1808 Punctuation NON_JOINING null 0 ᠈ Manchu Comma
U+1809 Punctuation NON_JOINING null 0 ᠉ Manchu Full Stop
U+180A Punctuation JOIN_CAUSING null 0 ᠊ Mongolian Nirugu
U+180B Mark [Mn] TRANSPARENT null 0 ᠋ Free Variation Selector One
U+180C Mark [Mn] TRANSPARENT null 0 ᠌ Free Variation Selector Two
U+180D Mark [Mn] TRANSPARENT null 0 ᠍ Free Variation Selector Three
U+180E Formatting NON_JOINING null 0 ᠎ Mongolian Vowel Separator
U+180F unassigned
U+1810 Number NON_JOINING null 0 ᠐ Digit Zero
U+1811 Number NON_JOINING null 0 ᠑ Digit One
U+1812 Number NON_JOINING null 0 ᠒ Digit Two
U+1813 Number NON_JOINING null 0 ᠓ Digit Three
U+1814 Number NON_JOINING null 0 ᠔ Digit Four
U+1815 Number NON_JOINING null 0 ᠕ Digit Five
U+1816 Number NON_JOINING null 0 ᠖ Digit Six
U+1817 Number NON_JOINING null 0 ᠗ Digit Seven
U+1818 Number NON_JOINING null 0 ᠘ Digit Eight
U+1819 Number NON_JOINING null 0 ᠙ Digit Nine
U+181A unassigned
U+181B unassigned
U+181C unassigned
U+181D unassigned
U+181E unassigned
U+181F unassigned
U+1820 Letter DUAL null 0 ᠠ A
U+1821 Letter DUAL null 0 ᠡ E
U+1822 Letter DUAL null 0 ᠢ I
U+1823 Letter DUAL null 0 ᠣ O
U+1824 Letter DUAL null 0 ᠤ U
U+1825 Letter DUAL null 0 ᠥ Oe
U+1826 Letter DUAL null 0 ᠦ Ue
U+1827 Letter DUAL null 0 ᠧ Ee
U+1828 Letter DUAL null 0 ᠨ Na
U+1829 Letter DUAL null 0 ᠩ Ang
U+182A Letter DUAL null 0 ᠪ Ba
U+182B Letter DUAL null 0 ᠫ Pa
U+182C Letter DUAL null 0 ᠬ Qa
U+182D Letter DUAL null 0 ᠭ Ga
U+182E Letter DUAL null 0 ᠮ Ma
U+182F Letter DUAL null 0 ᠯ La
U+1830 Letter DUAL null 0 ᠰ Sa
U+1831 Letter DUAL null 0 ᠱ Sha
U+1832 Letter DUAL null 0 ᠲ Ta
U+1833 Letter DUAL null 0 ᠳ Da
U+1834 Letter DUAL null 0 ᠴ Cha
U+1835 Letter DUAL null 0 ᠵ Ja
U+1836 Letter DUAL null 0 ᠶ Ya
U+1837 Letter DUAL null 0 ᠷ Ra
U+1838 Letter DUAL null 0 ᠸ Wa
U+1839 Letter DUAL null 0 ᠹ Fa
U+183A Letter DUAL null 0 ᠺ Ka
U+183B Letter DUAL null 0 ᠻ Kha
U+183C Letter DUAL null 0 ᠼ Tsa
U+183D Letter DUAL null 0 ᠽ Za
U+183E Letter DUAL null 0 ᠾ Haa
U+183F Letter DUAL null 0 ᠿ Zra
U+1840 Letter DUAL null 0 ᡀ Lha
U+1841 Letter DUAL null 0 ᡁ Zhi
U+1842 Letter DUAL null 0 ᡂ Chi
U+1843 Letter DUAL null 0 ᡃ Todo Long Vowel Sign
U+1844 Letter DUAL null 0 ᡄ Todo E
U+1845 Letter DUAL null 0 ᡅ Todo I
U+1846 Letter DUAL null 0 ᡆ Todo O
U+1847 Letter DUAL null 0 ᡇ Todo U
U+1848 Letter DUAL null 0 ᡈ Todo Oe
U+1849 Letter DUAL null 0 ᡉ Todo Ue
U+184A Letter DUAL null 0 ᡊ Todo Ang
U+184B Letter DUAL null 0 ᡋ Todo Ba
U+184C Letter DUAL null 0 ᡌ Todo Pa
U+184D Letter DUAL null 0 ᡍ Todo Qa
U+184E Letter DUAL null 0 ᡎ Todo Ga
U+184F Letter DUAL null 0 ᡏ Todo Ma
U+1850 Letter DUAL null 0 ᡐ Todo Ta
U+1851 Letter DUAL null 0 ᡑ Todo Da
U+1852 Letter DUAL null 0 ᡒ Todo Cha
U+1853 Letter DUAL null 0 ᡓ Todo Ja
U+1854 Letter DUAL null 0 ᡔ Todo Tsa
U+1855 Letter DUAL null 0 ᡕ Todo Ya
U+1856 Letter DUAL null 0 ᡖ Todo Wa
U+1857 Letter DUAL null 0 ᡗ Todo Ka
U+1858 Letter DUAL null 0 ᡘ Todo Gaa
U+1859 Letter DUAL null 0 ᡙ Todo Haa
U+185A Letter DUAL null 0 ᡚ Todo Jia
U+185B Letter DUAL null 0 ᡛ Todo Nia
U+185C Letter DUAL null 0 ᡜ Todo Dza
U+185D Letter DUAL null 0 ᡝ Sibe E
U+185E Letter DUAL null 0 ᡞ Sibe I
U+185F Letter DUAL null 0 ᡟ Sibe Iy
U+1860 Letter DUAL null 0 ᡠ Sibe Ue
U+1861 Letter DUAL null 0 ᡡ Sibe U
U+1862 Letter DUAL null 0 ᡢ Sibe Ang
U+1863 Letter DUAL null 0 ᡣ Sibe Ka
U+1864 Letter DUAL null 0 ᡤ Sibe Ga
U+1865 Letter DUAL null 0 ᡥ Sibe Ha
U+1866 Letter DUAL null 0 ᡦ Sibe Pa
U+1867 Letter DUAL null 0 ᡧ Sibe Sha
U+1868 Letter DUAL null 0 ᡨ Sibe Ta
U+1869 Letter DUAL null 0 ᡩ Sibe Da
U+186A Letter DUAL null 0 ᡪ Sibe Ja
U+186B Letter DUAL null 0 ᡫ Sibe Fa
U+186C Letter DUAL null 0 ᡬ Sibe Gaa
U+186D Letter DUAL null 0 ᡭ Sibe Haa
U+186E Letter DUAL null 0 ᡮ Sibe Tsa
U+186F Letter DUAL null 0 ᡯ Sibe Za
U+1870 Letter DUAL null 0 ᡰ Sibe Raa
U+1871 Letter DUAL null 0 ᡱ Sibe Cha
U+1872 Letter DUAL null 0 ᡲ Sibe Zha
U+1873 Letter DUAL null 0 ᡳ Manchu I
U+1874 Letter DUAL null 0 ᡴ Manchu Ka
U+1875 Letter DUAL null 0 ᡵ Manchu Ra
U+1876 Letter DUAL null 0 ᡶ Manchu Fa
U+1877 Letter DUAL null 0 ᡷ Manchu Zha
U+1878 Letter DUAL null 0 ᡸ Cha With Two Dots
U+1879 unassigned
U+187A unassigned
U+187B unassigned
U+187C unassigned
U+187D unassigned
U+187E unassigned
U+187F unassigned
U+1880 Letter NON_JOINING null 0 ᢀ Ali Gali Anusvara One
U+1881 Letter NON_JOINING null 0 ᢁ Ali Gali Visarga One
U+1882 Letter NON_JOINING null 0 ᢂ Ali Gali Damaru
U+1883 Letter NON_JOINING null 0 ᢃ Ali Gali Ubadama
U+1884 Letter NON_JOINING null 0 ᢄ Ali Gali Inverted Ubadama
U+1885 Mark [Mn] TRANSPARENT null 0 ᢅ Ali Gali Baluda
U+1886 Mark [Mn] TRANSPARENT null 0 ᢆ Ali Gali Three Baluda
U+1887 Letter DUAL null 0 ᢇ Ali Gali A
U+1888 Letter DUAL null 0 ᢈ Ali Gali I
U+1889 Letter DUAL null 0 ᢉ Ali Gali Ka
U+188A Letter DUAL null 0 ᢊ Ali Gali Nga
U+188B Letter DUAL null 0 ᢋ Ali Gali Ca
U+188C Letter DUAL null 0 ᢌ Ali Gali Tta
U+188D Letter DUAL null 0 ᢍ Ali Gali Ttha
U+188E Letter DUAL null 0 ᢎ Ali Gali Dda
U+188F Letter DUAL null 0 ᢏ Ali Gali Nna
U+1890 Letter DUAL null 0 ᢐ Ali Gali Ta
U+1891 Letter DUAL null 0 ᢑ Ali Gali Da
U+1892 Letter DUAL null 0 ᢒ Ali Gali Pa
U+1893 Letter DUAL null 0 ᢓ Ali Gali Pha
U+1894 Letter DUAL null 0 ᢔ Ali Gali Ssa
U+1895 Letter DUAL null 0 ᢕ Ali Gali Zha
U+1896 Letter DUAL null 0 ᢖ Ali Gali Za
U+1897 Letter DUAL null 0 ᢗ Ali Gali Ah
U+1898 Letter DUAL null 0 ᢘ Todo Ali Gali Ta
U+1899 Letter DUAL null 0 ᢙ Todo Ali Gali Zha
U+189A Letter DUAL null 0 ᢚ Manchu Ali Gali Gha
U+189B Letter DUAL null 0 ᢛ Manchu Ali Gali Nga
U+189C Letter DUAL null 0 ᢜ Manchu Ali Gali Ca
U+189D Letter DUAL null 0 ᢝ Manchu Ali Gali Jha
U+189E Letter DUAL null 0 ᢞ Manchu Ali Gali Tta
U+189F Letter DUAL null 0 ᢟ Manchu Ali Gali Ddha
U+18A0 Letter DUAL null 0 ᢠ Manchu Ali Gali Ta
U+18A1 Letter DUAL null 0 ᢡ Manchu Ali Gali Dha
U+18A2 Letter DUAL null 0 ᢢ Manchu Ali Gali Ssa
U+18A3 Letter DUAL null 0 ᢣ Manchu Ali Gali Cya
U+18A4 Letter DUAL null 0 ᢤ Manchu Ali Gali Zha
U+18A5 Letter DUAL null 0 ᢥ Manchu Ali Gali Za
U+18A6 Letter DUAL null 0 ᢦ Ali Gali Half U
U+18A7 Letter DUAL null 0 ᢧ Ali Gali Half Ya
U+18A8 Letter DUAL null 0 ᢨ Manchu Ali Gali Bha
U+18A9 Mark [Mn] TRANSPARENT null 228 ᢩ Ali Gali Dagalga
U+18AA Letter DUAL null 0 ᢪ Manchu Ali Gali Lha
U+18AB unassigned
U+18AC unassigned
U+18AD unassigned
U+18AE unassigned
U+18AF unassigned
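
For lookup purposes, the Joining type column of the table above can be condensed into inclusive codepoint ranges. The following is a minimal sketch under that assumption. The names `_JOINING_RANGES` and `joining_type` are hypothetical, and unassigned codepoints are reported as `None`.

```python
# Joining types from the table above, as inclusive codepoint ranges.
_JOINING_RANGES = [
    (0x1800, 0x1806, "NON_JOINING"),   # punctuation
    (0x1807, 0x1807, "DUAL"),          # Sibe syllable boundary mark
    (0x1808, 0x1809, "NON_JOINING"),   # Manchu punctuation
    (0x180A, 0x180A, "JOIN_CAUSING"),  # nirugu
    (0x180B, 0x180D, "TRANSPARENT"),   # free variation selectors
    (0x180E, 0x180E, "NON_JOINING"),   # vowel separator
    (0x1810, 0x1819, "NON_JOINING"),   # digits
    (0x1820, 0x1878, "DUAL"),          # letters
    (0x1880, 0x1884, "NON_JOINING"),   # Ali Gali signs
    (0x1885, 0x1886, "TRANSPARENT"),   # baluda marks
    (0x1887, 0x18A8, "DUAL"),          # Ali Gali letters
    (0x18A9, 0x18A9, "TRANSPARENT"),   # dagalga
    (0x18AA, 0x18AA, "DUAL"),          # Manchu Ali Gali Lha
]


def joining_type(codepoint):
    """Return the joining type for a Mongolian-block codepoint.

    Returns None for codepoints that the table lists as unassigned
    or that fall outside the Mongolian block.
    """
    for low, high, jtype in _JOINING_RANGES:
        if low <= codepoint <= high:
            return jtype
    return None
```

A linear scan is sufficient for a table of this size. An implementation that cares about lookup speed would more likely use a flat array indexed by the offset from U+1800.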

Mongolian Supplement character table

The Mongolian Supplement block includes variants of the birga mark used to denote the beginning of a text.

Codepoint Unicode category Joining type Joining group Mark class Glyph
U+11660 Punctuation NON_JOINING null 0 𑙠 Birga with Ornament
U+11661 Punctuation NON_JOINING null 0 𑙡 Rotated Birga
U+11662 Punctuation NON_JOINING null 0 𑙢 Double Birga with Ornament
U+11663 Punctuation NON_JOINING null 0 𑙣 Triple Birga with Ornament
U+11664 Punctuation NON_JOINING null 0 𑙤 Birga with Double Ornament
U+11665 Punctuation NON_JOINING null 0 𑙥 Rotated Birga with Ornament
U+11666 Punctuation NON_JOINING null 0 𑙦 Rotated Birga with Double Ornament
U+11667 Punctuation NON_JOINING null 0 𑙧 Inverted Birga
U+11668 Punctuation NON_JOINING null 0 𑙨 Inverted Birga with Double Ornament
U+11669 Punctuation NON_JOINING null 0 𑙩 Swirl Birga
U+1166A Punctuation NON_JOINING null 0 𑙪 Swirl Birga with Ornament
U+1166B Punctuation NON_JOINING null 0 𑙫 Swirl Birga with Double Ornament
U+1166C Punctuation NON_JOINING null 0 𑙬 Turned Swirl Birga with Double Ornament
U+1166D unassigned
U+1166E unassigned
U+1166F unassigned
U+11670 unassigned
U+11671 unassigned
U+11672 unassigned
U+11673 unassigned
U+11674 unassigned
U+11675 unassigned
U+11676 unassigned
U+11677 unassigned
U+11678 unassigned
U+11679 unassigned
U+1167A unassigned
U+1167B unassigned
U+1167C unassigned
U+1167D unassigned
U+1167E unassigned
U+1167F unassigned

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Mongolian text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right text marker (U+200E) and right-to-left text marker (U+200F), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Codepoint Unicode category Joining type Joining group Mark class Glyph
U+00A0 Separator NON_JOINING null 0   No-break space
U+200C Other NON_JOINING null 0 ‌ Zero-width non-joiner
U+200D Other JOIN_CAUSING null 0 ‍ Zero-width joiner
U+2010 Punctuation NON_JOINING null 0 ‐ Hyphen
U+2011 Punctuation NON_JOINING null 0 ‑ No-break hyphen
U+2012 Punctuation NON_JOINING null 0 ‒ Figure dash
U+2013 Punctuation NON_JOINING null 0 – En dash
U+2014 Punctuation NON_JOINING null 0 — Em dash
U+202F Separator NON_JOINING null 0   Narrow No-Break Space
U+25CC Symbol NON_JOINING null 0 ◌ Dotted circle
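
The dotted-circle convention described above can be sketched at the string level as follows. This assumes that "a combining mark in isolation" means a run whose first character has the Unicode general category `Mn`. The function name `with_placeholder` is hypothetical, and a shaping engine would normally perform the equivalent step on its own buffer rather than on a Python string.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def with_placeholder(text):
    """Prefix a dotted circle if the run starts with a non-spacing mark."""
    if text and unicodedata.category(text[0]) == "Mn":
        return DOTTED_CIRCLE + text
    return text
```

For example, `with_placeholder("\u18A9")` returns the dagalga mark preceded by U+25CC, while a run that starts with a letter is returned unchanged.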

The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.

For example, to show the initial form of a letter in isolation (such as for displaying it in a table of forms), the sequence "Letter,ZWJ" would be used. To show the medial form of a letter in isolation, the sequence "ZWJ,Letter,ZWJ" would be used.
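
The two sequences above can be built with a small helper. This is a sketch only: the function name `isolated_display` and the form labels are hypothetical, and the `fina` case ("ZWJ,Letter") is an assumption made by symmetry with the initial form rather than something stated in the text.

```python
ZWJ = "\u200D"  # zero-width joiner


def isolated_display(letter, form):
    """Build a string that shows one positional form of `letter` alone."""
    if form == "isol":
        return letter
    if form == "init":
        return letter + ZWJ        # "Letter,ZWJ"
    if form == "medi":
        return ZWJ + letter + ZWJ  # "ZWJ,Letter,ZWJ"
    if form == "fina":
        return ZWJ + letter        # assumed by symmetry
    raise ValueError(f"unknown form: {form!r}")
```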

The no-break space is primarily used to display codepoints that are defined as non-spacing (such as diacritical marks) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.

The narrow no-break space is used in Mongolian to insert a small gap between a word and its suffix.
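
In string terms, this convention amounts to placing U+202F between the stem and the suffix. The sketch below shows both directions. The function names `attach_suffix` and `split_suffixes` are hypothetical, and the letters in the usage line are arbitrary codepoints from the table, not a real Mongolian word.

```python
NNBSP = "\u202F"  # narrow no-break space


def attach_suffix(stem, suffix):
    """Join a stem and a suffix with a narrow no-break space."""
    return stem + NNBSP + suffix


def split_suffixes(word):
    """Split a word into its stem and a list of NNBSP-separated suffixes."""
    stem, *suffixes = word.split(NNBSP)
    return stem, suffixes


word = attach_suffix("\u1820\u1828", "\u1820")
print(split_suffixes(word) == ("\u1820\u1828", ["\u1820"]))
```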