Plane (Unicode)

Unicode characters can be categorized in many different ways. One is by code point: the Unicode code space is logically divided into 17 planes, each with 65,536 (= 2¹⁶) code points, although currently only a few planes are used:

Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP). This is the plane containing most of the character assignments so far. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing systems in current use.
Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP)
Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)
Planes 3 to 13 (30000–DFFFF): unassigned
Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)
Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)
Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
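Because every plane holds exactly 65,536 code points, the plane of a code point is just its value shifted right by 16 bits. The following Python sketch illustrates this, assuming the plane allocation listed above (as of Unicode 5.0); plane_of, plane_range and describe are hypothetical helper names, not part of any library.

```python
# Sketch: locate a code point's plane from its numeric value.
PLANE_SIZE = 0x10000                           # 2**16 code points per plane
NUM_PLANES = 17                                # planes 0 through 16
MAX_CODE_POINT = NUM_PLANES * PLANE_SIZE - 1   # 0x10FFFF

# Plane names as listed above; planes 3-13 are treated as unassigned.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains code point cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    return cp >> 16                            # drop the 16-bit offset within the plane


def plane_range(plane: int) -> tuple[int, int]:
    """Return the first and last code point of the given plane."""
    start = plane * PLANE_SIZE
    return start, start + PLANE_SIZE - 1


def describe(cp: int) -> str:
    plane = plane_of(cp)
    return f"U+{cp:04X}: plane {plane}, {PLANE_NAMES.get(plane, 'unassigned')}"


if __name__ == "__main__":
    for cp in (0x0041, 0x1D11E, 0x20000, 0x40000, 0xE0001, 0x10FFFF):
        print(describe(cp))
```

Multiplying out the constants also gives the size of the whole code space: 17 × 65,536 = 1,114,112 code points, the last being U+10FFFF.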

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively blocked out for every current and ancient writing system (script) that the Unicode Consortium has been able to identify (see [1]). While Unicode may eventually need another of the 11 spare planes for ideographic characters, other planes would still remain available if previously unknown scripts with tens of thousands of characters were discovered. The limit of 1,114,112 code points (17 × 65,536, ending at U+10FFFF) is therefore unlikely to be reached in the near future.

Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The graphic on the right is a visual roadmap to the Basic Multilingual Plane. The colours in use are:

Black = Latin scripts and symbols
Light Blue = Linguistic scripts
Blue = Other European scripts
Orange = Middle Eastern and SW Asian scripts
Light Orange = African scripts
Green = South Asian scripts
Purple = Southeast Asian scripts
Red = East Asian scripts
Light Red = Unified CJK Han
Yellow = Canadian Aboriginal scripts
Magenta = Symbols
Dark Grey = Diacritics
Light Grey = UTF-16 surrogates and private use
Cyan = Miscellaneous characters
White = Unused

As of Unicode 5.0, the BMP includes the following blocks:

Basic Latin (0000–007F)
Latin-1 Supplement (0080–00FF)
Latin Extended-A (0100–017F)
Latin Extended-B (0180–024F)
IPA Extensions (0250–02AF)
Spacing Modifier Letters (02B0–02FF)
Combining Diacritical Marks (0300–036F)
Greek and Coptic (0370–03FF)
Cyrillic (0400–04FF)
Cyrillic Supplement (0500–052F)
Armenian (0530–058F)
Hebrew (0590–05FF)
Arabic (0600–06FF)
Syriac (0700–074F)
Arabic Supplement (0750–077F)
Thaana (0780–07BF)
N'Ko (Mandenkan) (07C0–07FF)
Indic scripts:
- Devanagari (0900–097F)
- Bengali (0980–09FF)
- Gurmukhi (0A00–0A7F)
- Gujarati (0A80–0AFF)
- Oriya (0B00–0B7F)
- Tamil (0B80–0BFF)
- Telugu (0C00–0C7F)
- Kannada (0C80–0CFF)
- Malayalam (0D00–0D7F)
- Sinhala (0D80–0DFF)
Thai (0E00–0E7F)
Lao (0E80–0EFF)
Tibetan (0F00–0FFF)
Burmese (1000–109F)
Georgian (10A0–10FF)
Hangul Jamo (1100–11FF)
Ethiopic (1200–137F)
Ethiopic Supplement (1380–139F)
Cherokee (13A0–13FF)
Unified Canadian Aboriginal Syllabics (1400–167F)
Ogham (1680–169F)
Runic (16A0–16FF)
Philippine scripts:
- Tagalog (1700–171F)
- Hanunóo (1720–173F)
- Buhid (1740–175F)
- Tagbanwa (1760–177F)
Khmer (1780–17FF)
Mongolian (1800–18AF)
Limbu (1900–194F)
Tai Le (1950–197F)
New Tai Lue (1980–19DF)
Khmer Symbols (19E0–19FF)
Buginese (1A00–1A1F)
Balinese (1B00–1B7F)
Lepcha (Rong) (1C00–1C4F)
Phonetic Extensions (1D00–1D7F)
Phonetic Extensions Supplement (1D80–1DBF)
Combining Diacritical Marks Supplement (1DC0–1DFF)
Latin Extended Additional (1E00–1EFF)
Greek Extended (1F00–1FFF)
Symbols:
- General Punctuation (2000–206F)
- Superscripts and Subscripts (2070–209F)
- Currency Symbols (20A0–20CF)
- Combining Diacritical Marks for Symbols (20D0–20FF)
- Letterlike Symbols (2100–214F)
- Number Forms (2150–218F)
- Arrows (2190–21FF)
- Mathematical Operators (2200–22FF)
- Miscellaneous Technical (2300–23FF)
- Control Pictures (2400–243F)
- Optical Character Recognition (2440–245F)
- Enclosed Alphanumerics (2460–24FF)
- Box Drawing (2500–257F)
- Block Elements (2580–259F)
- Geometric Shapes (25A0–25FF)
- Miscellaneous Symbols (2600–26FF)
- Dingbats (2700–27BF)
- Miscellaneous Mathematical Symbols-A (27C0–27EF)
- Supplemental Arrows-A (27F0–27FF)
- Braille Patterns (2800–28FF)
- Supplemental Arrows-B (2900–297F)
- Miscellaneous Mathematical Symbols-B (2980–29FF)
- Supplemental Mathematical Operators (2A00–2AFF)
- Miscellaneous Symbols and Arrows (2B00–2BFF)
Glagolitic (2C00–2C5F)
Latin Extended-C (2C60–2C7F)
Coptic (2C80–2CFF)
Georgian Supplement (2D00–2D2F)
Tifinagh (2D30–2D7F)
Ethiopic Extended (2D80–2DDF)
Supplemental Punctuation (2E00–2E7F)
CJK Radicals Supplement (2E80–2EFF)
Kangxi Radicals (2F00–2FDF)
Ideographic Description Characters (2FF0–2FFF)
CJK Symbols and Punctuation (3000–303F)
Hiragana (3040–309F)
Katakana (30A0–30FF)
Bopomofo (3100–312F)
Hangul Compatibility Jamo (3130–318F)
Kanbun (3190–319F)
Bopomofo Extended (31A0–31BF)
CJK Strokes (31C0–31EF)
Katakana Phonetic Extensions (31F0–31FF)
Enclosed CJK Letters and Months (3200–32FF)
CJK Compatibility (3300–33FF)
CJK Unified Ideographs Extension A (3400–4DBF)
Yijing Hexagram Symbols (4DC0–4DFF)
CJK Unified Ideographs (4E00–9FFF)
Yi Syllables (A000–A48F)
Yi Radicals (A490–A4CF)
Modifier Tone Letters (A700–A71F)
Latin Extended-D (A720–A7FF)
Syloti Nagri (A800–A82F)
Phags-pa (A840–A87F)
Hangul Syllables (AC00–D7AF)
High Surrogates (D800–DB7F)
High Private Use Surrogates (DB80–DBFF)
Low Surrogates (DC00–DFFF)
Private Use Area (E000–F8FF)
CJK Compatibility Ideographs (F900–FAFF)
Alphabetic Presentation Forms (FB00–FB4F)
Arabic Presentation Forms-A (FB50–FDFF)
Variation Selectors (FE00–FE0F)
Vertical Forms (FE10–FE1F)
Combining Half Marks (FE20–FE2F)
CJK Compatibility Forms (FE30–FE4F)
Small Form Variants (FE50–FE6F)
Arabic Presentation Forms-B (FE70–FEFF)
Halfwidth and Fullwidth Forms (FF00–FFEF)
Specials (FFF0–FFFF)
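A block list like the one above is a sorted sequence of non-overlapping ranges, so finding the block of a code point is a binary search over the range starts. The Python sketch below assumes a small hand-copied subset of the blocks above (BMP_BLOCKS and block_of are illustrative names). Python's unicodedata module does not expose block names, so a real implementation would load the full table from a source such as the Unicode Character Database file Blocks.txt.

```python
import bisect
from typing import Optional

# (start, end, name) entries copied from the BMP list above; must stay sorted by start.
BMP_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xD800, 0xDB7F, "High Surrogates"),
    (0xDB80, 0xDBFF, "High Private Use Surrogates"),
    (0xDC00, 0xDFFF, "Low Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
]

_STARTS = [start for start, _, _ in BMP_BLOCKS]


def block_of(cp: int) -> Optional[str]:
    """Return the block name for cp, or None if this partial table does not cover it."""
    i = bisect.bisect_right(_STARTS, cp) - 1   # last block starting at or before cp
    if i >= 0:
        _, end, name = BMP_BLOCKS[i]
        if cp <= end:                          # cp may fall in a gap after that block
            return name
    return None


print(block_of(ord("A")))   # Basic Latin
print(block_of(0x0416))     # Cyrillic (Ж)
print(block_of(0x4E2D))     # CJK Unified Ideographs (中)
print(block_of(0x0800))     # None: not covered by this partial table
```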

Future additions

Several scripts are expected to be included in the BMP in the next revision of Unicode. These scripts and their proposed code point ranges are:

Cham (18B0–18FF)
Lanna (Old Tai Lue) (1A80–1AEF)
Santali (Ol Cemet' / Ol Chiki) (2DE0–2DFF)
Vai (A500–A61F)
Saurashtra (AB00–AB5F)

Several other scripts are proposed for inclusion in the BMP, including:

Avestan (0800–083F)
Pahlavi (0840–087F)
Batak (1A20–1A5F)
Meitei Mayek / Meitei (1C80–1CDF)
Varang Kshiti (AA00–AA3F)
Sorang Sompeng (AA40–AA6F)

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

As of Unicode 5.0, Plane One includes the following blocks:

Linear B Syllabary (10000–1007F)
Linear B Ideograms (10080–100FF)
Aegean Numbers (10100–1013F)
Ancient Greek Numbers (10140–1018F)
Old Italic (10300–1032F)
Gothic (10330–1034F)
Ugaritic (10380–1039F)
Old Persian (103A0–103DF)
Deseret (10400–1044F)
Shavian (10450–1047F)
Osmanya (10480–104AF)
Cypriot Syllabary (10800–1083F)
Phoenician (10900–1091F)
Kharoshthi (10A00–10A5F)
Sumero-Akkadian Cuneiform (12000–1236E and 12400–12473)
Byzantine Musical Symbols (1D000–1D0FF)
Musical Symbols (1D100–1D1FF)
Ancient Greek Musical Notation (1D200–1D24F)
Tai Xuan Jing Symbols (1D300–1D35F)
Mathematical Alphanumeric Symbols (1D400–1D7FF)
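Characters in the blocks above lie outside the BMP, which affects how many bytes they occupy in the common encoding forms. A short Python sketch, using sample code points chosen from the Linear B Syllabary, Gothic, Musical Symbols and Mathematical Alphanumeric Symbols blocks; the names come from Python's bundled Unicode database, and encoding_info is a hypothetical helper.

```python
import unicodedata

# Sample Plane 1 code points from blocks listed above.
SAMPLES = [0x10000, 0x10330, 0x1D11E, 0x1D400]


def encoding_info(cp: int) -> tuple[str, int, int]:
    """Return (character name, UTF-8 byte count, UTF-16 code unit count)."""
    ch = chr(cp)
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2   # each UTF-16 code unit is 2 bytes
    return unicodedata.name(ch), utf8_bytes, utf16_units


for cp in SAMPLES:
    name, utf8_bytes, utf16_units = encoding_info(cp)
    print(f"U+{cp:X} (plane {cp >> 16}): {name}, "
          f"{utf8_bytes} UTF-8 bytes, {utf16_units} UTF-16 code units")
```

Every character outside the BMP takes four bytes in UTF-8 and two 16-bit code units (a surrogate pair) in UTF-16, whereas BMP characters take one to three UTF-8 bytes and a single UTF-16 code unit.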

Many other scripts are proposed for inclusion in Plane One, including:

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 Unified Han Ideographs, most of which are rarely used in everyday written communication.

Unused planes

Unicode has not yet assigned any characters to Planes 3 through 13. These planes are not expected to be needed, given the total size of the known writing systems still to be encoded. However, the number of possible symbol characters that could arise outside the context of writing systems is potentially limitless, so ISO/IEC 10646 (the UCS) and Unicode consider requests for new symbols on a case-by-case basis.

Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters in two blocks of 128 and 240 code points. The first block holds language tag characters, for use when the language cannot be indicated through other protocols (such as the xml:lang attribute in XML). The other block contains variation selectors that indicate an alternate glyph for a character when the glyph cannot be determined from context.
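Both kinds of Plane 14 characters are computed by simple offsets. The tag characters U+E0020 to U+E007E mirror printable ASCII (0x20 to 0x7E), and a tagged run begins with U+E0001 LANGUAGE TAG. Variation selectors VS1 to VS16 sit in the BMP block Variation Selectors (FE00–FE0F), while VS17 to VS256 occupy the 240 code points U+E0100 to U+E01EF. The Python sketch below shows the arithmetic; language_tag and variation_selector are hypothetical helpers, and later versions of Unicode deprecated U+E0001, so the tag function illustrates the encoding rather than recommended practice.

```python
import unicodedata

LANGUAGE_TAG = 0xE0001   # starts a language-tag sequence (later deprecated)
TAG_OFFSET = 0xE0000     # tag character = TAG_OFFSET + ASCII value


def language_tag(lang: str) -> str:
    """Build a Plane 14 language-tag sequence for an ASCII tag such as 'en'."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag characters only mirror printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def variation_selector(n: int) -> str:
    """Return variation selector VS<n>, for n in 1..256."""
    if 1 <= n <= 16:
        return chr(0xFE00 + n - 1)       # BMP: Variation Selectors block
    if 17 <= n <= 256:
        return chr(0xE0100 + n - 17)     # Plane 14: Variation Selectors Supplement
    raise ValueError("variation selectors are numbered 1 to 256")


print([f"U+{ord(c):X}" for c in language_tag("en")])   # ['U+E0001', 'U+E0065', 'U+E006E']
print(unicodedata.name(variation_selector(17)))        # VARIATION SELECTOR-17
```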

Private use planes

Two planes (planes 15 and 16) have been set aside for character assignment by parties outside ISO and the Unicode Consortium. Such characters have limited interoperability: software and fonts that support Unicode will not necessarily support character assignments made by other parties. Other implementations may handle these characters incorrectly, especially if they have unusual properties such as right-to-left directionality.
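In UTF-16, a code point above U+FFFF is written as a pair of surrogates from the BMP ranges D800–DBFF (high) and DC00–DFFF (low). Since planes 15 and 16 start at U+F0000, the high surrogate of every code point in those planes falls in DB80–DBFF, the High Private Use Surrogates block in the BMP list. A minimal Python sketch of the standard surrogate calculation; to_surrogates and is_private_use_plane are hypothetical helper names, and the second one only checks the plane, not whether a code point is a noncharacter.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    v = cp - 0x10000                     # 20-bit offset from the start of plane 1
    high = 0xD800 + (v >> 10)            # top 10 bits
    low = 0xDC00 + (v & 0x3FF)           # bottom 10 bits
    return high, low


def is_private_use_plane(cp: int) -> bool:
    """True if cp lies in plane 15 or 16."""
    return 0xF0000 <= cp <= 0x10FFFF


for cp in (0x1D11E, 0xF0000, 0x10FFFD):
    high, low = to_surrogates(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}, private use plane: {is_private_use_plane(cp)}")
```

The results agree with Python's own UTF-16 encoder; for example, chr(0xF0000).encode("utf-16-be") yields the bytes DB 80 DC 00.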

Plane mapping tables

Unicode mapping tables
BMP: 0000–0FFF, 1000–1FFF, 2000–2FFF, 3000–3FFF, 4000–4FFF, 5000–5FFF, 6000–6FFF, 7000–7FFF, 8000–8FFF, 9000–9FFF, A000–AFFF, B000–BFFF, C000–CFFF, D000–DFFF, E000–EFFF, F000–FFFF
SMP: 10000–10FFF, 12000–12FFF, 1D000–1DFFF
SIP: 20000–20FFF, 21000–21FFF, 22000–22FFF, 23000–23FFF, 24000–24FFF, 25000–25FFF, 26000–26FFF, 27000–27FFF, 28000–28FFF, 29000–29FFF, 2A000–2AFFF, 2F000–2FFFF
SSP: E0000–E0FFF
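Each entry in the mapping table covers an aligned range of 4,096 (0x1000) code points, so the entry containing a given code point is found by clearing its low 12 bits. A small Python sketch under that assumption; chart_range is a hypothetical helper, and it only computes the range, without checking whether the table actually lists it.

```python
def chart_range(cp: int) -> str:
    """Return the aligned 4,096-code-point range (e.g. '1D000–1DFFF') containing cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    start = cp & ~0xFFF                        # clear the low 12 bits
    width = 4 if start <= 0xFFFF else 5        # BMP ranges use four hex digits
    return f"{start:0{width}X}–{start + 0xFFF:0{width}X}"


print(chart_range(0x0041))    # 0000–0FFF
print(chart_range(0x1D11E))   # 1D000–1DFFF
print(chart_range(0xE0001))   # E0000–E0FFF
```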