Talk:Unicode block
This article has not yet been rated on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Please add the quality rating to the {{WikiProject banner shell}} template instead of this project banner. See WP:PIQA for details.
Move discussion in progress
There is a move discussion in progress which affects this page. Please participate at Talk:C0 Controls and Basic Latin - Requested move and not in this talk page section. Thank you. —RM bot 16:21, 22 May 2012 (UTC)
Unicode block test page
Unicode 3.2 test page is a web page that shows example characters from each Unicode block.
217.41.116.114 (talk) 09:07, 30 August 2013 (UTC)
Consistency Question
The article says a contiguous block of code points starts with nnn0 and ends with nnnF. Earlier, nnnn was described as a sequence of decimal digits. Should the code points start with hhh0 and end with hhhF instead? Hmoulding (talk) 14:11, 29 August 2016 (UTC)
- I agree it should, mainly because nnnn is specifically used for decimal digits in the preceding section. I've updated the page. DRMcCreedy (talk) 16:06, 3 September 2016 (UTC)
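To make the hhh0/hhhF convention concrete, here is a minimal Python sketch of the alignment check discussed above. The function names `is_aligned_block` and `format_range` are hypothetical helpers made up for illustration; the two sample ranges are the published ranges of the Basic Latin and Greek and Coptic blocks.

```python
def is_aligned_block(start: int, end: int) -> bool:
    """Return True if a range starts at hhh0 and ends at hhhF (hexadecimal).

    In other words: the first code point is a multiple of 16 and the
    last code point is one less than a multiple of 16.
    """
    return start <= end and start % 16 == 0 and end % 16 == 15


def format_range(start: int, end: int) -> str:
    """Format a range in the usual U+hhhh notation (hex digits, not decimal)."""
    return f"U+{start:04X}..U+{end:04X}"


# Sample ranges: Basic Latin and Greek and Coptic.
BASIC_LATIN = (0x0000, 0x007F)
GREEK_AND_COPTIC = (0x0370, 0x03FF)

for block_range in (BASIC_LATIN, GREEK_AND_COPTIC):
    print(format_range(*block_range), is_aligned_block(*block_range))
```

The last hex digit is the only one constrained, which is why hhh0 and hhhF describe the rule better than nnn0 and nnnF.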
Proposed moves of unicode block articles
A little more than half of the articles about Unicode character blocks have the qualifier "(Unicode block)"; the others don't. I propose to rename the latter so that all of them do. Here are some reasons:
- The current names are not descriptive of the subjects. A name like "Control Pictures", "Geometric Shapes", "Mathematical Operators", "Greek and Coptic" or even "Latin Extended-B" gives no clue that the topic of the article is a section of a specific computer character set.
- The current names are not specific to the subject of the article. Even names like "Greek Extended" or "Miscellaneous Symbols", which one could infer are about character encodings, could apply to character sets other than Unicode that were in use in the rather recent past and would still deserve articles of their own.
- The naming of the Unicode blocks is not consistent. The fact that only half of the articles have the "(Unicode block)" qualifier causes difficulties for editors and potential confusion for readers. For example, an editor intending to link to an article on the history and use of the Braille system may link to Braille Patterns instead by mistake.
- The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways. For example, to satisfy the standards the article Mathematical Operators should be named Mathematical operator. But of course that is not the name of the Unicode block; and it is the name of an article with a very different subject. Adding the qualifier "(Unicode block)" would satisfy the naming standards, besides avoiding confusion.
- Those topics are not very notable and are only of specialized and ephemeral technical interest. The division of the Unicode character space into blocks is mostly an artifact of the way the Unicode Consortium discusses, approves, and documents proposals to include characters. It has only tenuous (and often very questionable) connections to the history, usage, or semantics of those characters.
The division is relevant only to those who are interested in the history of Unicode, or who intend to propose new symbols for it.
The division is not relevant to users of Unicode. On the contrary, to find the Unicode for a desired glyph, like a special math symbol or a letter with a certain modifier, one should ignore the block division and use Google or some other generic search tool -- because one cannot tell which block that symbol has been put into.
The division is not even useful to font designers. While at some point one would find computer fonts that were limited to one or two specific blocks, that has never been a rule, and fonts are increasingly cutting across the Unicode block boundaries.
Apparently the names above were assigned without the "(Unicode block)" because it was felt that the qualifier was unnecessary, since there was no other page in Wikipedia with that name. But that is not what "unnecessary" means. Most of the names above have a common-sense meaning that has nothing to do with Unicode, so a qualifier is necessary to differentiate them from those common meanings. If you say "Geometric Shapes", "Number Forms", or "Greek and Coptic" to someone, even to a computer expert, the last thing she will think of is the Unicode block of that name.
--Jorge Stolfi (talk) 23:01, 29 April 2019 (UTC)
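As a small illustration of the point about finding a code point without knowing its block, the following Python sketch looks a character up by its Unicode character name using the standard library's `unicodedata.lookup`. The helper name `code_point_for` is hypothetical. This only shows that name-based lookup works independently of blocks; it is not meant to settle the naming question.

```python
import unicodedata


def code_point_for(name: str) -> str:
    """Return the U+hhhh notation for a character given its Unicode name.

    unicodedata.lookup raises KeyError if the name is unknown.
    """
    ch = unicodedata.lookup(name)
    return f"U+{ord(ch):04X}"


# None of these lookups requires knowing which block holds the character.
print(code_point_for("GREEK SMALL LETTER ALPHA"))  # U+03B1
print(code_point_for("N-ARY SUMMATION"))           # U+2211
print(code_point_for("DOUBLE-STRUCK CAPITAL R"))   # U+211D
```

DOUBLE-STRUCK CAPITAL R is a case where the block is hard to guess: it sits in Letterlike Symbols rather than in Mathematical Alphanumeric Symbols.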
These articles are OK
- Adlam (Unicode block)
- Aegean Numbers (Unicode block)
- Ahom (Unicode block)
- Alchemical Symbols (Unicode block)
- Anatolian Hieroglyphs (Unicode block)
- Ancient Greek Numbers (Unicode block)
- Arabic (Unicode block)
- Armenian (Unicode block)
- Arrows (Unicode block)
- Avestan (Unicode block)
- Balinese (Unicode block)
- Bamum (Unicode block)
- Basic Latin (Unicode block)
- Bassa Vah (Unicode block)
- Batak (Unicode block)
- Bengali (Unicode block)
- Bhaiksuki (Unicode block)
- Bopomofo (Unicode block)
- Brahmi (Unicode block)
- Buginese (Unicode block)
- Buhid (Unicode block)
- CJK Strokes (Unicode block)
- CJK Unified Ideographs (Unicode block)
- Carian (Unicode block)
- Caucasian Albanian (Unicode block)
- Chakma (Unicode block)
- Cham (Unicode block)
- Cherokee (Unicode block)
- Chess Symbols (Unicode block)
- Coptic (Unicode block)
- Cuneiform (Unicode block)
- Currency Symbols (Unicode block)
- Cypriot Syllabary (Unicode block)
- Cyrillic (Unicode block)
- Deseret (Unicode block)
- Devanagari (Unicode block)
- Dogra (Unicode block)
- Duployan (Unicode block)
- Egyptian Hieroglyphs (Unicode block)
- Elbasan (Unicode block)
- Elymaic (Unicode block)
- Emoticons (Unicode block)
- Ethiopic (Unicode block)
- Georgian (Unicode block)
- Glagolitic (Unicode block)
- Gothic (Unicode block)
- Grantha (Unicode block)
- Gujarati (Unicode block)
- Gunjala Gondi (Unicode block)
- Gurmukhi (Unicode block)
- Hangul Jamo (Unicode block)
- Hanifi Rohingya (Unicode block)
- Hanunoo (Unicode block)
- Hatran (Unicode block)
- Hebrew (Unicode block)
- Hiragana (Unicode block)
- Ideographic Description Characters (Unicode block)
- Imperial Aramaic (Unicode block)
- Indic Siyaq Numbers (Unicode block)
- Inscriptional Pahlavi (Unicode block)
- Inscriptional Parthian (Unicode block)
- Javanese (Unicode block)
- Kaithi (Unicode block)
- Kanbun (Unicode block)
- Kannada (Unicode block)
- Katakana (Unicode block)
- Kayah Li (Unicode block)
- Kharoshthi (Unicode block)
- Khmer (Unicode block)
- Khojki (Unicode block)
- Khudawadi (Unicode block)
- Lao (Unicode block)
- Latin-1 Supplement (Unicode block)
- Lepcha (Unicode block)
- Limbu (Unicode block)
- Linear A (Unicode block)
- Lisu (Unicode block)
- Lycian (Unicode block)
- Lydian (Unicode block)
- Mahajani (Unicode block)
- Mahjong Tiles (Unicode block)
- Makasar (Unicode block)
- Malayalam (Unicode block)
- Mandaic (Unicode block)
- Manichaean (Unicode block)
- Marchen (Unicode block)
- Masaram Gondi (Unicode block)
- Mayan Numerals (Unicode block)
- Medefaidrin (Unicode block)
- Meetei Mayek (Unicode block)
- Mende Kikakui (Unicode block)
- Meroitic Cursive (Unicode block)
- Meroitic Hieroglyphs (Unicode block)
- Miao (Unicode block)
- Modi (Unicode block)
- Mongolian (Unicode block)
- Mro (Unicode block)
- Multani (Unicode block)
- Musical Symbols (Unicode block)
- Myanmar (Unicode block)
- NKo (Unicode block)
- Nabataean (Unicode block)
- Nandinagari (Unicode block)
- New Tai Lue (Unicode block)
- Newa (Unicode block)
- Nushu (Unicode block)
- Nyiakeng Puachue Hmong (Unicode block)
- Ogham (Unicode block)
- Ol Chiki (Unicode block)
- Old Hungarian (Unicode block)
- Old Italic (Unicode block)
- Old North Arabian (Unicode block)
- Old Permic (Unicode block)
- Old Persian (Unicode block)
- Old Sogdian (Unicode block)
- Old South Arabian (Unicode block)
- Old Turkic (Unicode block)
- Optical Character Recognition (Unicode block)
- Oriya (Unicode block)
- Osage (Unicode block)
- Osmanya (Unicode block)
- Ottoman Siyaq Numbers (Unicode block)
- Pahawh Hmong (Unicode block)
- Palmyrene (Unicode block)
- Pau Cin Hau (Unicode block)
- Phags-pa (Unicode block)
- Phaistos Disc (Unicode block)
- Psalter Pahlavi (Unicode block)
- Rejang (Unicode block)
- Runic (Unicode block)
- Samaritan (Unicode block)
- Saurashtra (Unicode block)
- Sharada (Unicode block)
- Shavian (Unicode block)
- Siddham (Unicode block)
- Sogdian (Unicode block)
- Sora Sompeng (Unicode block)
- Soyombo (Unicode block)
- Specials (Unicode block)
- Sundanese (Unicode block)
- Superscripts and Subscripts (Unicode block)
- Sutton SignWriting (Unicode block)
- Syloti Nagri (Unicode block)
- Syriac (Unicode block)
- Tagalog (Unicode block)
- Tagbanwa (Unicode block)
- Tags (Unicode block)
- Tai Le (Unicode block)
- Tai Tham (Unicode block)
- Takri (Unicode block)
- Tamil (Unicode block)
- Tangut (Unicode block)
- Telugu (Unicode block)
- Thaana (Unicode block)
- Thai (Unicode block)
- Tibetan (Unicode block)
- Tifinagh (Unicode block)
- Tirhuta (Unicode block)
- Ugaritic (Unicode block)
- Vai (Unicode block)
- Variation Selectors (Unicode block)
- Wancho (Unicode block)
- Warang Citi (Unicode block)
- Zanabazar Square (Unicode block)
These articles need renaming
These articles need to be renamed with the qualifier "(Unicode block)". In some cases there may be a redirect from the unqualified to the qualified name. In other cases the redirect is inappropriate or superfluous, and should be deleted after fixing all its uses. Note that the search window will supply the "(Unicode block)" anyway.
- Alphabetic Presentation Forms
- Ancient Greek Musical Notation
- Arabic Extended-A
- Arabic Mathematical Alphabetic Symbols
- Arabic Presentation Forms-A
- Arabic Supplement
- Bamum Supplement
- Block Elements
- Bopomofo Extended
- Braille Patterns
- Byzantine Musical Symbols
- CJK Compatibility Forms
- CJK Compatibility Ideographs Supplement
- CJK Compatibility Ideographs
- CJK Compatibility
- CJK Symbols and Punctuation
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- CJK Unified Ideographs Extension C
- CJK Unified Ideographs Extension D
- CJK Unified Ideographs Extension E
- CJK Unified Ideographs Extension F
- Cherokee Supplement
- Combining Diacritical Marks Extended
- Combining Diacritical Marks Supplement
- Combining Diacritical Marks for Symbols
- Combining Diacritical Marks
- Combining Half Marks
- Common Indic Number Forms
- Control Pictures
- Coptic Epact Numbers
- Cuneiform Numbers and Punctuation
- Cyrillic Extended-A
- Cyrillic Extended-B
- Cyrillic Extended-C
- Cyrillic Supplement
- Devanagari Extended
- Domino Tiles
- Early Dynastic Cuneiform
- Egyptian Hieroglyph Format Controls
- Enclosed Alphanumeric Supplement
- Enclosed Alphanumerics
- Enclosed CJK Letters and Months
- Enclosed Ideographic Supplement
- Ethiopic Extended-A
- Ethiopic Extended
- Ethiopic Supplement
- General Punctuation
- Geometric Shapes Extended
- Geometric Shapes
- Georgian Extended
- Georgian Supplement
- Glagolitic Supplement
- Greek Extended
- Greek and Coptic
- Hangul Jamo Extended-A
- Hangul Jamo Extended-B
- Hangul Syllables
- IPA Extensions
- Ideographic Symbols and Punctuation
- Kana Extended-A
- Kana Supplement
- Katakana Phonetic Extensions
- Khmer Symbols
- Latin Extended Additional
- Latin Extended-A
- Latin Extended-B
- Latin Extended-C
- Latin Extended-D
- Latin Extended-E
- Letterlike Symbols
- Linear B Ideograms
- Linear B Syllabary
- Mathematical Alphanumeric Symbols
- Mathematical Operators
- Meetei Mayek Extensions
- Miscellaneous Symbols and Pictographs
- Miscellaneous Symbols
- Miscellaneous Technical
- Modifier Tone Letters
- Mongolian Supplement
- Myanmar Extended-A
- Myanmar Extended-B
- Number Forms
- Ornamental Dingbats
- Phonetic Extensions Supplement
- Phonetic Extensions
- Rumi Numeral Symbols
- Shorthand Format Controls
- Sinhala Archaic Numbers
- Small Kana Extension
- Spacing Modifier Letters
- Sundanese Supplement
- Supplemental Arrows-C
- Supplemental Punctuation
- Supplemental Symbols and Pictographs
- Symbols and Pictographs Extended-A
- Syriac Supplement
- Tai Viet
- Tamil Supplement
- Tangut Components
- Transport and Map Symbols
- Unified Canadian Aboriginal Syllabics Extended
- Variation Selectors Supplement
- Vedic Extensions
- Vertical Forms
- Yi Radicals
- Yi Syllables
Already renamed but improper redirects
These articles already have the "(Unicode block)" qualifier, but are accessed through an unqualified redirect. The redirect is inappropriate or superfluous, and should be deleted after fixing all its uses. Note that the search window will supply the "(Unicode block)" anyway.
- Arabic Presentation Forms-B
- Ancient Symbols
- Yijing Hexagram Symbols
- Box Drawing
- CJK Radicals Supplement
- Counting Rod Numerals
- Miscellaneous Mathematical Symbols-A
- Miscellaneous Mathematical Symbols-B
- Small Form Variants
- Supplemental Arrows-A
- Supplemental Arrows-B
- Supplemental Mathematical Operators
- Miscellaneous Symbols and Arrows
Improperly merged articles
These articles were improperly merged into general articles about languages, scripts, etc. It is OK to have a section in those articles that shows the relevant Unicode characters. However, any additional information about the block, such as the history of how its characters were placed into Unicode, must go into a separate "(Unicode block)" article. So these merges should be undone, leaving behind the bare table of characters.
- Phoenician (Unicode block) -- Links to a section of Phoenician language. Undo the merge.
- Taixuanjing -- Links to the article on the divination system. Undo the merge.
- Kangxi radical#Unicode -- Undo the merge.
- Dingbat#Unicode -- Undo the merge.
- Halfwidth and fullwidth forms#In_Unicode -- Split into general and Unicode.
Special cases
These articles need special handling, or different renaming action:
- Private Use Areas -- move to Private Use Areas (Unicode) and delete this redirect.
- General Category -- move to General Category (Unicode) and delete this redirect.
- Playing cards in Unicode -- May need splitting into "~ (Unicode block)" and "~ in Unicode"