Wikipedia:WikiProject Typography/Unicode
This page is an effort at documenting/standardizing presentation of Unicode-related information on Wikipedia.
Templates
Unicode-related templates.
- {{SpecialChars}}
- Adds a small message box (floated right) informing the reader that the page uses special characters that might not display properly. Here, "special" basically means anything beyond ASCII and maybe Latin-1. This template should be added to the top of any page that makes extensive use of Unicode.
- {{Unicode}}
- This just wraps the given character(s) in an HTML SPAN block with class "Unicode". CSS can then be applied on a per-browser/platform basis to select appropriate fonts, or maybe even do other fix-ups.
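As a rough illustration of the kind of markup {{Unicode}} is described as producing, here is a minimal Python sketch. It is not the template's actual source: the function name `wrap_unicode` is hypothetical, and only the SPAN element and the class name "Unicode" come from the description above.

```python
from html import escape


def wrap_unicode(text: str) -> str:
    """Wrap text in a SPAN with class "Unicode" (hypothetical helper).

    HTML-special characters are escaped so the result is safe to embed.
    """
    return f'<span class="Unicode">{escape(text)}</span>'


print(wrap_unicode("☺"))
```

A stylesheet could then target `span.Unicode` to choose fonts per browser or platform.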
Glyph images
Wikipedia and/or Wikimedia Commons host many images of glyphs, that is, characters rendered in a given font. In article text, we generally prefer to use literal Unicode characters, not these rendered images. Thus, these images are primarily used in Unicode tables (see below), which provide both the literal character and an image of the character.
Ideally, all such glyph images would be vector graphics, in SVG format. However, many exist in a raster graphics format, such as GIF. These should be converted to, or replaced with, SVGs.
As of this writing, there is no standardized naming of these images. Sometimes an expression of the codepoint is used as the file name, e.g., U+2122.svg. In other cases, the character name is used, e.g., OCR-A char Quotation Mark.svg.
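For the codepoint-based naming style, a file name can be derived mechanically from the character. The following Python sketch assumes the "U+XXXX.svg" pattern seen in the example above; the function name `codepoint_filename` is hypothetical, and, as noted, no naming convention is actually standardized.

```python
def codepoint_filename(char: str, ext: str = "svg") -> str:
    """Build a codepoint-style file name such as "U+2122.svg" (assumed pattern)."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Uppercase hex, zero-padded to at least four digits, as in U+2122.
    return f"U+{ord(char):04X}.{ext}"


print(codepoint_filename("™"))  # U+2122.svg
```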
Unicode tables
Many articles dealing with Unicode include tables of Unicode characters. The standard form for such tables is as given in the following example:
Char | Image | Name | Hex | Decimal |
---|---|---|---|---|
☷ | (image) | Trigram for Earth | 2637 | 9783 |
☸ | (image) | Wheel of Dharma | 2638 | 9784 |
☹ | (image) | White frowning face (Emoticon) | 2639 | 9785 |
☺ | (image) | White smiling face (Emoticon) | 263a | 9786 |
The table format has the following design features:
- Sortable
- "Char" column
- "Image" column
- "Name" column
- The official codepoint name, as specified by the Unicode Consortium
- Either the entire name, or individual words, may be wikilinked to articles
- When the appropriate article title does not match the word(s) of the official name, piped links should be used to preserve the official name
- Additional names or references can be provided in parentheses, if needed
- For illustration, in the above table:
- Only "Trigram" is wikilinked, because "for Earth" is not part of ba gua (concept)
- All of "Wheel of Dharma" is wikilinked, because Dharmacakra is synonymous with "Wheel of Dharma"
- "Emoticon" is a parenthetical, as that is not part of the official Unicode codepoint name
- "Decimal" and "Hex" columns
- The codepoint number, in both decimal (base 10) and hexadecimal (base 16) formats
- Syntax is omitted (e.g., “U+”, “&#”, “0x”, etc.)
- Such syntax is specific to the usage (Unicode specification, HTML, Perl, etc.); it is not a universal form. By omitting it, anyone can copy-and-paste the actual value, and add whatever syntax they need (or none at all).
- Such prefixes also prevent table columns from sorting properly
- The plan is to eventually add some kind of standard explanation of the columns to the tables, most likely as an adjacent template, or maybe links from the headers. Ideas welcome!
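The relationship between the "Char", "Name", "Hex", and "Decimal" columns can be sketched in Python. This is only an illustration, not a tool used by the project: the helper names `table_row` and `with_syntax` are hypothetical, and the names come from Python's bundled Unicode database, which reports them in uppercase.

```python
import unicodedata


def table_row(char: str) -> tuple[str, str, str, str]:
    """Return (char, official name, bare hex, bare decimal) for one character."""
    codepoint = ord(char)
    return (char, unicodedata.name(char), f"{codepoint:04x}", str(codepoint))


def with_syntax(bare_hex: str) -> dict[str, str]:
    """Show how a reader might add usage-specific syntax to a bare hex value."""
    return {
        "Unicode notation": f"U+{bare_hex.upper()}",
        "HTML reference": f"&#x{bare_hex};",
        "Hex literal": f"0x{bare_hex}",
    }


for ch in "☷☸☹☺":
    print(" | ".join(table_row(ch)))

print(with_syntax("263a"))
```

Because the table stores only the bare values, each usage-specific form can be produced on demand, and the columns hold plain values with no prefix in the way of sorting.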