
HTML decimal character rendering

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Mjb (talk | contribs) at 17:53, 19 October 2005 (Illegal characters: needed some additional wikilinking after being split from List of HTML decimal character references). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Not every web browser or email client used to read HTML documents, nor every text editor used to write them, can render every character that HTML permits. For example, Mozilla Firefox 1.x versions display many more characters than the latest versions of Microsoft Internet Explorer. The difference comes from their "font linking" capabilities, which let a program draw glyphs from whichever fonts on the system support the characters needed.

Most of the characters with codes from 0 to 127, the original 7-bit ASCII standard set, can be used without a character reference. Codes from 160 to 255 can all be written using character entity names. Most higher-numbered codes have no entity name and can be written only as numeric character references (decimal or hexadecimal), unless the document's character encoding can represent them directly.
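The choice between a literal character, an entity name, and a decimal reference can be sketched in Python. This is a minimal illustration, not a complete serializer: the function name `to_reference` is hypothetical, the entity table is the HTML 4 table shipped in Python's standard `html.entities` module (which also names some code points above 255, such as the trademark sign), and the sketch does not check for forbidden control characters.

```python
from html.entities import codepoint2name


def to_reference(ch: str) -> str:
    """Return a spelling of `ch` that is safe in an ASCII-only HTML document.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)
    if ch in '<>&"':
        # Markup-significant ASCII characters still need a reference.
        return f"&{codepoint2name[cp]};"
    if cp < 128:
        # Other 7-bit ASCII characters can be written literally.
        return ch
    name = codepoint2name.get(cp)
    if name is not None:
        # An entity name exists for this code point.
        return f"&{name};"
    # No entity name: fall back to a decimal numeric character reference.
    return f"&#{cp};"
```

For instance, `to_reference("é")` gives `&eacute;`, while `to_reference("ā")` (code 257, which has no HTML 4 entity name) gives `&#257;`.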

Illegal characters

HTML forbids the use of characters with the following Universal Character Set/Unicode code points, given here in decimal:

  • 0000–0008
  • 0011
  • 0014–0031
  • 0128–0159

These characters are not allowed even as numeric character references. However, lenient web browsers commonly interpret references to characters 0128–0159 as if they were references to the characters assigned to bytes 128–159 (decimal) in the Windows-1252 character encoding. This is in violation of HTML and SGML standards, and the characters are already assigned to higher code points, so HTML document authors should always use the higher code points. For example, for the trademark sign (™), use &#8482;, not &#153;.
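The rule above, and the repair that lenient browsers effectively perform, can be sketched as follows. The function names `is_forbidden` and `repair_reference` are hypothetical, and the sketch assumes that Python's `cp1252` codec is an adequate stand-in for the Windows-1252 table; that codec leaves a few bytes (such as 129) unassigned, in which case no repair is offered.

```python
from typing import Optional


def is_forbidden(cp: int) -> bool:
    """True for the decimal code points listed above as illegal in HTML."""
    return 0 <= cp <= 8 or cp == 11 or 14 <= cp <= 31 or 128 <= cp <= 159


def repair_reference(cp: int) -> Optional[int]:
    """Return the code point an author should reference instead of `cp`.

    Legal code points are returned unchanged. Code points 128-159 are
    mapped through Windows-1252. None means no sensible repair exists.
    """
    if not is_forbidden(cp):
        return cp
    if 128 <= cp <= 159:
        try:
            return ord(bytes([cp]).decode("cp1252"))
        except UnicodeDecodeError:
            # Byte is unassigned in Python's Windows-1252 table.
            return None
    # Forbidden control characters have no replacement.
    return None
```

With this sketch, `repair_reference(153)` returns 8482, the code point of the trademark sign.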

The characters 0009 (tab), 0010 (linefeed), 0012 (form feed), and 0013 (carriage return) are allowed in HTML documents. Along with 0032 (space), they are all considered "white space" and, except in a <pre> block, any run of them is interpreted as a single "word separator" for rendering purposes. A word separator is typically rendered as a single en-width space in European languages, but not in others.
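The collapsing of white space outside a <pre> block can be approximated with a regular expression. This is only a rough sketch under stated assumptions: the function name `collapse_white_space` is hypothetical, each run is replaced by an ordinary space rather than a language-dependent word separator, and real browsers apply further rules (for example around tags and at the edges of blocks) that are not modelled here.

```python
import re

# The five white-space characters named above: tab, linefeed,
# form feed, carriage return, and space.
_WHITE_SPACE_RUN = re.compile(r"[\t\n\f\r ]+")


def collapse_white_space(text: str) -> str:
    """Replace each run of HTML white space with a single space."""
    return _WHITE_SPACE_RUN.sub(" ", text)
```

Note that the non-breaking space (code 160) is not in the set, so it is never collapsed.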

See also