Jump to content

Talk:Unicode and HTML

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Mzajac (talk | contribs) at 00:36, 20 January 2005 (Unicode display in Windows MSIE: added discussion heading). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

For the Unicode character charts, reverted the URL from http://www.unicode.org/charts/normalization/ back to http://www.unicode.org/charts/. The normalization charts only display the characters if you have the font already installed and do not seem to be as complete as the full charts available on the other URL. --Nate 15:37 Mar 7, 2003 (UTC)


gg

Hmm, I wasn't sure whether it was really more "intuitive" to use decimal instead of hexadecimal. I mean, both alternatives obscure the character, and the argument that "older web browsers" fail to parse hexadecimal is moot IMO since those web browsers will have a problem with non-8-bit characters anyway. I felt NPOV would dictate to say that in HTML, both hexadecimal and decimal can be used, so I changed that. Djmutex 10:51 May 2, 2003 (UTC)


The intro says: HTML 4.0 uses Unicode as its official character set.

Does somebody have a link to the place in the specification where this is stated? -- Hirzel

Section 5.1. The use of the term "character set" is misleading because it is so overloaded, but it is accurate. An HTML document must consist of Unicode characters. Those characters are, in turn, encoded (as iso-8859-1, utf-8, etc.). Today I added some text to the article to clarify this point. - mjb 23:48, 19 Aug 2004 (UTC)
I believe that statement is wrong. If I understand http://www.w3.org/TR/html401/charset.html#encodings correctly, it explicitly says that there is no "default" encoding. Instead, "conforming user agents" must be able to map any HTML document to Unicode (for example, to support all defined HTML named entities), and may apply heuristics on the HTML document if no charset is specified explicitly (either in the HTTP header, or a META tag, or a "charset" attribute on a specific element.) -- djmutex 21:20 10 Jun 2003 (UTC)
Re myself, the statement isn't really wrong, but it might be misleading. All Unicode characters must be supported by a browser, but there is no "default" character set. As a result, Unicode is the "official" character set, but it's not the default. djmutex 21:23 10 Jun 2003 (UTC)
Well, sorta, but not really. There is confusion resulting from the unfortunate use of the overloaded term "character set" and your apparent misunderstanding that Unicode is itself an encoding in the same sense as a "charset" (it's not).
HTML documents are indeed comprised of Unicode characters, always, but Unicode characters are abstract concepts: just "the idea of" a unit in a writing system, mapped to a similarly abstract concept of a non-negative integer number, its "code point". An HTML or XML document is defined as being a sequence of such characters, and therefore is itself an abstract entity. It is only when the document manifests as a sequence of bits/bytes on the network, on disk, or in memory that it has an encoding associated with it. The encoding maps the characters/code points to sequences of bits. You're right that there is no default encoding, at least in the HTML spec itself, but depending on how the document is transmitted, there may be a default of us-ascii or iso-8859-1 (RFCs 2616 and 3023 address this topic). I've modified the article somewhat to explain this without going into too much detail; there are articles devoted to these topics and we don't need to repeat them in depth here. -- mjb 23:48, 19 Aug 2004 (UTC)

Unicode display in Windows MSIE

Some multilingual web browsers that dynamically merge the required font sets on demand, e.g., Microsoft's Internet Explorer 5.0 and up on Windows, or Mozilla/Netscape 6 and up cross-platform, are capable of displaying all the Unicode characters on this page simultaneously after the appropriate "text display support packs" are downloaded. MSIE 5.5 would prompt the users if a new font were needed via its "install on demand" feature.

All of the characters in the table display correctly on my Mac's Safari and Firefox (thanks partly to Code2000 and Code2001 fonts). But my stock Windows XP installation doesn't show the last six letters in MSIE 6.0 or Firefox 1.0, and doesn't prompt me to do anything. Is the above passage incorrect, or is there something wrong with my Windows or Explorer? Michael Z. 00:35, 2005 Jan 20 (UTC)