Unicode
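HTML 4.0 adopts Unicode as its standard character set. By using specific numeric codes, a web page can display text in any script in any browser that supports the HTML 4.0 standard, provided that the computer has suitable fonts installed.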
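In the past, 8-bit character encodings meant that each country could only define its own encoding for the characters it used. Languages with more complex character systems, such as Vietnamese, and the large character sets of East Asian countries could not be represented well in an 8-bit environment. If an encoding could not even handle its own language well, displaying text from other countries was out of the question. With HTML 4.0, however, a specific character can be displayed by writing a numeric character reference of the form &#nnn;, where nnn is the character's decimal Unicode code point. To use a hexadecimal code instead, put an x before the code, as in &#xhhhh;.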
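As a minimal sketch of the two forms described above, the following Python snippet builds the decimal and hexadecimal numeric character references for a single character. The function names `decimal_ref` and `hex_ref` are hypothetical, chosen only for this illustration.

```python
def decimal_ref(ch: str) -> str:
    """Return the decimal numeric character reference (&#nnn;) for one character."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Return the hexadecimal numeric character reference (&#xhhhh;) for one character."""
    return f"&#x{ord(ch):X};"


# Greek capital letter Delta has the code point 916 (hexadecimal 394).
print(decimal_ref("Δ"))  # &#916;
print(hex_ref("Δ"))      # &#x394;
```

Both references name the same code point, so a browser that understands both forms should render them identically.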
The support for hexadecimal in this context is more recent, so older browsers might have problems displaying those characters, although they will probably have trouble displaying Unicode characters outside the 8-bit range in the first place. It is still common practice to convert the hexadecimal code point into a decimal value (e.g. &#9824; instead of &#x2660;).
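The conversion from hexadecimal to decimal references can be automated. The sketch below assumes the markup is available as a Python string and rewrites every hexadecimal reference it finds; the names `HEX_REF` and `hex_refs_to_decimal` are hypothetical.

```python
import re

# Matches hexadecimal numeric character references such as &#x2660; or &#X2660;
HEX_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_refs_to_decimal(markup: str) -> str:
    """Rewrite hexadecimal references as decimal ones; leave all other text alone."""
    return HEX_REF.sub(lambda m: f"&#{int(m.group(1), 16)};", markup)


print(hex_refs_to_decimal("A spade: &#x2660;"))  # A spade: &#9824;
```

Decimal references and named entities pass through unchanged, since the pattern only matches references that start with `&#x`.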
In the Unicode standard, each code point is written in the notation U+hhhh, where hhhh are hexadecimal digits.
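To illustrate the notation, this short Python sketch converts between a character and its U+hhhh form. The helper names `u_plus` and `from_u_plus` are hypothetical; the code pads to at least four hexadecimal digits, which is the usual convention.

```python
def u_plus(ch: str) -> str:
    """Format a character's code point in U+hhhh notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def from_u_plus(notation: str) -> str:
    """Return the character named by a U+hhhh string."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not in U+hhhh notation: {notation!r}")
    return chr(int(notation[2:], 16))


print(u_plus("A"))   # U+0041
print(u_plus("葉"))  # U+8449
print(from_u_plus("U+0394"))  # Δ
```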
There is also a standard set of named character entity references for commonly used symbols that are missing from some character encodings. For example, one can use &mdash; to represent an em dash (—) in text even if the character encoding used doesn't contain that character.
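Python's standard library ships a table of the HTML 4 entity names, which makes it easy to see which code point a named reference stands for. The sketch below uses `html.entities.name2codepoint` and `html.unescape`; the wrapper `named_to_numeric` is a hypothetical helper for this illustration only.

```python
import html
from html.entities import name2codepoint


def named_to_numeric(name: str) -> str:
    """Return the decimal numeric reference equivalent to the named entity &name;."""
    return f"&#{name2codepoint[name]};"


print(named_to_numeric("mdash"))             # &#8212;
print(html.unescape("&mdash;") == "\u2014")  # True
```

A name that is not in the table raises `KeyError`, so a real tool would likely want to handle unknown names explicitly.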
Many browsers, though, are only capable of displaying a small subset of the full UCS-2 repertoire. Here is how your browser displays various Unicode code points:
Code | Description | What your browser displays |
&#65; | Latin capital letter "A" | A |
&#223; | Latin small letter "Sharp S" | ß |
&#254; | Latin small letter "Thorn" | þ |
&#916; | Greek capital letter "Delta" | Δ |
&#1049; | Cyrillic capital letter "Short I" | Й |
&#1511; | Hebrew letter "Qof" | ק |
&#1605; | Arabic letter "Meem" | م |
&#3671; | Thai digit 7 | ๗ |
&#4688; | Ethiopic syllable "Qha" | ቐ |
&#12354; | Japanese Hiragana "A" | あ |
&#21494; | Simplified Chinese "Leaf" | 叶 |
&#33865; | Traditional Chinese "Leaf" | 葉 |
&#45307; | Korean Hangul syllable "Nieun Yae Rieulhieuh" | 냻 |
Some multilingual web browsers dynamically merge the required font sets on demand. Microsoft's Internet Explorer 5.5 on Windows and the cross-platform Mozilla/Netscape 6, for example, can display all the Unicode characters on this page simultaneously once the appropriate "text display support packs" have been downloaded. MSIE 5.5 prompts the user through its "install on demand" feature whenever a new font is needed. Other browsers, such as Netscape Navigator 4.77, can only display text supported by the current font associated with the character encoding of the page. With the latter type of browser, it is unlikely that your computer has all of those fonts, or that the browser can use all available fonts on the same page. As a result, the browser will not display all of the text above correctly, though it may display a subset of it. Because the characters are encoded according to the standard, though, they will display correctly on any compliant system that has the characters available. Further, characters that have names for use in named entity references are likely to be more widely available than others.
- Unicode 1-50
- Unicode 51-75
- Table of Unicode characters, 128 to 999
- Table of Unicode characters, 1000 to 1999
- wikipedia:Unicode and wikipedia.
External links
- Alan Wood’s Unicode Resources - Unicode fonts and information (www.alanwood.net/unicode).
- http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm The International Phonetic Alphabet in Unicode
- http://www.alanwood.net/unicode/cjk_compatibility_ideographs.html CJK Compatibility Ideographs
- http://www.unicode.org/charts/ Unicode character charts