
Unicode numerals

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Yhz1221 at 07:57 on 18 February 2013 (Monday) (added {{uncategorized}} tag to article (TW)). It may differ significantly from the current revision.

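Numerals (also called Unicode numerals) are characters or sequences of characters that denote a number. The same Hindu-Arabic (Arabic-Indic) numerals are used widely in many writing systems throughout the world. They share the same semantics for denoting numbers, but the glyphs that represent them differ considerably between writing systems. To support these glyph variations, Unicode encodes the digits within many script blocks. The decimal digits are repeated in 23 separate blocks, twice in Arabic. Six additional blocks contain the digits again as rich-text numerals, intended primarily as a mathematical alphanumeric symbol set. In addition to the various Arabic-Indic digits, Unicode also includes several less common numerals, such as Aegean numerals, Roman numerals, counting rod numerals, cuneiform numerals and ancient Greek numerals.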

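Numerals always involve composition: a limited set of characters combine to form other numerals. For example, in Arabic-Indic numerals the sequence 9-9-0 forms the number nine hundred ninety (990). In Roman numerals, the same number can be written Ⅹↀ or ⅩⅯ (the usual modern form is CMXC). These are different numerals that denote the same abstract number. What differs is the semantics of composition. Arabic-Indic decimal numerals use place-value composition. Roman numerals use sign-value composition, in which each symbol is added or subtracted depending on how it is arranged relative to its neighbours.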
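The difference between place-value and sign-value composition can be made concrete with a short Python sketch. It assumes a simplified subtractive rule: a symbol smaller than its right-hand neighbour is subtracted. It does not check that a Roman numeral is well formed, so non-standard forms such as ⅩⅯ are accepted. The helper names `place_value` and `sign_value` are illustrative only.

```python
import unicodedata

# Values of the Latin letters used as Roman numerals, plus the archaic
# ROMAN NUMERAL ONE THOUSAND C D (U+2180), which has no compatibility decomposition.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000,
                "\u2180": 1000}


def place_value(digits: str, base: int = 10) -> int:
    """Positional (place-value) composition: each digit is weighted by its position."""
    value = 0
    for ch in digits:
        value = value * base + int(ch)
    return value


def sign_value(numeral: str) -> int:
    """Sign-value composition: add each symbol, or subtract it when a larger one follows."""
    # NFKC folds Unicode Roman numeral characters such as U+2169 (Ⅹ) to Latin letters.
    letters = unicodedata.normalize("NFKC", numeral).upper()
    values = [ROMAN_VALUES[ch] for ch in letters]
    total = 0
    for i, v in enumerate(values):
        if i + 1 < len(values) and v < values[i + 1]:
            total -= v
        else:
            total += v
    return total


print(place_value("990"))          # 990
print(sign_value("CMXC"))          # 990
print(sign_value("\u2169\u216F"))  # ⅩⅯ -> 990
print(sign_value("\u2169\u2180"))  # Ⅹↀ -> 990
```

All four numerals evaluate to the same abstract number. Only the rule that combines the symbols differs.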

Numeric properties of numerals

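Unicode groups numerals by the numeric properties they have in text, using four values of the Numeric_Type property. First there is the "not a number" type. Next come decimal-radix digits, commonly used as Western-style decimal digits (plain 0–9). Then come numbers that are not part of a decimal system, such as Roman numerals. Finally there are decimal numbers in a typographic context, such as circled numbers.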

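Numeric types (Unicode character property)
Numeric type | Code | Has numeric value | Examples | Remarks
Not a number | None | No | A, X (Latin letter), !, Д, μ | Numeric value = "NaN"
Decimal digit | De | Yes | 0, 1, 9, ६ (Devanagari 6), ೬ (Kannada 6), 𝟨 (mathematical sans-serif 6) | Plain decimal (radix 10) digits; equivalent to General_Category = Nd
Digit | Di | Yes | ¹ (superscript), ⒈ (digit with full stop) | Decimal, but in a typographic context
Numeric | Nu | Yes | ¾, ௰ (Tamil number ten), Ⅹ (Roman numeral), 六 (Chinese numeral 6), 壹 (Chinese financial numeral 1) | Has a numeric value, but is not a decimal-radix digit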
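Python's standard `unicodedata` module has no direct Numeric_Type lookup. As far as we know, it does expose three value lookups that line up with the table: `decimal()`, `digit()` and `numeric()`. The following sketch infers the type from whichever lookup succeeds first. The function name `numeric_type` is hypothetical, and the inference is an assumption rather than an official mapping.

```python
import unicodedata


def numeric_type(ch: str) -> str:
    """Approximate the Unicode Numeric_Type of one character from unicodedata lookups."""
    if unicodedata.decimal(ch, None) is not None:
        return "De"    # decimal digit (General_Category Nd)
    if unicodedata.digit(ch, None) is not None:
        return "Di"    # digit in a typographic context, e.g. superscripts
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"    # any other character with a numeric value
    return "None"      # not a number


# Examples taken from the table above.
for ch in ["A", "9", "\u096C", "\u00B9", "\u2488", "\u00BE", "\u2169", "\u516D"]:
    print(repr(ch), numeric_type(ch), unicodedata.numeric(ch, "NaN"))
```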

Hexadecimal digits

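Unicode has no separate characters for hexadecimal digits. Existing letters and digits are used instead. These characters carry the property Hex_Digit=Yes, and the ASCII subset additionally carries ASCII_Hex_Digit=Yes.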

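Unicode characters with Hex_Digit=Yes
Characters | Block and form | Also ASCII_Hex_Digit=Yes
0123456789ABCDEF | Basic Latin, uppercase | Yes
0123456789abcdef | Basic Latin, lowercase | Yes
０１２３４５６７８９ＡＢＣＤＥＦ | Fullwidth forms, uppercase | No
０１２３４５６７８９ａｂｃｄｅｆ | Fullwidth forms, lowercase | No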
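Python's `unicodedata` module does not expose the Hex_Digit property. This sketch therefore hard-codes the 44 characters from the table and uses NFKC normalization to map each fullwidth form to its ASCII counterpart before taking its value. The helper names are hypothetical.

```python
import unicodedata

# The 44 characters listed above: ASCII 0-9, A-F, a-f and their fullwidth forms.
ASCII_HEX = frozenset("0123456789ABCDEFabcdef")
FULLWIDTH_HEX = frozenset(
    [chr(cp) for cp in range(0xFF10, 0xFF1A)]    # fullwidth digits 0-9
    + [chr(cp) for cp in range(0xFF21, 0xFF27)]  # fullwidth A-F
    + [chr(cp) for cp in range(0xFF41, 0xFF47)]  # fullwidth a-f
)


def is_hex_digit(ch: str) -> bool:
    return ch in ASCII_HEX or ch in FULLWIDTH_HEX


def is_ascii_hex_digit(ch: str) -> bool:
    return ch in ASCII_HEX


def parse_hex(text: str) -> int:
    """Parse hexadecimal text written with any Hex_Digit=Yes characters."""
    value = 0
    for ch in text:
        if not is_hex_digit(ch):
            raise ValueError(f"not a hex digit: {ch!r}")
        # NFKC maps each fullwidth form to its ASCII counterpart.
        value = value * 16 + int(unicodedata.normalize("NFKC", ch), 16)
    return value


print(parse_hex("1F"), parse_hex("\uFF11\uFF26"))  # 31 31
```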

Numerals by script

Arabic-Indic numerals

The Arabic-Indic numerals consist of ten digits (for base ten: 0–9) and a decimal separator. These can be combined into composite numerals representing any rational number. Unicode includes the ten digits in the Basic Latin (ASCII-derived) block. Unicode has no single decimal separator for common, script-neutral use. The Arabic script has its own decimal separator (U+066B). Other writing systems use whatever punctuation produces the appropriate glyph for the locale: for example, the full stop (U+002E, period) in the United States and the comma (U+002C) in many other locales.

The Arabic-Indic digits are repeated in several other scripts: Arabic, Balinese, Bengali, Devanagari, Ethiopic, Gujarati, Gurmukhi, Khmer, Lao, Limbu, Malayalam, Mongolian, Myanmar, New Tai Lue, N'Ko, Oriya, Osmanya, Telugu, Thai and Tibetan. Unicode gives each digit a numeric value property to assist collation and other text-processing operations. However, Unicode defines no character mapping (such as a decomposition) between the corresponding digits of different scripts.
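Because the numeric value property is the only link between digits of different scripts, software that needs to compare or convert them typically goes through that value. The following is a minimal Python sketch of this approach, using `unicodedata.decimal`. The function name `to_ascii_digits` is hypothetical.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace every decimal digit (any script) with the ASCII digit of equal value."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # numeric value property, or None
        out.append(str(value) if value is not None else ch)
    return "".join(out)


samples = {
    "Arabic": "\u0669\u0669\u0660",      # ٩٩٠
    "Devanagari": "\u096F\u096F\u0966",  # ९९०
    "Thai": "\u0E59\u0E59\u0E50",        # ๙๙๐
}
for script, digits in samples.items():
    print(script, digits, "->", to_ascii_digits(digits))  # -> 990 each time
```

Python's built-in `int()` relies on the same property, so `int("٩٩٠")` also returns 990.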

Fractions

The fraction slash character (U+2044) allows authors using Unicode to compose any arbitrary fraction along with the decimal digits. Unicode also includes a handful of vulgar fractions as compatibility characters, but discourages their use.
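A sketch of both approaches follows: composing a fraction from ordinary digits and U+2044, and reading a vulgar-fraction compatibility character. Under NFKC normalization a character such as ¼ decomposes into digits around a fraction slash, so one parser handles both forms. The function names are illustrative.

```python
import unicodedata
from fractions import Fraction

FRACTION_SLASH = "\u2044"


def compose_fraction(numerator: int, denominator: int) -> str:
    """Write a fraction with decimal digits and U+2044 FRACTION SLASH."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


def parse_fraction(text: str) -> Fraction:
    """Parse 'n⁄d' text; NFKC first expands vulgar fraction characters such as '¼'."""
    normalized = unicodedata.normalize("NFKC", text)  # '¼' -> '1⁄4'
    numerator, denominator = normalized.split(FRACTION_SLASH)
    return Fraction(int(numerator), int(denominator))


print(compose_fraction(3, 8))                                 # 3⁄8
print(parse_fraction("\u00BC"), parse_fraction("3\u20448"))   # 1/4 3/8
```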

Decimal fractions

Several characters in Unicode can serve as a decimal separator, depending on the locale. Decimal fractions are represented in text as a sequence of decimal digits, with a decimal separator dividing the whole-number part from the fractional part. For example, the decimal fraction for “¼” is written zero-point-two-five (“0.25”). Unicode has no dedicated general-purpose decimal separator. Instead, it unifies the decimal-separator function with other punctuation characters, so the “.” in “0.25” is the same period character that ends a sentence. However, cultures vary in the glyph used as a decimal separator. In some locales the comma is used instead (“0,25”), and still other locales use a space (“0 25”). The Arabic writing system includes a dedicated decimal-separator character that looks much like a comma, “٫” (U+066B). Combined with the Arabic-Indic decimal digits, one quarter appears as “٠٫٢٥”.
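The following sketch re-renders an ASCII decimal string with a different separator and digit set, reproducing the three renderings of one quarter from the paragraph above. It hard-codes the separator and the script's zero digit instead of consulting a locale database, which real software would normally do. It relies on the decimal digits 0–9 of each Unicode script being encoded consecutively. `format_decimal` is a hypothetical helper.

```python
def format_decimal(number: str, separator: str = ".", zero: str = "0") -> str:
    """Re-render an ASCII decimal string such as '0.25'.

    `separator` replaces the ASCII period; `zero` is the target script's digit zero,
    and digits 1-9 are taken from the code points that follow it.
    """
    out = []
    for ch in number:
        if ch == ".":
            out.append(separator)
        elif ch in "0123456789":
            out.append(chr(ord(zero) + int(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(format_decimal("0.25"))                      # 0.25
print(format_decimal("0.25", ","))                 # 0,25
print(format_decimal("0.25", "\u066B", "\u0660"))  # ٠٫٢٥
```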

Note that although Arabic is written from right to left and English from left to right, both languages write numbers with the most significant digit on the left and the least significant digit on the right.

Characters for irrational numbers, sets and other constants

As stated above, the ten decimal digits, the decimal separator and the fraction slash are limited to representing rational numbers. Irrational numbers would require infinite character sequences, so irrational numbers and related constructs must be represented with other characters. As a rule, Unicode does not encode characters solely to denote such numbers. For example, Unicode 1.1 includes a character for the natural exponent, ℯ (U+212F), but its UCS name derives from its glyph: “Script Small E”. As exceptions to this rule, Unicode includes three characters named for the quantity they represent: Planck's constant ℎ (U+210E), the reduced Planck constant ℏ (U+210F) and Euler's constant ℇ (U+2107). These are not necessarily irrational numbers, but in practice they would be exceedingly difficult to represent by composing decimal digits. Other irrational numbers and mathematical constants are represented by borrowing characters from other writing systems. For example, π from the Greek script (U+03C0) signifies the ratio of a circle's circumference to its diameter.
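The naming distinction can be inspected with Python's `unicodedata` module, which reports each character's UCS name and numeric value. This short sketch prints both for the characters mentioned above. It shows that ℯ is named after its glyph, while ℎ, ℏ and ℇ are named after the constants. None of them carries a numeric value.

```python
import unicodedata

# Characters mentioned above; none of them has a Unicode numeric value.
for ch in ["\u212F", "\u210E", "\u210F", "\u2107", "\u03C0"]:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)!r} numeric={unicodedata.numeric(ch, None)}")

# Looking a character up by its UCS name works in the other direction.
print(unicodedata.lookup("PLANCK CONSTANT") == "\u210E")  # True
```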

Rich-text and other compatibility numerals

The Arabic-Indic numerals also appear among the compatibility characters as rich-text variant forms, including bold, double-struck, monospace, sans-serif and sans-serif bold, as well as fullwidth variants for legacy vertical text support.

Rich-text parenthesized, circled and other variants are also included in the following blocks: Enclosed CJK Letters and Months; Enclosed Alphanumerics; Superscripts and Subscripts; Number Forms; and Dingbats.
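Because these forms are compatibility characters, NFKC normalization folds them back to plain digits (plus any punctuation they carry). This is one common way to strip the styling before processing numbers. A brief sketch using Python's `unicodedata`:

```python
import unicodedata

samples = [
    "\U0001D7D7",  # MATHEMATICAL BOLD DIGIT NINE
    "\U0001D7E8",  # MATHEMATICAL SANS-SERIF DIGIT SIX
    "\uFF19",      # FULLWIDTH DIGIT NINE
    "\u2460",      # CIRCLED DIGIT ONE
    "\u2474",      # PARENTHESIZED DIGIT ONE
    "\u2488",      # DIGIT ONE FULL STOP
]
for ch in samples:
    print(unicodedata.name(ch), "->", repr(unicodedata.normalize("NFKC", ch)))
```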

CJK huāmǎ numerals

The huāmǎ system is a variation of the rod numeral system. Rod numerals are closely related to the counting rods and the abacus, which is why the numeric symbols for 1, 2, 3, 6, 7 and 8 in the huāmǎ system are represented in a similar way as on the abacus. Nowadays, the huāmǎ system is only used for displaying prices in Chinese markets or on traditional handwritten invoices.

Huāmǎ numerals in Unicode

According to version 3.0 of the Unicode Standard, these characters are called Hangzhou-style numerals. This indicates that the system was not used only by Cantonese speakers in Hong Kong. In version 4.0 of the Unicode Standard, an erratum was added which stated:

The digits of the Suzhou numerals are encoded between U+3021 and U+3029 in the CJK Symbols and Punctuation block. U+3007, U+5341, U+5344 and U+5345 are also used.
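Because the digits one to nine occupy consecutive code points from U+3021, a digit-by-digit transliteration only needs an offset, with U+3007 for zero. The following minimal sketch assumes exactly that. It does not model the layout conventions used in real price notation, and `to_suzhou_digits` is a hypothetical name.

```python
import unicodedata

SUZHOU_ZERO = "\u3007"  # IDEOGRAPHIC NUMBER ZERO


def to_suzhou_digits(number: int) -> str:
    """Digit-by-digit transliteration into U+3021..U+3029 (1-9) and U+3007 (0)."""
    return "".join(
        SUZHOU_ZERO if d == "0" else chr(0x3020 + int(d)) for d in str(number)
    )


print(to_suzhou_digits(1990))                                      # 〡〩〩〇
print(unicodedata.name("\u3021"), unicodedata.numeric("\u3029"))   # HANGZHOU NUMERAL ONE 9.0
```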

Japanese and Korean numerals


Ancient Greek numerals

Unicode provides support for several variants of Greek numerals, assigned to the Supplementary Multilingual Plane from U+10140 through U+1018F.[1]

Attic numerals were used by ancient Greeks, possibly from the 7th century BC. They were also known as Herodianic numerals because they were first described in a 2nd century manuscript by Herodian. They are also known as acrophonic numerals because all of the symbols used derive from the first letters of the words that the symbols represent: 'one', 'five', 'ten', 'hundred', 'thousand' and 'ten thousand'. See Greek numerals and acrophony.

Value | Symbol | Greek word
1 | Ι | ἴος (ios)
5 | Π | πέντε (pente)
10 | Δ | δέκα (deka)
100 | Η | ἑκατόν (hekaton)
1000 | Χ | χίλιοι (khilioi)
10000 | Μ | μύριοι (myrioi)
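Attic numerals compose additively, so a number can be written greedily from the largest sign down. This simplified Python sketch uses only the six basic signs in the table. It deliberately omits the combined "five times" signs, so, for example, 50 comes out as ΔΔΔΔΔ rather than with a single combined symbol. The name `to_attic` is hypothetical.

```python
# Greek capital letters from the table above, largest value first.
ATTIC_SYMBOLS = [
    (10000, "\u039C"),  # Μ
    (1000, "\u03A7"),   # Χ
    (100, "\u0397"),    # Η
    (10, "\u0394"),     # Δ
    (5, "\u03A0"),      # Π
    (1, "\u0399"),      # Ι
]


def to_attic(n: int) -> str:
    """Additive (sign-value) rendering using only the six basic acrophonic signs."""
    if n <= 0:
        raise ValueError("this sketch handles positive integers only")
    out = []
    for value, symbol in ATTIC_SYMBOLS:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


print(to_attic(16))    # ΔΠΙ
print(to_attic(1234))  # ΧΗΗΔΔΔΙΙΙΙ
```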

Roman numerals

Roman numerals are a numeral system originating in ancient Rome, adapted from Etruscan numerals. The system used in classical antiquity was slightly modified in the Middle Ages to produce the system we use today. It is based on certain letters which are given values as numerals.

Roman numerals are commonly used today in numbered lists (in outline format), clockfaces, pages preceding the main body of a book, chord triads in music analysis (Roman numeral analysis), the numbering of movie and video game sequels, book publication dates, successive political leaders or children with identical names, and the numbering of some sport events, such as the Olympic Games or the Super Bowl.

Roman numerals in Unicode

Unicode has a number of characters specifically designated as Roman numerals. They are part of the Number Forms[1] block, in the range U+2160 to U+2188. This range includes both uppercase and lowercase numerals, as well as precomposed characters for numbers up to 12 (Ⅻ or ⅻ[2]). One reason for the precomposed numbers is to allow multiple-letter numbers (such as VIII) to be set in a single square in East Asian vertical text. Another is their use on 12-hour clock faces.

Additionally, characters exist for the archaic[1] forms of 1000, 5000 and 10,000 (ↀ, ↁ, ↂ), the large reversed C (Ↄ), the late 6 (ↅ, similar to the Greek stigma Ϛ), the early 50 (ↆ, similar to a down arrow ↓, ⫝ or ⊥[3]), 50,000 (ↇ) and 100,000 (ↈ). Note that the small reversed c (ↄ) is not intended for use in Roman numerals. It is instead the lowercase form of the Claudian letter Ↄ.

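Table of Roman numerals in Unicode
Code (x =) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F
Value[4] | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 50 | 100 | 500 | 1,000
U+216x | Ⅰ | Ⅱ | Ⅲ | Ⅳ | Ⅴ | Ⅵ | Ⅶ | Ⅷ | Ⅸ | Ⅹ | Ⅺ | Ⅻ | Ⅼ | Ⅽ | Ⅾ | Ⅿ
U+217x | ⅰ | ⅱ | ⅲ | ⅳ | ⅴ | ⅵ | ⅶ | ⅷ | ⅸ | ⅹ | ⅺ | ⅻ | ⅼ | ⅽ | ⅾ | ⅿ
Value | 1,000 | 5,000 | 10,000 | none | none | 6 | 50 | 50,000 | 100,000
U+218x | ↀ | ↁ | ↂ | Ↄ | ↄ | ↅ | ↆ | ↇ | ↈ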

The characters in the range U+2160–217F are present only for compatibility with other character set standards that provide them. For ordinary use, the standard Latin letters are preferred.[citation needed] Displaying these characters requires a program that can handle Unicode and a font that contains glyphs for them.

If using blackletter or script typefaces, Roman numerals are set in Roman type. Such typefaces may contain Roman numerals matching the style of the typeface in the Unicode range U+2160–217F; if they don't exist, a matching Antiqua typeface is used for Roman numerals.
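The compatibility status of these characters is visible in the Unicode data. The precomposed numerals in U+2160–217F decompose under NFKC to the preferred Latin letters. The archaic forms from U+2180 onward have no such decomposition, although they still carry a numeric value. The following is a short illustration using Python's `unicodedata`.

```python
import unicodedata

for ch in ["\u216B", "\u217F", "\u2180", "\u2185"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.numeric(ch),
        repr(unicodedata.normalize("NFKC", ch)),
    )
```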

Counting rod numerals

[Table: counting rod digits 0–9 in vertical and horizontal forms; glyph images not preserved]

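The vertical rods are usually used for even powers of ten (1, 100, 10000, ...) and the horizontal rods for odd powers (10, 1000, ...). For example, 126 is written as a vertical 1, a horizontal 2 and a vertical 6, rather than with vertical forms throughout. In the all-vertical form, the single stroke of 1 and the two strokes of 2 run together and could be misread as 36. Historically, red rods were used for positive numbers and black rods for negative numbers.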

Counting rod numerals in Unicode

Counting rod numerals are included in their own block in the Supplementary Multilingual Plane (SMP), from U+1D360 to U+1D37F. As of Unicode 5.0, eighteen characters are included for the vertical and horizontal forms of the digits 1–9, although Unicode assigns vertical and horizontal the opposite way from the description above. Fourteen code points were reserved for future use. Zero should be represented by U+3007 (〇, ideographic number zero) and the negative sign by U+20E5 (combining reverse solidus overlay).[5] Because these characters were added relatively recently and lie in the SMP, font support may still be limited.
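Given the code point layout described above, a non-negative integer can be written by alternating between the two sets of rod digits, with U+3007 for zero. To sidestep the vertical/horizontal naming mismatch, this sketch uses Unicode's own "unit digit" and "tens digit" names. It assumes unit forms for even powers of ten and tens forms for odd powers. It omits negative numbers (U+20E5), and `to_counting_rods` is a hypothetical name.

```python
import unicodedata

UNIT_DIGIT_ONE = 0x1D360  # COUNTING ROD UNIT DIGIT ONE (U+1D360..U+1D368 = 1..9)
TENS_DIGIT_ONE = 0x1D369  # COUNTING ROD TENS DIGIT ONE (U+1D369..U+1D371 = 1..9)
ZERO = "\u3007"           # IDEOGRAPHIC NUMBER ZERO, as recommended for zero


def to_counting_rods(n: int) -> str:
    """Write a non-negative integer with Unicode counting rod digits.

    Assumption: unit forms for even powers of ten (1, 100, ...),
    tens forms for odd powers (10, 1000, ...), alternating by position.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    rods = []
    for position, ch in enumerate(reversed(str(n))):
        digit = int(ch)
        if digit == 0:
            rods.append(ZERO)
        else:
            base = UNIT_DIGIT_ONE if position % 2 == 0 else TENS_DIGIT_ONE
            rods.append(chr(base + digit - 1))
    return "".join(reversed(rods))


for rod in to_counting_rods(126):
    print(f"U+{ord(rod):04X} {unicodedata.name(rod)}")
```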

Counting Rod Numerals[1][2]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1D36x 𝍠 𝍡 𝍢 𝍣 𝍤 𝍥 𝍦 𝍧 𝍨 𝍩 𝍪 𝍫 𝍬 𝍭 𝍮 𝍯
U+1D37x 𝍰 𝍱 𝍲 𝍳 𝍴 𝍵 𝍶 𝍷 𝍸
Notes
1.^ As of Unicode 14.0
2.^ Grey areas indicate unassigned code points

References

  1. Unicode Charts: Ancient Greek Numbers
  2. XII
  3. David J. Perry: Proposal to Add Additional Ancient Roman Characters to UCS
  4. For the first two rows
  5. The Unicode Standard, Version 5.0, Electronic edition (PDF), Unicode, Inc., 2006, pp. 499–500