Unicode

Introduction

Unicode (Thong-iōng-bé, hàn-jī: 通用碼; also Bān-kok-bé, hàn-jī: 萬國碼) is a character encoding standard. The English name Unicode is formed from the two elements "Uni" and "code": "Uni" carries the sense of "universal", and "code" means an encoding. A central idea of Unicode is to design a single encoding that can handle all of the world's writing systems. Taiwanese Mandarin translates the name as Wanguoma (萬國碼); Bān-kok-bé is a loanword from that translation.

Put simply, Unicode is an international standard. Its goal is to encode the characters used to write every language in the world. Each character is mapped to an integer, and this integer is called the character's code point. In this way text is converted into numbers, which is what allows a computer to process and store it.
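The character-to-integer mapping can be inspected directly. The following is a minimal sketch using Python's built-in `ord()` and `chr()`; the helper name `code_point_label` and the three sample characters are illustrative choices, not part of the article.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"

# "a", precomposed "á" (written as an escape), and the Han character 明.
for ch in ["a", "\u00e1", "\u660e"]:
    print(ch, code_point_label(ch), ord(ch))

# chr() is the inverse of ord(): integer -> character.
round_trip = chr(ord("a"))
```

`ord()` gives the code point as an integer and `chr()` goes the other way, so the pair behaves like the lookup table the paragraph describes.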

Unicode has some technical limitations and problems, and it has also drawn some criticism. Even so, it has gradually become the mainstream encoding for software internationalization and for multilingual software environments. Microsoft Windows NT and the later Microsoft Windows 2000 and Microsoft Windows XP use UTF-16 to store the text used inside the system. UNIX-like systems such as Linux, BSD (OpenBSD, FreeBSD) and Mac OS X use UTF-8 to represent multilingual text.

Motivation

The encodings used by early computers were designed mainly for English and were suitable only for processing English text. Letters used by the other major European languages were gradually added later. However, different countries needed, and added, different letters. The result was a large number of encodings that could not interoperate. Data stored with a French encoding would come out garbled when read and processed with a German encoding. Software designed around the encoding for one language could handle only that language, and modifying it to handle another language was laborious. Processing more than one language on a computer was therefore very difficult, and the problem only becomes more serious once the rest of the world's languages and scripts are taken into account.
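The garbling described above can be reproduced with any two incompatible legacy encodings. The sketch below assumes ISO 8859-1 (Latin-1) and IBM code page 437 as stand-ins; the article does not name specific French or German encodings, so this pairing is only an illustration.

```python
text = "caf\u00e9"  # "café", with é as U+00E9

# Store the text with one 8-bit encoding...
raw = text.encode("latin-1")

# ...then read the same bytes back with a different 8-bit encoding.
wrong = raw.decode("cp437")
right = raw.decode("latin-1")

print(raw)    # the stored bytes
print(right)  # decoded with the matching encoding
print(wrong)  # decoded with a mismatched encoding: the last letter changes
```

The ASCII letters survive because both encodings agree on the lower 128 values; only the byte above 127 is reinterpreted.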

Conversely, if a single encoding can handle all of the world's scripts, exchanging data between languages becomes simple, and so does handling several languages at once. Software built on such an encoding, even if it was originally designed for one particular language, can be modified more easily to support other languages and scripts. These benefits were the early motivation for promoting Unicode.

Precomposed characters and composed characters

To support as many scripts as possible within a limited code space, Unicode makes use of combining characters. Take the letter á as an example. Unicode gives this letter its own code point (U+00E1). However, it can also be thought of as a combination of a (code point U+0061) and ˊ. Unicode defines a combining ˊ (code point U+0301). When the two code points U+0061 U+0301 appear in sequence, they must be understood as follows: the a represented by the preceding U+0061 and the ˊ represented by the following U+0301 are combined into á. Representing the Latin letter á with the single code point U+00E1 is called using a precomposed character. Representing it with U+0061 U+0301 is called using a composed character. A character such as U+0301 is called a combining character.

A base character may be followed by one or more combining characters, forming a composed character. If every such composed character were turned into a precomposed character with its own code point, a very large number of code points would be used up, because the number of possible combinations is large. Earlier encodings, however, generally did not use combining. To make it easier to process and exchange data with those older encodings, the letters used by the major European languages generally have corresponding precomposed characters in Unicode. Because a letter may therefore have more than one representation (precomposed and composed), Unicode specifies when two representations are to be treated as equivalent in meaning. This rule is called canonical equivalence.
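Canonical equivalence can be checked with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the helper `canonically_equal` is a hypothetical name introduced here, and comparing NFD forms is one common way to test equivalence, not the only one.

```python
import unicodedata

precomposed = "\u00e1"  # á as the single code point U+00E1
composed = "a\u0301"    # a (U+0061) followed by combining acute (U+0301)

def canonically_equal(s: str, t: str) -> bool:
    """Compare two strings after decomposing both to NFD."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)

# As raw code point sequences the two strings differ...
raw_equal = precomposed == composed

# ...but normalization maps each form onto the other.
nfc = unicodedata.normalize("NFC", composed)     # compose where possible
nfd = unicodedata.normalize("NFD", precomposed)  # decompose fully

print(raw_equal, canonically_equal(precomposed, composed))
print([f"U+{ord(c):04X}" for c in nfc], [f"U+{ord(c):04X}" for c in nfd])
```

Software that compares or searches text usually normalizes both sides first, so that the two representations of á are treated as the same letter.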

Display problems

Displaying composed characters correctly requires fairly complex font rendering technology. These technologies are not part of the Unicode standard. Unfortunately, because most scripts need only precomposed characters, software support for composed characters is still inadequate to this day. The underlying font technologies that can display composed characters correctly include OpenType (developed by Adobe Systems and Microsoft), AAT (developed by Apple Computer), and Graphite (developed by SIL International). However, most software does not make use of these font technologies, and most fonts do not support them either, so composed characters cannot be displayed correctly. Take Pe̍h-ōe-jī as an example: many Pe̍h-ōe-jī letters have no precomposed code point in Unicode, which means they can only be written with combining sequences. Most software garbles these letters when displaying them.
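One rough way to see which letters lack a precomposed code point is to apply NFC normalization and count what is left. The sketch below uses Python's `unicodedata`; the helper `has_precomposed_form` and the sample sequences are illustrative assumptions, and the check is only a heuristic because a few precomposed characters are excluded from NFC composition.

```python
import unicodedata

def has_precomposed_form(sequence: str) -> bool:
    """Heuristic: True if NFC collapses the sequence to one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1

samples = {
    "a + acute (U+0301)": "a\u0301",
    "a + vertical line above (U+030D)": "a\u030d",
    "o + dot above right (U+0358)": "o\u0358",
}

for label, seq in samples.items():
    print(f"{label}: precomposed form found = {has_precomposed_form(seq)}")
```

Sequences that stay at two code points after NFC have to be positioned by the font and rendering engine at display time, which is where the support problems described above show up.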

Bidirectional text

Some writing systems are written from left to right, such as the Latin script. Others are written from right to left, such as Hebrew and Arabic.
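Unicode records the writing direction of each character as a property. The sketch below reads that property with `unicodedata.bidirectional()` from Python's standard library; the three sample letters are illustrative choices.

```python
import unicodedata

samples = {
    "Latin small a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
}

# Bidirectional class codes: "L" = left-to-right,
# "R" = right-to-left, "AL" = right-to-left Arabic letter.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, cls in bidi_classes.items():
    print(f"{name}: {cls}")
```

A display engine uses these classes to decide the visual order of mixed left-to-right and right-to-left text.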

Take the Han character '明' as an example. This character is normally printed with a single glyph and has its own code point. However, it can also be split into two glyphs, '日' and '月'. Printed with these two glyphs, it may come out looking like '日月', which is ugly.

Still, splitting '明' into two glyphs for printing has one advantage: fewer glyphs are needed. With one glyph per character, printing '日', '月' and '明' requires three glyphs. If '明' is printed as two glyphs, only two are needed. The number of available glyphs is always finite. To print more characters than there are glyphs, a character has to be split into more than one glyph; in other words, new characters are composed (assembled) from glyphs.

Composing one character from two or more glyphs requires some printing techniques; otherwise the printed character looks like '日月', which is ugly.

To understand the motivation for promoting an encoding standard like Unicode, one first needs to understand what an encoding is. Take English as an example. English needs 26 uppercase letters (ABC...XYZ), 26 lowercase letters (abc...xyz), the Arabic numerals (0123456789), and some punctuation marks. To process English on a computer, a lookup table is needed that maps each character to a unique binary number. What matters is that everyone uses the same lookup table. Only then can people exchange data, and only then can the binary numbers be translated back into English without garbling.

Counted up, 7 bits are enough: ASCII

How many characters an encoding can cover depends on how many bits it uses to store each code. A 7-bit binary number ranges from 0 to 2^7-1=127 (read as "2 to the 7th power, minus 1"). So a 7-bit encoding can cover at most 128 characters. By the same reasoning, an 8-bit encoding can cover 256 characters, and a 16-bit encoding can cover 2^16 = 65,536 characters. An encoding with more bits can cover more characters, but the RAM needed to store each character also grows.
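The arithmetic above is simply powers of two. A minimal sketch (the function name `capacity` is a hypothetical helper introduced for illustration):

```python
def capacity(bits: int) -> int:
    """Number of distinct values an encoding with this many bits can represent."""
    return 2 ** bits

for bits in (7, 8, 16):
    n = capacity(bits)
    print(f"{bits:>2} bits: {n} values, from 0 to {n - 1}")
```

Each extra bit doubles the number of characters that can be covered, and also adds to the storage needed for every character.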

On early computers RAM was a very precious resource, so people used small encodings. For English, 7 bits turned out to be enough, and this led to the 7-bit ASCII encoding standard. However, other European languages written in the Latin script need some letters with diacritics, such as 'å', or some ligatures, such as 'œ'. These characters are not included in ASCII, so European countries began to define 8-bit encodings. In these 8-bit encodings, code points 0 through 127 are identical to ASCII.
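The relationship between ASCII and the 8-bit encodings can be checked with Python's built-in codecs. This sketch assumes ISO 8859-1 (Latin-1) and ISO 8859-15 as examples of European 8-bit encodings; the article does not name specific ones, and `fits_in` is a hypothetical helper.

```python
def fits_in(text: str, encoding: str) -> bool:
    """True if every character of text can be stored in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The lower half (byte values 0..127) decodes identically in ASCII and Latin-1.
lower_half = bytes(range(128))
same_lower_half = lower_half.decode("ascii") == lower_half.decode("latin-1")

print(same_lower_half)
print(fits_in("\u00e5", "ascii"), fits_in("\u00e5", "latin-1"))       # å
print(fits_in("\u0153", "latin-1"), fits_in("\u0153", "iso8859_15"))  # œ
```

Note that even among 8-bit encodings coverage differs: a letter missing from one table may be present in another, which is the interoperability problem a single universal encoding is meant to remove.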

RENDERING

character ==> glyph ==> print


UTF-8

| Code point range (hexadecimal) | UTF-16 (binary) | UTF-8 (binary) | Notes |
|---|---|---|---|
| 000000 - 00007F | 00000000 0xxxxxxx | 0xxxxxxx | ASCII equivalence range; byte begins with zero |
| 000080 - 0007FF | 00000xxx xxxxxxxx | 110xxxxx 10xxxxxx | first byte begins with 110 or 1110, the following byte(s) begin with 10 |
| 000800 - 00FFFF | xxxxxxxx xxxxxxxx | 1110xxxx 10xxxxxx 10xxxxxx | as in the row above |
| 010000 - 10FFFF | 110110xx xxxxxxxx 110111xx xxxxxxxx | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | UTF-16 requires surrogates; an offset of 0x10000 is subtracted, so the bit pattern is not identical with UTF-8 |
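The bit patterns in the table can be turned into code fairly directly. The following is a minimal sketch of both encodings, checked against Python's built-in codecs; the function names are hypothetical, and the sketch does not reject the surrogate range U+D800 to U+DFFF, which a production encoder must do.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point to UTF-8 following the table's bit patterns."""
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")

def utf16_units(cp: int) -> list:
    """Return the 16-bit code units for one code point."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                   # subtract the 0x10000 offset
    high = 0xD800 | (v >> 10)          # 110110xx xxxxxxxx
    low = 0xDC00 | (v & 0x3FF)         # 110111xx xxxxxxxx
    return [high, low]

def utf16_be_bytes(cp: int) -> bytes:
    """Pack the UTF-16 code units as big-endian bytes."""
    return b"".join(u.to_bytes(2, "big") for u in utf16_units(cp))

# One sample code point from each row of the table.
for cp in (0x61, 0xE1, 0x660E, 0x1F600):
    ch = chr(cp)
    assert utf8_encode(cp) == ch.encode("utf-8")
    assert utf16_be_bytes(cp) == ch.encode("utf-16-be")
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" "), utf16_be_bytes(cp).hex(" "))
```

The leading bits of each UTF-8 byte tell a decoder whether it is looking at a single-byte character, the start of a multi-byte sequence, or a continuation byte, which is why the ranges in the table never overlap.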

UTF-16

UTF-32

BiDi

Consortium

UCS

Han unification

Fonts

Version history

  • 1991: Unicode 1.0
  • 1993: Unicode 1.1
  • 1996: Unicode 2.0
  • 1998: Unicode 2.1
  • 1999: Unicode 3.0
  • 2001: Unicode 3.1
  • 2002: Unicode 3.2
  • 2003: Unicode 4.0
  • 2005: Unicode 4.1
  • 2006: Unicode 5.0

Pe̍h-ōe-jī and Unicode

See the article Taigi Unicode.

External links
