Code page 936 (IBM)
Alias(es) | Shift_GB |
---|---|
Language(s) | Simplified Chinese |
Created by | IBM |
Current status | Deprecated |
Transforms / Encodes | GB 2312 |
Succeeded by | IBM-1381 |
Other related encoding(s) | Shift JIS |
IBM code page 936 was a character encoding for Simplified Chinese, including 1,880 user-defined characters (UDC). It was a combination of the single-byte Code page 903 and the double-byte Code page 928.[1] Code page 946 used the same double-byte component, but an extended single-byte component (Code page 1042).[2]
IBM code page 936 should not be confused with the identically numbered Windows code page, which is a variant of the GBK encoding; GBK is called Code page 1386 by IBM. While GBK is a superset of the EUC-CN encoding of GB 2312, IBM-936 uses a different coded form of GB 2312, more closely resembling Shift JIS.
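As an illustration of the numbering clash, Python's standard library resolves the label `cp936` to its GBK codec, that is, to the Windows code page rather than IBM-936. The minimal sketch below also shows the EUC-CN byte arithmetic that GBK extends; the helper name `euc_cn_bytes` is hypothetical and introduced only for this example.

```python
import codecs

# In Python's standard library, "cp936" is an alias of the GBK codec
# (the Windows code page), not of IBM code page 936.
print(codecs.lookup("cp936").name)  # gbk

def euc_cn_bytes(row, cell):
    """EUC-CN form of a GB 2312 row/cell pair (hypothetical helper).

    EUC-CN adds 0xA0 to both the row and the cell number.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell])

# U+4E2D sits at row 54, cell 48 of GB 2312.
ch = "\u4e2d"
assert ch.encode("gb2312") == euc_cn_bytes(54, 48)

# GBK keeps the EUC-CN bytes for GB 2312 characters.
assert ch.encode("cp936") == ch.encode("gb2312")
print(ch.encode("cp936").hex())  # d6d0
```

The byte 0xD6 produced here is not among the double-byte lead bytes described for IBM-936 under Structure, so data labelled "cp936" by such software cannot be assumed to be IBM-936.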
Status
The last revision of IBM-928/936/946 was documented in 1992, and it was superseded in 1993 by the EUC-CN-based code pages 1380 through 1383; code page 1380 encodes the same characters as code page 928, but in a different layout.[3] Although chart definitions for Code page 1380 (the document C-H 3-3220-130 1993-11) are provided online by IBM, IBM does not similarly provide the chart definition for the older Code page 928 (the document C-H 3-3220-130 1992-11, i.e. an earlier revision of the same specification) and suggests contacting them for more information.[3][4]
International Components for Unicode (ICU) does not include an IBM-936 or IBM-946 codec, and uses the Windows code page for the "cp936" label.[5] The ICU project does possess mapping data for IBM-946,[6] but does not ship it with ICU.
Structure
Code page 928, the double-byte component, included 9,355 characters as double-byte sequences starting with 0x81 through 0xAC and 0xF0 through 0xFA.[7] The 0x81–AC lead byte range was used for GB 2312 characters: lead bytes 0x81–87 were used for non-hanzi, 0x88–9C were used for level 1 hanzi and 0x9C–AC were used for level 2 hanzi.[3] The 0xF0–FA lead byte range was used for IBM extensions: 0xF0 through 0xF9 were used for user-defined characters, and 0xFA was used for additional non-hanzi.[3] Like Shift JIS, trail (second) bytes were in the range 0x40–FC excluding 0x7F, allowing two GB 2312 rows to be encoded per lead byte.[6]
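The byte ranges above can be checked with a little arithmetic. The following is a minimal sketch, not IBM's definition: the constants come from the ranges quoted in this section, all names are illustrative, and `shifted_bytes` in particular assumes that the layout follows Shift JIS arithmetic exactly, with GB 2312 row 1 starting at lead byte 0x81. That assumption has not been verified against IBM's charts.

```python
# Byte ranges taken from the description above; all names are illustrative.
GB2312_LEADS = range(0x81, 0xAD)   # lead bytes 0x81-0xAC
UDC_LEADS = range(0xF0, 0xFA)      # lead bytes 0xF0-0xF9
IBM_EXTRA_LEAD = 0xFA
TRAIL_BYTES = [b for b in range(0x40, 0xFD) if b != 0x7F]

def classify_lead(byte):
    """Return the area a lead byte belongs to, or None if it is not a lead byte."""
    if byte in GB2312_LEADS:
        return "GB 2312"
    if byte in UDC_LEADS:
        return "user-defined"
    if byte == IBM_EXTRA_LEAD:
        return "IBM non-hanzi extension"
    return None

def shifted_bytes(row, cell):
    """Map a GB 2312 row/cell pair to two bytes.

    ASSUMPTION: Shift JIS-style arithmetic with row 1 at lead byte 0x81.
    This is a hypothetical reconstruction, not a verified IBM-936 mapping.
    """
    if not (1 <= row <= 88 and 1 <= cell <= 94):
        raise ValueError("row must be in 1..88 and cell in 1..94")
    lead = 0x81 + (row - 1) // 2      # two rows share each lead byte
    if row % 2 == 1:
        trail = 0x3F + cell           # odd row: 0x40-0x9E
        if trail >= 0x7F:
            trail += 1                # skip the excluded byte 0x7F
    else:
        trail = 0x9E + cell           # even row: 0x9F-0xFC
    return bytes([lead, trail])

print(len(TRAIL_BYTES))                   # 188, i.e. two rows of 94 cells
print(len(GB2312_LEADS) * 2)              # 88 rows available for GB 2312
print(len(UDC_LEADS) * len(TRAIL_BYTES))  # 1880 user-defined positions
print(shifted_bytes(16, 1).hex())         # 889f under the stated assumption
```

Under this assumption, row 16 (the first level 1 hanzi row) falls on lead byte 0x88, rows 55 and 56 share lead byte 0x9C, and row 87 falls on 0xAC, which is consistent with the lead byte ranges quoted above, including 0x9C appearing in both hanzi ranges.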
References
- ^ "CCSID 936". IBM. Archived from the original on 2016-03-27.
- ^ "CCSID 946". IBM. Archived from the original on 2016-03-26.
- ^ a b c d "Table 1: Registration of GCSGID and CPGID for the IBM CH-S Graphic Character Set". C-H 3-3220-130 1993-11: IBM Simplified Chinese Graphic Character Set (PDF). 1993. p. 6.
- ^ "Code page 928 information document". Archived from the original on 2016-03-17.
- ^ "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
- ^ a b "ibm-946_P100-1995". International Components for Unicode Data Repository. Unicode Consortium, IBM.
- ^ "CCSID 928 information document". Archived from the original on 2016-03-26.
External links
- GB2312: Comparison of conversion tables: the file csets-1.7/SHIFTGB.TXT ("Shifted GB2312.1980. Generated from an algorithm provided with some older Chinese packages.") amounts to code page 936 without user-defined or IBM-specific inclusions
- Mapping for code page 946 (superset of code page 936)