Code page 936 (IBM)
Field | Value |
---|---|
Alias(es) | SHIFTGB[1] |
Language(s) | Simplified Chinese |
Created by | IBM |
Current status | Deprecated |
Transforms / Encodes | GB 2312 |
Succeeded by | IBM-1381 |
Other related encoding(s) | Shift JIS |
IBM code page 936 was a character encoding for Simplified Chinese that included 1,880 user-defined characters (UDC). It combined the single-byte Code page 903 with the double-byte Code page 928.[2] Code page 946 used the same double-byte component, but with an extended single-byte component (Code page 1042).[3]
IBM code page 936 should not be confused with the identically numbered Windows code page, which is a variant of the GBK encoding; IBM calls GBK Code page 1386. While GBK is a superset of the EUC-CN encoding of GB 2312, IBM-936 uses a different coded form of GB 2312, one whose relationship to GB 2312 more closely resembles that of Shift JIS to JIS X 0208.
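The naming clash is easy to observe in general-purpose software. The following is a minimal sketch, assuming a standard CPython installation with its bundled CJK codecs; it says nothing about IBM's own converters. The helper `is_ibm928_lead` is a hypothetical name that only restates the lead-byte ranges given in the Structure section of this article.

```python
import codecs

# In CPython, the label "cp936" is an alias for the GBK codec
# (the Windows code page), not for IBM code page 936.
codec_name = codecs.lookup("cp936").name

# U+4E2D is a GB 2312 hanzi; GBK encodes it with the same bytes as EUC-CN.
gbk_bytes = "\u4e2d".encode("cp936")
euc_cn_bytes = "\u4e2d".encode("gb2312")


def is_ibm928_lead(byte: int) -> bool:
    """Hypothetical helper: lead-byte ranges of Code page 928 as stated in this article."""
    return 0x81 <= byte <= 0xAC or 0xF0 <= byte <= 0xFA


print(codec_name)
print(gbk_bytes.hex(), euc_cn_bytes.hex())
print(is_ibm928_lead(gbk_bytes[0]))
```

The EUC-CN lead byte produced here (0xD6) lies outside both lead-byte ranges of the IBM double-byte component, so data labelled "cp936" by such software cannot be assumed to be in the IBM layout.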
Status

The last revision of IBM-928/936/946 was documented in 1992, and it was superseded in 1993 by the EUC-CN-based code pages 1380 through 1383; code page 1380 encodes the same characters as code page 928, but in a different layout.[4] Although chart definitions for Code page 1380 (the document C-H 3-3220-130 1993-11) are provided online by IBM, IBM does not similarly provide the chart definition for the older Code page 928 (the document C-H 3-3220-130 1992-11, i.e. an earlier revision of the same specification).[4][5]
International Components for Unicode (ICU) does not include an IBM-936 or IBM-946 codec, and uses the Windows code page for the "cp936" label.[6] The ICU project does possess mapping data for IBM-946,[7] but does not ship it with ICU.
Structure
Code page 928, the double-byte component, included 9,355 characters, encoded as double-byte sequences with lead bytes 0x81 through 0xAC and 0xF0 through 0xFA.[8]
The 0x81–AC lead byte range was used for GB 2312 characters: lead bytes 0x81–87 were used for non-hanzi, 0x88–9C for level 1 hanzi and 0x9C–AC for level 2 hanzi.[1][4][7] As in Shift JIS, trail (second) bytes were in the range 0x40–FC excluding 0x7F, giving 188 trail values and so allowing two 94-cell GB 2312 rows to be encoded per lead byte;[7] unlike in Shift JIS, the bytes 0xA0–AC were not excluded from the lead byte range,[4][7] since JIS X 0201 compatibility was not required. The 0xF0–FA lead byte range was used for IBM extensions: 0xF0 through 0xF9 were used for user-defined characters, and 0xFA was used for additional non-hanzi.[4]
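The byte ranges described above can be restated as a small classifier. This is a sketch built only from the ranges given in this section, not a converter: it does not map byte pairs to GB 2312 row and cell numbers or to Unicode, since the exact layout is not given here. The names `TRAIL_BYTES`, `classify_lead` and `is_valid_pair` are illustrative, not taken from any IBM or ICU source.

```python
# Trail bytes: 0x40-0xFC excluding 0x7F, as described in the text.
TRAIL_BYTES = frozenset(b for b in range(0x40, 0xFD) if b != 0x7F)
GB2312_ROW_LENGTH = 94  # cells per GB 2312 row


def classify_lead(lead: int) -> str:
    """Classify a byte by the lead-byte ranges stated in the text (illustrative)."""
    if 0x81 <= lead <= 0x87:
        return "GB 2312 non-hanzi"
    if 0x88 <= lead <= 0x9B:
        return "GB 2312 level 1 hanzi"
    if lead == 0x9C:
        # The text lists 0x9C in both the level 1 and level 2 ranges.
        return "GB 2312 level 1 or level 2 hanzi"
    if 0x9D <= lead <= 0xAC:
        return "GB 2312 level 2 hanzi"
    if 0xF0 <= lead <= 0xF9:
        return "user-defined characters"
    if lead == 0xFA:
        return "IBM additional non-hanzi"
    return "not a double-byte lead byte"


def is_valid_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) falls inside the double-byte ranges described."""
    return classify_lead(lead) != "not a double-byte lead byte" and trail in TRAIL_BYTES


# Each lead byte addresses 188 trail values, i.e. two 94-cell rows.
trails_per_lead = len(TRAIL_BYTES)
# Ten user-defined lead bytes (0xF0-0xF9) times 188 trail values.
udc_capacity = len(range(0xF0, 0xFA)) * trails_per_lead

print(trails_per_lead, udc_capacity)
print(classify_lead(0x9C), is_valid_pair(0x88, 0x7F))
```

Under these ranges the user-defined area holds 10 × 188 = 1,880 positions, which agrees with the number of user-defined characters stated for IBM code page 936.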
References
- [1] Leisher, Mark (2008) [1998-03-06]. "Shifted GB2312.1980. Generated from an algorithm provided with some older Chinese packages". Department of Mathematical Sciences, New Mexico State University.
- [2] "CCSID 936". IBM. Archived from the original on 2016-03-27.
- [3] "CCSID 946". IBM. Archived from the original on 2016-03-26.
- [4] "Table 1: Registration of GCSGID and CPGID for the IBM CH-S Graphic Character Set". C-H 3-3220-130 1993-11: IBM Simplified Chinese Graphic Character Set (PDF). 1993. p. 6.
- [5] "Code page 928 information document". Archived from the original on 2016-03-17.
- [6] "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
- [7] "ibm-946_P100-1995". International Components for Unicode Data Repository. Unicode Consortium, IBM.
- [8] "CCSID 928 information document". Archived from the original on 2016-03-26.