
Code page 936 (IBM)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by HarJIT (talk | contribs) at 12:51, 20 January 2023 (Cite the "shift GB" name. This means both documents listed in ext links are now used as references, so remove the ext links section.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
IBM-936
Alias(es): SHIFTGB[1]
Language(s): Simplified Chinese
Created by: IBM
Current status: Deprecated
Transforms / Encodes: GB 2312
Succeeded by: IBM-1381
Other related encoding(s): Shift JIS

IBM code page 936 was a character encoding for Simplified Chinese, including 1,880 user-defined characters (UDC). It was a combination of the single-byte Code page 903 and the double-byte Code page 928.[2] Code page 946 used the same double-byte component, but an extended single-byte component (Code page 1042).[3]

IBM code page 936 should not be confused with the identically numbered Windows code page, which is a variant of the GBK encoding; GBK is called Code page 1386 by IBM. While GBK is a superset of the EUC-CN encoding of GB 2312, IBM-936 encodes GB 2312 in a different form, whose relationship to GB 2312 more closely resembles that of Shift JIS to JIS X 0208.
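As an illustration of how the shared number can mislead, the following minimal sketch uses Python's standard codec registry, which is not mentioned in the sources above and is used here only as an example. In that registry the "cp936" label resolves to the GBK codec (the Windows code page), so its output for a GB 2312 character matches EUC-CN rather than the IBM layout. The variable names are hypothetical.

```python
import codecs

# In Python's standard library, "cp936" is an alias for the GBK codec
# (the Windows code page), not for IBM code page 936.
codec_name = codecs.lookup("cp936").name

# A GB 2312 character encodes identically under "cp936" (GBK) and
# "gb2312" (EUC-CN), because GBK is a superset of EUC-CN.
sample = "中"
as_cp936 = sample.encode("cp936")
as_euc_cn = sample.encode("gb2312")

print(codec_name, as_cp936.hex(), as_euc_cn.hex())
```

Data produced by a system that really used IBM-936 would therefore need a dedicated mapping table rather than a decoder selected by the "cp936" label.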

Status

Except for Shift JIS itself, the similarly structured code pages for other CJK locales were phased out between 1992 and 2016.

The last revision of IBM-928/936/946 was documented in 1992, and it was superseded in 1993 by the EUC-CN-based code pages 1380 through 1383; code page 1380 encodes the same characters as code page 928, but in a different layout.[4] Although chart definitions for Code page 1380 (the document C-H 3-3220-130 1993-11) are provided online by IBM, IBM does not similarly provide the chart definition for the older Code page 928 (the document C-H 3-3220-130 1992-11, i.e. an earlier revision of the same specification).[4][5]

International Components for Unicode (ICU) does not include an IBM-936 or IBM-946 codec, and uses the Windows code page for the "cp936" label.[6] The ICU project does possess mapping data for IBM-946,[7] but does not ship it with ICU.

Structure

Code page 928, the double-byte component, included 9,355 characters as double-byte sequences starting with 0x81 through 0xAC and 0xF0 through 0xFA.[8] The 0x81–AC lead byte range was used for GB 2312 characters: lead bytes 0x81–87 were used for non-hanzi, 0x88–9C were used for level 1 hanzi and 0x9C–AC were used for level 2 hanzi.[4] The 0xF0–FA lead byte range was used for IBM extensions: 0xF0 through 0xF9 were used for user-defined characters, and 0xFA was used for additional non-hanzi.[4] Like Shift JIS, trail (second) bytes were in the range 0x40–FC excluding 0x7F, allowing two GB 2312 rows to be encoded per lead byte.[7]
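The byte ranges described above can be expressed as a small validity check. This is a minimal sketch based only on the ranges stated in this section, not on IBM's chart definitions; the function names and region labels are hypothetical, and it does not map byte pairs to GB 2312 rows or to Unicode.

```python
# Byte ranges taken from the description above (all inclusive).
GB2312_LEADS = range(0x81, 0xAC + 1)   # GB 2312 characters
UDC_LEADS = range(0xF0, 0xF9 + 1)      # user-defined characters
EXTRA_LEAD = 0xFA                      # additional IBM non-hanzi


def is_trail_byte(b: int) -> bool:
    """Return True if b lies in 0x40-0xFC and is not 0x7F."""
    return 0x40 <= b <= 0xFC and b != 0x7F


def classify_pair(lead: int, trail: int):
    """Return a coarse region label for a byte pair (hypothetical labels),
    or None if the pair falls outside the ranges described in the text."""
    if not is_trail_byte(trail):
        return None
    if lead in GB2312_LEADS:
        return "gb2312"
    if lead in UDC_LEADS:
        return "user-defined"
    if lead == EXTRA_LEAD:
        return "ibm-non-hanzi"
    return None


# Number of valid trail bytes per lead byte, and the resulting UDC capacity.
TRAILS_PER_LEAD = sum(is_trail_byte(b) for b in range(256))
UDC_CAPACITY = len(UDC_LEADS) * TRAILS_PER_LEAD

print(TRAILS_PER_LEAD, UDC_CAPACITY, classify_pair(0x81, 0x40))
```

Each lead byte has 188 valid trail bytes, which is exactly two 94-cell GB 2312 rows, and the ten user-defined lead bytes give 10 × 188 = 1,880 positions, consistent with the UDC count given in the lead paragraph.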

References

  1. ^ Leisher, Mark (1998-03-06). Computing Research Labs, New Mexico State University. http://sofia.nmsu.edu/~mleisher/Software/csets/SHIFTGB.TXT.
  2. ^ "CCSID 936". IBM. Archived from the original on 2016-03-27.
  3. ^ "CCSID 946". IBM. Archived from the original on 2016-03-26.
  4. ^ a b c d "Table 1: Registration of GCSGID and CPGID for the IBM CH-S Graphic Character Set". C-H 3-3220-130 1993-11: IBM Simplified Chinese Graphic Character Set (PDF). 1993. p. 6.
  5. ^ "Code page 928 information document". Archived from the original on 2016-03-17.
  6. ^ "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  7. ^ a b "ibm-946_P100-1995". International Components for Unicode Data Repository. Unicode Consortium, IBM.
  8. ^ "CCSID 928 information document". Archived from the original on 2016-03-26.