Code page 936 (Microsoft Windows)
Windows Code page 936 (CP936, MS936, Windows-936) is Microsoft's character encoding for simplified Chinese, one of the four double-byte character sets (DBCSs) for East Asian languages. Originally, Windows-936 was identical to GB 2312 (in its EUC-CN form), but it was expanded to cover most of GBK with the release of Windows 95.
IBM's equivalent to this version is Code page 1386 (CP1386), which is defined as a combination of the single byte Code page 1114 and the double byte Code page 1385. IBM's Code page 936 is a different variant of EUC-CN.
Windows-936 was superseded by code page 54936 (GB 18030), but as of 2014 it was still in widespread use. The Windows command prompt uses CP936 as the default code page for simplified Chinese installations, even though part of GB 18030 was made mandatory for all software products sold in China. In 2002, the IANA Internet name GBK was registered with Windows-936's mapping,[1][2] making it the de facto GBK definition on the Internet.
The concepts of "Windows-936", "GBK"[a], "GB2312" and "EUC-CN" are sometimes confused in various software products. Code page 1386 is not identical to GBK because a code page encodes characters, whereas GBK only defines code points. In addition, the Euro sign (€), encoded as 0x80 in CP1386, is not defined in GBK. Conversely, 95 characters defined in GBK were initially not encoded in CP1386.
This was partly resolved in later versions of Windows: as of Windows 7, all GBK characters not in the Unicode BMP Private Use Area can be displayed using code page 1386, but encoding the 95 characters was still not supported as of 2014. Nevertheless, "CP936" and "GBK" are often used interchangeably because of the popularity of Microsoft products on the Chinese market at the time GBK was published.
Because GBK superseded GB2312 long ago, these two terms have also become virtually equivalent for many users, so "Windows-936", "GBK" and "GB2312" are widely taken to mean the same thing even though they differ significantly. When most modern software products offer "GB2312" as a character encoding option, they do not mean precise support for GB2312 but partial support for GBK using CP1386. This can be observed in products such as Microsoft Internet Explorer and Notepad++.
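The practical difference between GB2312 and GBK can be illustrated with a short sketch. It assumes Python's built-in codecs, whose names `gb2312`, `gbk`, `cp936` and `gb18030` are Python's own; in Python, `cp936` is an alias of `gbk`, so the results reflect Python's mapping tables and not necessarily every detail of Microsoft's or IBM's code pages. The helper `can_encode` and the two sample characters are chosen here purely for illustration.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in `text` has a mapping in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "中"   # a character present in GB 2312
traditional = "龍"  # a traditional character covered by GBK but not by GB 2312

# For characters that GB 2312 defines, the EUC-CN and GBK byte sequences coincide.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

# For each codec: (can encode the simplified sample, can encode the traditional sample)
results = {
    codec: (can_encode(simplified, codec), can_encode(traditional, codec))
    for codec in ("gb2312", "gbk", "cp936", "gb18030")
}

for codec, (ok_simplified, ok_traditional) in results.items():
    print(f"{codec:8} simplified={ok_simplified} traditional={ok_traditional}")
```

Under these assumptions, a strict `gb2312` codec rejects the traditional character while `gbk`, `cp936` and `gb18030` accept it, which is why text labelled "GB2312" often only decodes correctly when it is treated as GBK.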
Notes
- [a] GBK 1.0
References
- [1] "Character Sets". Retrieved 3 October 2016.
- [2] "Application of IANA Charset Registration for GBK". https://www.iana.org/assignments/charset-reg/GBK
External links
- Microsoft's reference for Windows-936
- Code page file for Windows-936
- Mapping of Windows-936 to Unicode
- ICU demonstration of Windows-936
- ICU's Authoritative GBK mapping - part of GB18030 data