Talk:Unicode in Microsoft Windows
Computing: Software (Start-class, Low-importance)
Untitled
Much of the last (UTF-8) paragraph is babble. One does not need UTF-8 support from the OS when there is UTF-16 support, since converting between UTF-8 and UTF-16 is simple and mechanical and does not require large lookup tables (as other Unicode functionality does). 88.159.79.148 (talk) 17:39, 6 February 2016 (UTC)
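To illustrate the claim that UTF-8 to UTF-16 conversion is purely mechanical, here is a minimal Python sketch. The function name `utf8_to_utf16_units` is hypothetical (not a Windows or library API). It decodes UTF-8 using only shifts and masks and emits UTF-16 code units, splitting code points above U+FFFF into surrogate pairs. It assumes mostly well-formed input: it checks lead and continuation bits, truncation and the U+10FFFF ceiling, but does not reject overlong forms or encoded surrogates.

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Convert UTF-8 bytes to UTF-16 code units using bit arithmetic only (no tables)."""
    units: List[int] = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The high bits of the lead byte give the sequence length and initial payload.
        if lead < 0x80:
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte (10xxxxxx) contributes six payload bits.
        for k in range(1, extra + 1):
            cont = data[i + k]
            if cont >> 6 != 0b10:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")
        i += extra + 1
        # BMP code points fit in one UTF-16 unit; the rest become a surrogate pair.
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


print([hex(u) for u in utf8_to_utf16_units("Aé😀".encode("utf-8"))])
# ['0x41', '0xe9', '0xd83d', '0xde00']
```

The reverse direction (UTF-16 to UTF-8) is equally table-free: combine surrogate pairs back into a code point, then split it into 6-bit groups.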
- fopen("string", ...) does not work with a UTF-8 filename and cannot open all possible files, because the C library does not convert the string from UTF-8. This is a violation of the POSIX and C99 standards. Windows is broken; stop trying to claim otherwise. Yes, you can work around it by converting the strings to UTF-16 and using a Windows-specific API, but it is still broken in that their standard C library does not do this conversion itself. Spitzak (talk) 02:15, 9 February 2016 (UTC)
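A small Python sketch of why a narrow (byte-string) file name cannot reach every file, assuming a Western European system whose ANSI code page is 1252 and a C runtime that interprets narrow names in that code page. The helper `narrow_and_wide_forms` is hypothetical. It tries to encode a name strictly in the ANSI code page and also produces the UTF-16LE form that wide APIs such as `_wfopen` and `CreateFileW` accept; any name that fails the first step has no narrow spelling at all.

```python
from typing import Optional, Tuple


def narrow_and_wide_forms(name: str, ansi_codec: str = "cp1252") -> Tuple[Optional[bytes], bytes]:
    """Return (bytes in the assumed ANSI code page or None, UTF-16LE bytes) for a file name."""
    try:
        # Strict encoding fails if any character has no slot in the code page.
        narrow: Optional[bytes] = name.encode(ansi_codec)
    except UnicodeEncodeError:
        narrow = None
    # UTF-16 can represent every Unicode scalar value.
    return narrow, name.encode("utf-16-le")


for name in ("report.txt", "résumé.txt", "日本語.txt"):
    narrow, wide = narrow_and_wide_forms(name)
    status = "fits cp1252" if narrow is not None else "NOT representable in cp1252"
    print(f"{name}: {status}; {len(wide)} bytes as UTF-16LE")
```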
Yes, chcp 65001 is a thing
Assuming you can get your hands on Windows 10, grab Ubuntu or any other WSL distribution from the Microsoft Store. Run it, and you will see that conhost reports cp65001 in the window's properties.
WSL uses a binfmt_misc hook that lets it run Win32 .exe files, which inherit many of the WSL console's settings. One of these settings is the code page, and it causes bugs in older Python 2 versions, because Python 2 does not recognize code page 65001, which Windows reports as the active one.
If you read the workarounds in the bug, you will see that chcp 850 is used to switch to an encoding that Python 2 understands, and chcp 65001 is used to switch back afterwards. The full commands include /mnt/c/Windows/System32/cmd.exe /C, because that is how you invoke cmd from WSL.
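The shape of that workaround can be sketched as a Python helper that builds the argument list for running a command under cmd.exe from WSL with a temporary code page. This is an untested illustration, not the exact command from the bug: `with_temporary_codepage` and the example command are hypothetical, and quoting between WSL and cmd.exe can be finicky.

```python
from typing import List

CMD_EXE = "/mnt/c/Windows/System32/cmd.exe"  # cmd.exe as seen from inside WSL


def with_temporary_codepage(command: str, temp_cp: int = 850, restore_cp: int = 65001) -> List[str]:
    """Build an argv that runs `command` via cmd.exe /C under a temporary console code page."""
    # '&' runs each part in sequence even if an earlier part fails,
    # so the shared console is switched back to restore_cp afterwards.
    script = f"chcp {temp_cp} >NUL & {command} & chcp {restore_cp} >NUL"
    return [CMD_EXE, "/C", script]


print(with_temporary_codepage("python2 script.py"))
```

The resulting list could then be passed to `subprocess.run` from a Python 3 process running inside WSL.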
And yes, you can reproduce this without WSL. Install Python 3.6, open cmd on Windows 10, and run:
```
C:\Python\Python36>chcp 437
Active code page: 437

C:\Python\Python36>set PYTHONLEGACYWINDOWSSTDIO=1

C:\Python\Python36>python -c print(__import__('sys').stdout.encoding)
cp437

C:\Python\Python36>chcp 65001
Active code page: 65001

C:\Python\Python36>python -c print(__import__('sys').stdout.encoding)
cp65001
```
PYTHONLEGACYWINDOWSSTDIO is needed to force Python to use the console's active code page, because PEP 528 made the Windows console encoding "utf-8" by default. Without the variable, Python 3.6 reports "utf-8" regardless of the active code page.
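The behaviour shown in the cmd session can be summarized as a small Python model. It is a simplification, not CPython's actual logic: `reported_stdout_encoding` is a hypothetical function that only covers stdout attached to an interactive console window (redirected output follows different rules) and ignores other startup options.

```python
def reported_stdout_encoding(active_codepage: int, legacy_stdio: bool) -> str:
    """Model of sys.stdout.encoding for Python 3.6+ writing to an interactive Windows console."""
    if not legacy_stdio:
        # PEP 528: console output goes through the wide console API and is reported as UTF-8.
        return "utf-8"
    # PYTHONLEGACYWINDOWSSTDIO=1: the console's active code page is used instead.
    return f"cp{active_codepage}"


# Mirrors the cmd session: chcp 437, then chcp 65001, with and without the variable.
print(reported_stdout_encoding(437, legacy_stdio=True))    # cp437
print(reported_stdout_encoding(65001, legacy_stdio=True))  # cp65001
print(reported_stdout_encoding(437, legacy_stdio=False))   # utf-8
```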
--Artoria2e5 contrib 16:24, 9 May 2018 (UTC)
- Regarding non-double-byte MBCSes: there is another code page in Windows with up to four bytes per character, cp54936 (GB 18030). Like UTF-8, it cannot be used as the locale or "ANSI" code page. In fact, all the MBCS code pages usable as the locale code page are DBCS, so the likely explanation is that many programs simply cannot handle characters of three or more bytes. --Artoria2e5 contrib 16:28, 9 May 2018 (UTC)
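The byte-length difference can be checked with Python's built-in codecs. A minimal sketch with a hypothetical helper `max_encoded_length`; it assumes Python's `gbk` codec (its implementation of code page 936) is a fair stand-in for the Windows DBCS code page, which may differ in a few mappings.

```python
def max_encoded_length(text: str, codec: str) -> int:
    """Length in bytes of the longest single-character encoding within `text`."""
    return max(len(ch.encode(codec)) for ch in text)


sample = "A中😀"  # ASCII letter, CJK ideograph, emoji outside the BMP
for codec in ("gb18030", "utf-8"):
    print(codec, max_encoded_length(sample, codec))  # 4 for both

try:
    max_encoded_length(sample, "gbk")  # double-byte code page 936
except UnicodeEncodeError:
    print("gbk cannot encode", repr(sample[-1]))
print("gbk", max_encoded_length("A中", "gbk"))  # 2
```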