Unicode in Microsoft Windows

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Incnis Mrsi (talk | contribs) at 17:00, 28 June 2011 (the article largely based on a translation of corresponding article in ru.WP, ru:Юникод в операционных системах семейства Microsoft Windows). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Microsoft began implementing Unicode consistently across its products quite early. Windows NT was the first operating system to use Unicode in system calls. It initially used the UCS-2 encoding scheme and was upgraded to UTF-16 starting with Windows 2000, which allows characters in the supplementary planes to be represented with surrogate pairs.
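The surrogate pair mechanism can be illustrated with a short sketch. The helper name to_surrogate_pair is hypothetical and is not part of any Windows API; the arithmetic follows the UTF-16 definition, and the result is cross-checked against Python's built-in UTF-16 encoder.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane,
# so UCS-2 cannot represent it, while UTF-16 uses two 16-bit code units.
high, low = to_surrogate_pair(0x1D11E)

# Cross-check: little-endian UTF-16 bytes produced by Python's own encoder.
encoded = "\U0001D11E".encode("utf-16-le")
expected = high.to_bytes(2, "little") + low.to_bytes(2, "little")
print(hex(high), hex(low), encoded == expected)
```

A character in the Basic Multilingual Plane still occupies a single 16-bit code unit, which is why text limited to that plane is encoded identically in UCS-2 and UTF-16.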

Modern systems

The modern operating systems Windows XP and Windows Server 2003, like Windows NT 4 and Windows 2000 before them, ship with system libraries that support two kinds of string encoding: Unicode and the current code page, which is still incorrectly referred to as the "ANSI code page". Unicode functions have the suffix -W (from the word "wide"), for example lstrlenW(). Code page functions use the suffix -A, for example lstrlenA(). This allows the Windows NT family to run Unicode-capable programs alongside older programs that use 8-bit encodings. Most of these ANSI functions are implemented as wrappers around the corresponding Unicode functions.
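The difference between the two function families can be approximated without calling the Windows API at all. The sketch below is only an analogy: ansi_length and wide_length are hypothetical helpers that count the units an -A function and a -W function would work with (bytes in a code page versus 16-bit UTF-16 code units), and the code pages cp1252 and cp1251 are chosen purely as examples.

```python
def ansi_length(text: str, code_page: str = "cp1252") -> int:
    """Hypothetical helper: length in bytes after encoding to a code page."""
    return len(text.encode(code_page))


def wide_length(text: str) -> int:
    """Hypothetical helper: length in 16-bit UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


# Text that fits the code page has the same length either way.
print(ansi_length("naïve"), wide_length("naïve"))

# Cyrillic text fits cp1251 but not cp1252; with a mismatched code page
# every character is replaced and the original text is lost.
russian = "Юникод"
print(ansi_length(russian, "cp1251"), wide_length(russian))
lossy = russian.encode("cp1252", errors="replace")
print(lossy)

# A supplementary-plane character needs two UTF-16 code units.
print(wide_length("\U0001D11E"))
```

The lossy conversion in the third case is the main reason Unicode-capable programs prefer the wide variants: the result of a code page function depends on which code page happens to be current.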

The IsTextUnicode function applies a heuristic algorithm to a byte string passed to it to detect whether the string represents Unicode text. For very short texts this function, which is used by some applications such as Notepad, often gives incorrect results. This gave rise to legends about the existence of "Easter eggs" such as "Bush hid the facts".
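The ambiguity behind such misdetections can be shown with a minimal sketch. It does not reproduce the heuristics of IsTextUnicode; it only demonstrates that the ASCII bytes of the phrase are also a valid little-endian UTF-16 string, so a detector looking at the bytes alone has two plausible readings.

```python
# The phrase as plain 8-bit text: 18 bytes, an even number.
data = "Bush hid the facts".encode("ascii")

# Reinterpret the same bytes as little-endian UTF-16: each pair of ASCII
# bytes becomes one 16-bit code unit.
misread = data.decode("utf-16-le")

# Check whether every resulting character falls in the main
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)

print(len(data), len(misread), all_cjk)
print([hex(ord(ch)) for ch in misread])
```

Because the misreading yields nine well-formed ideographs rather than invalid sequences, there is nothing in so short an input to rule out the UTF-16 interpretation.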

Windows CE

In Windows CE, UTF-16 was used almost exclusively.

Windows 9x

In 2001, Microsoft released the Microsoft Layer for Unicode, a special add-on for its older Windows 9x systems. It includes a dynamic link library, unicows.dll (only 240 KB), containing the Unicode versions (those with the letter W at the end of the name) of all the basic Windows API functions.