
Binary Ordered Compression for Unicode

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by UTF-8 (talk | contribs) at 02:13, 28 February 2005 (Initial version). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

BOCU-1 is a MIME-compatible Unicode compression scheme. BOCU stands for Binary Ordered Compression for Unicode. BOCU-1 combines the wide applicability of UTF-8 with the compactness of SCSU. The encoding is useful for compressing short strings, and it preserves code point order: comparing two encoded strings byte by byte gives the same result as comparing them by code point. For larger amounts of Unicode text, general-purpose compressors such as zip and bzip2 are usually more efficient.
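The code point order property can be illustrated with a minimal sketch. It does not implement BOCU-1; it uses UTF-8 (which shares the property) and UTF-16 (which does not) as stand-ins, because both are available in Python's standard codecs. The helper name `sorted_by_encoding` and the sample strings are assumptions chosen for illustration.

```python
def sorted_by_encoding(strings, encoding):
    """Sort strings by the raw bytes of their encoded form."""
    return sorted(strings, key=lambda s: s.encode(encoding))


# Two BMP characters, one near the top of the BMP (U+FF5E),
# and one supplementary character (U+10000).
samples = ["\U00010000", "\uFF5E", "A", "\u00E9"]

# Python compares str objects code point by code point.
by_code_point = sorted(samples)

by_utf8 = sorted_by_encoding(samples, "utf-8")
by_utf16 = sorted_by_encoding(samples, "utf-16-be")

# UTF-8 byte order matches code point order.
print(by_utf8 == by_code_point)   # True
# UTF-16 does not: U+10000 is stored as the surrogate pair D800 DC00,
# which sorts before the bytes FF 5E of U+FF5E.
print(by_utf16 == by_code_point)  # False
```

An encoding with this property lets software sort or index encoded strings with a plain byte comparison, without decoding them first.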

SCSU was created as a Unicode compression scheme with a byte/code point ratio similar to that of language-specific code pages. It has not been widely adopted, although it fulfills the criteria for an IANA charset and is registered with IANA. SCSU is not suitable for MIME “text” media types; for example, it cannot be used directly in email and similar protocols. It also requires a complicated encoder design for good performance.
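The byte/code point ratio mentioned above can be measured directly. The following sketch does not use SCSU or BOCU-1; it only compares a language-specific code page (Windows-1251, chosen here as an example) with UTF-8 and UTF-16 for a short Cyrillic string, to show the ratio that a Unicode compression scheme aims to approach. The function name `bytes_per_code_point` and the sample text are assumptions made for illustration.

```python
def bytes_per_code_point(text, encoding):
    """Return the average number of encoded bytes per code point."""
    # len() on a Python 3 str counts code points.
    return len(text.encode(encoding)) / len(text)


sample = "Привет"  # six Cyrillic letters

for encoding in ("cp1251", "utf-8", "utf-16-le"):
    print(encoding, bytes_per_code_point(sample, encoding))
# cp1251 1.0
# utf-8 2.0
# utf-16-le 2.0
```

For this sample the code page needs one byte per letter, while the uncompressed Unicode encodings need two.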

BOCU-1 has not been officially adopted by the Unicode Consortium, although Unicode Technical Note #6 describes the encoding in detail. SCSU, by contrast, has been adopted as an official Unicode Technical Standard.