
Duplicate characters in Unicode

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Plugwash (talk | contribs) at 21:36, 10 August 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Unicode contains a certain amount of duplication, because one of its aims was to allow legacy encodings to be converted to Unicode without losing any information.

CJK fullwidth forms

In traditional CJK encodings, characters usually took either a single byte (known as halfwidth) or two bytes (known as fullwidth). Characters that took a single byte were generally displayed at half the width of those that took two bytes. Some characters, such as the Latin alphabet, were available in both halfwidth and fullwidth versions. As the halfwidth versions were more commonly used, they were generally the ones mapped to the standard code points for those characters. A separate section was therefore needed for the fullwidth forms to preserve the distinction.
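The distinction between a fullwidth form and its standard counterpart can be observed with Python's standard `unicodedata` module. The following is a minimal sketch; the helper name `fold_width` is hypothetical, and the use of NFKC compatibility normalization to fold the two forms together is an illustration rather than something the text above prescribes.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Map width variants to their standard counterparts.

    Note: NFKC folds every compatibility character, not only width variants.
    """
    return unicodedata.normalize("NFKC", text)


# FULLWIDTH LATIN CAPITAL LETTER A, B, C (U+FF21..U+FF23)
fullwidth_abc = "\uFF21\uFF22\uFF23"

# The fullwidth letters are separate code points from the standard ones.
print(fullwidth_abc == "ABC")

# Their East Asian Width property differs: "F" is fullwidth, "Na" is narrow.
print(unicodedata.east_asian_width("\uFF21"), unicodedata.east_asian_width("A"))

# Compatibility normalization removes the distinction.
print(fold_width(fullwidth_abc) == "ABC")
```

Because the fold discards the width information, it suits comparison and searching rather than round-trip conversion back to a legacy encoding.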

Greek

Many Greek letters are used as technical symbols. All of the Greek letters are encoded in the Greek section of Unicode, but many are encoded a second time under the name of the technical symbol they represent. Of these, the micro sign is in the Latin-1 range and most of the rest are in the Letterlike Symbols range. The micro sign is evidently inherited from ISO-8859-1, but the origin of the others is less clear.
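As an illustration, the micro sign and the ohm sign can be compared with the Greek letters they duplicate. This is a sketch using Python's `unicodedata` module; the helper `same_after` is a hypothetical name, and the choice of these two symbols as examples is an assumption made for the demonstration.

```python
import unicodedata

MICRO_SIGN = "\u00B5"   # in the Latin-1 range
GREEK_MU = "\u03BC"     # in the Greek range
OHM_SIGN = "\u2126"     # in the Letterlike Symbols range
GREEK_OMEGA = "\u03A9"  # in the Greek range


def same_after(form: str, a: str, b: str) -> bool:
    """Return True if a and b are identical once normalized to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Each duplicate has its own character name.
for ch in (MICRO_SIGN, GREEK_MU, OHM_SIGN, GREEK_OMEGA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The micro sign is only a compatibility equivalent of mu:
# canonical normalization (NFC) keeps them apart, NFKC merges them.
print(same_after("NFC", MICRO_SIGN, GREEK_MU))
print(same_after("NFKC", MICRO_SIGN, GREEK_MU))

# The ohm sign is a canonical equivalent of omega, so NFC already merges them.
print(same_after("NFC", OHM_SIGN, GREEK_OMEGA))
```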

Roman numerals

Unicode encodes the Roman numerals in their own section.
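The dedicated Roman numeral characters duplicate sequences of ordinary Latin letters. The sketch below uses Python's `unicodedata` module to show this; the function name `expand_roman_numerals` is hypothetical, and relying on NFKC normalization for the expansion is an assumption of this example rather than something stated above.

```python
import unicodedata

ROMAN_NUMERAL_EIGHT = "\u2167"       # a single code point
SMALL_ROMAN_NUMERAL_FOUR = "\u2173"  # a single code point


def expand_roman_numerals(text: str) -> str:
    """Replace dedicated Roman numeral characters with plain Latin letters.

    Note: NFKC also rewrites any other compatibility characters in the text.
    """
    return unicodedata.normalize("NFKC", text)


# One dedicated character expands to several Latin letters.
print(unicodedata.name(ROMAN_NUMERAL_EIGHT))
print(expand_roman_numerals(ROMAN_NUMERAL_EIGHT))
print(expand_roman_numerals("Chapter " + SMALL_ROMAN_NUMERAL_FOUR))

# The dedicated characters also carry a numeric value in the character database.
print(unicodedata.numeric(ROMAN_NUMERAL_EIGHT))
```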