Unicode character property

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by DePiep (talk | contribs) at 17:44, 13 May 2010 (Created page with ''''Unicode''' assigns '''character properties''' to each codepoint.<ref name="Chapter4">[http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf Unicode 5.2 chapter 4...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Unicode assigns character properties to each codepoint.[1] These properties can be used when processing text, for example for line breaking, determining script direction (such as right-to-left), or identifying the script. Somewhat inconsistently, some character properties are also defined for codepoints that have no character assigned, and for codepoints designated as noncharacters ("not a character").

Each property has a status that indicates how binding it is: normative, informative, contributory, or provisional. Technically, a property value may be assigned to a whole range of codepoints at once.
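Assignment by range can be illustrated with a small lookup table. The following Python sketch is not taken from the Unicode data files: the function name `block_of` and the table layout are hypothetical, and only five Block ranges are included, so a result of `None` means "not in this sample table" rather than "no block".

```python
from bisect import bisect_right

# Sample of Block ranges as (first, last, value).
# Assumption: a complete table would cover the whole codespace.
BLOCK_RANGES = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
]

# Range starts, kept sorted so that a binary search can find the candidate range.
_STARTS = [first for first, _, _ in BLOCK_RANGES]


def block_of(codepoint, default=None):
    """Return the Block value of the range containing codepoint, else default."""
    # Index of the last range whose start is <= codepoint.
    i = bisect_right(_STARTS, codepoint) - 1
    if i >= 0:
        first, last, value = BLOCK_RANGES[i]
        if first <= codepoint <= last:
            return value
    return default


if __name__ == "__main__":
    for cp in (0x0041, 0x00E9, 0x03B1, 0x05D0, 0x0100):
        print(f"U+{cp:04X}", block_of(cp))
```

Storing one entry per range rather than one per codepoint keeps the table small, and the binary search keeps each lookup logarithmic in the number of ranges.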

The character properties are grouped into these topics:[1]

  • Name
  • General Category
  • Other important general characteristics
  • Display-related properties (bidirectional class, shaping, mirroring, width, and so on)
  • Casing (upper, lower, title, folding—both simple and full)
  • Numeric values and types
  • Script and Block
  • Normalization properties (decompositions, decomposition type, canonical combining class, composition exclusions, and so on)
  • Age (version of the standard in which the code point was first designated)
  • Boundaries (grapheme cluster, word, line, and sentence)
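Several of these properties can be inspected programmatically. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` and the dictionary keys are chosen here for illustration only. The module covers just part of the topics listed above (it does not expose Script, Block, Age, or the boundary properties), and the values it reports depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(ch):
    """Collect a few Unicode properties of a single character in a dict."""
    return {
        # Name; None if the character has no name in the database.
        "name": unicodedata.name(ch, None),
        # General Category, for example "Lu" for an uppercase letter.
        "general_category": unicodedata.category(ch),
        # Display-related properties.
        "bidi_class": unicodedata.bidirectional(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
        "east_asian_width": unicodedata.east_asian_width(ch),
        # Numeric value; None if the character has none.
        "numeric_value": unicodedata.numeric(ch, None),
        # Normalization-related properties.
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


if __name__ == "__main__":
    # Latin A, Hebrew alef, left parenthesis, vulgar fraction one half, e with acute.
    for ch in ("A", "\u05D0", "(", "\u00BD", "\u00E9"):
        print(f"U+{ord(ch):04X}", describe(ch))
```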

General Category

Each codepoint is assigned a value for its General Category. This is one of the character properties that are also defined for unassigned codepoints and for codepoints designated as noncharacters ("not a character").
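That every codepoint has a General Category value, whether or not a character is assigned to it, can be checked with Python's standard `unicodedata` module. In this sketch the helper name `general_category` and the sample labels are illustrative, and U+0378 is assumed to be unassigned in the Unicode version bundled with the interpreter.

```python
import unicodedata

# Sample codepoints; the labels are descriptive only.
SAMPLES = {
    "assigned letter": 0x0041,
    "unassigned (assumed)": 0x0378,
    "noncharacter U+FFFF": 0xFFFF,
    "noncharacter U+FDD0": 0xFDD0,
    "surrogate": 0xD800,
    "private use": 0xE000,
}


def general_category(codepoint):
    """Return the two-letter General Category value of a codepoint."""
    return unicodedata.category(chr(codepoint))


if __name__ == "__main__":
    for label, cp in SAMPLES.items():
        # Print only the codepoint number, since some samples are not printable.
        print(f"U+{cp:04X} {label}: {general_category(cp)}")
```

Unassigned codepoints and noncharacters both report `Cn`, while surrogates (`Cs`) and private-use codepoints (`Co`) have categories of their own.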


References