Unicode character property

Unicode assigns character properties to each codepoint.^[1] These properties are used when processing text, for example to determine line breaking, right-to-left script direction, or which script a character belongs to. Somewhat inconsistently, some character properties are also defined for codepoints that have no character assigned, and for codepoints that are designated as noncharacters ("not a character").

Each property has a status that indicates how binding it is: normative, informative, contributory, or provisional. Technically, a property value may be assigned to a whole range of codepoints at once, rather than to each codepoint individually.
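Because property values are often listed per range of codepoints, a lookup can search a sorted list of ranges instead of storing one entry per codepoint. The following is a minimal sketch, assuming the "first..last ; value" line layout used by Unicode Character Database text files. The `SAMPLE` excerpt and the helper names `parse_ranges` and `lookup` are hypothetical and for illustration only; a real file covers far more ranges. Codepoints not listed fall back to a default value, here Cn (Unassigned).

```python
import bisect

# Hypothetical excerpt in the "first..last ; value" layout of UCD data files.
# Only a handful of General Category ranges are shown; this is not a full file.
SAMPLE = """
0000..001F    ; Cc
0020          ; Zs
0021..0023    ; Po
0030..0039    ; Nd
0041..005A    ; Lu
0061..007A    ; Ll
"""

def parse_ranges(text):
    """Return a sorted list of (first, last, value) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cps, value = (field.strip() for field in line.split(";", 1))
        first, _, last = cps.partition("..")   # a single codepoint has no ".."
        ranges.append((int(first, 16), int(last or first, 16), value))
    ranges.sort()
    return ranges

def lookup(ranges, cp, default="Cn"):
    """Return the value of the range containing cp, or the default value."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1  # last range starting at or before cp
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default

table = parse_ranges(SAMPLE)
print(lookup(table, ord("A")))  # Lu
print(lookup(table, ord("5")))  # Nd
print(lookup(table, 0x0378))    # Cn (unlisted, so the default applies)
```

Binary search keeps each lookup logarithmic in the number of ranges, which is why range-based tables stay compact even though the codespace has over a million codepoints.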

The character properties are grouped into these topics:^[1]

Name
General Category
Other important general characteristics
Display-related properties (bidirectional class, shaping, mirroring, width, and so on)
Casing (upper, lower, title, folding—both simple and full)
Numeric values and types
Script and Block
Normalization properties (decompositions, decomposition type, canonical combining class, composition exclusions, and so on)
Age (version of the standard in which the code point was first designated)
Boundaries (grapheme cluster, word, line, and sentence)
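Several of these property groups can be queried from Python's standard `unicodedata` module, as in the sketch below. The helper name `describe` is hypothetical. The module reflects whatever Unicode version the interpreter was built with (see `unicodedata.unidata_version`), not necessarily version 5.2, and it does not expose Script, Block, Age, or the boundary properties, so those groups are not shown. Full case mappings and normalization come from `str.upper`, `str.casefold`, and `unicodedata.normalize`.

```python
import unicodedata

def describe(ch):
    """Collect a few properties of one character from Python's unicodedata."""
    return {
        "name": unicodedata.name(ch, "<no name>"),          # Name
        "general_category": unicodedata.category(ch),       # General Category
        "bidi_class": unicodedata.bidirectional(ch),         # display-related
        "mirrored": bool(unicodedata.mirrored(ch)),         # display-related
        "east_asian_width": unicodedata.east_asian_width(ch),
        "numeric_value": unicodedata.numeric(ch, None),      # numeric values
        "decomposition": unicodedata.decomposition(ch),      # normalization
        "canonical_combining_class": unicodedata.combining(ch),
    }

print(describe("("))       # LEFT PARENTHESIS, Ps, bidi class ON, mirrored
print(describe("\u00bd"))  # VULGAR FRACTION ONE HALF, numeric value 0.5

# Casing: full case mappings can change the string length.
print("\u00df".upper())     # SS
print("\u00df".casefold())  # ss

# Normalization: canonical decomposition of U+00E9 into e + U+0301.
print(unicodedata.normalize("NFD", "\u00e9") == "e\u0301")  # True
```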

General Category

Each codepoint is assigned a General Category value. This is one of the character properties that are also defined for unassigned codepoints and for codepoints designated as noncharacters.
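The sketch below illustrates this with Python's `unicodedata.category`, which returns the two-letter General Category code; the first letter gives the major class (L, M, N, P, S, Z, or C). It assumes U+0378 is still unassigned in the Unicode version bundled with the interpreter, which holds for current versions. Unassigned codepoints and noncharacters both report Cn, while surrogate and private-use codepoints get their own values.

```python
import unicodedata

samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "an unassigned codepoint": "\u0378",
    "a noncharacter": "\ufffe",
    "a surrogate codepoint": "\ud800",
    "a private-use codepoint": "\ue000",
}

for label, ch in samples.items():
    # category() is defined for every codepoint, assigned or not.
    print(f"U+{ord(ch):04X} ({label}): {unicodedata.category(ch)}")

# U+0041 (LATIN CAPITAL LETTER A): Lu
# U+0378 (an unassigned codepoint): Cn
# U+FFFE (a noncharacter): Cn
# U+D800 (a surrogate codepoint): Cs
# U+E000 (a private-use codepoint): Co
```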


References

^ a b The Unicode Standard, Version 5.2, Chapter 4: Character Properties.
