Complex script

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by STBot at 09:11, 15 October 2006. It may differ significantly from the current revision.
The word العربية al'arabiyyah, "the Arabic language" in Arabic, in stages of rendering. The first line shows the letters unprocessed, as an application without complex script rendering would display them. In the second line the bidirectional display mechanism has come into play, and in the third the glyph shaping mechanism has rendered the letters according to their context.

A complex script is a human-language script in which the relationship between characters and glyphs, or the display direction, or both, falls outside what has traditionally been assumed from experience with ASCII characters. That is, a complex script is one that breaks the assumption of a one-to-one correspondence between characters and glyphs, the assumption of a single display direction, or both.

It follows that the Indic, Hebrew and Arabic scripts are complex scripts, as each breaks one or both of these assumptions. In Indic scripts a character can have several different glyphs depending on its context (a virama followed by another character triggers the half-form, or conjunct form, of the first character). In Hebrew, text can run both right to left (Hebrew letters) and left to right (digits) on the same line. In Arabic both complexities occur. In contrast, Chinese is not a complex script: although its huge character repertoire requires special handling of its own, it does not break the traditional ASCII assumptions of character-to-glyph correspondence and a single direction within a line.
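The one-character, many-glyphs relationship can be glimpsed in the Unicode character database itself. The following is a minimal Python sketch, not a shaping engine: it relies on the legacy Arabic presentation-form compatibility characters, each of which decomposes back to the same abstract letter. The helper name `presentation_forms` is an illustrative assumption; a real renderer selects glyphs from the font rather than substituting these code points.

```python
import unicodedata

BASE = "\u0639"  # ARABIC LETTER AIN, the abstract character stored in text


def presentation_forms(base):
    """Map each joining position to the compatibility character for `base`.

    Hypothetical helper for illustration: it only works for letters that
    have all four named presentation forms in the Unicode database.
    """
    base_name = unicodedata.name(base)  # "ARABIC LETTER AIN"
    forms = {}
    for position in ("ISOLATED", "INITIAL", "MEDIAL", "FINAL"):
        # lookup() raises KeyError if no character carries this name.
        forms[position] = unicodedata.lookup(f"{base_name} {position} FORM")
    return forms


forms = presentation_forms(BASE)
for position, ch in forms.items():
    # The decomposition field names the form and points back to U+0639.
    print(f"{position:9} U+{ord(ch):04X}  {unicodedata.decomposition(ch)}")
```

Four distinct code points all decompose to the single character U+0639, which is the situation the paragraph describes: the stored character stays the same while the glyph shown depends on where the letter sits in the word.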

Complex scripts cannot be displayed properly without a display mechanism dedicated to their particular requirements. Merely extending the character repertoire is therefore not enough to enable them on a system. For example, old text terminals, if they are 8-bit clean, can be extended to basic Unicode support via UTF-8 (which was originally designed largely to accommodate legacy environments), allowing them to display characters in Greek, Runic and Chinese. Displaying Arabic characters, however, would require upgrading the terminal display software itself, which is no small undertaking. Because of this initial difficulty of supporting complex scripts, many multilingual packages that boast support for most of the world's languages turn out to lack support for the Arabic, Hebrew and Indic scripts.
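The point that encoding is not the obstacle can be illustrated with a short Python sketch. The sample characters and the helper name `utf8_bytes` are arbitrary choices made for this example. An Arabic letter passes through an 8-bit-clean channel exactly as a Greek or Chinese character does; what is missing for Arabic is the reordering and shaping that must happen after decoding.

```python
# One sample character per script (chosen arbitrarily for illustration).
samples = {
    "Greek": "\u03b1",    # GREEK SMALL LETTER ALPHA
    "Runic": "\u16a0",    # RUNIC LETTER FEHU FEOH FE F
    "Chinese": "\u4e2d",  # CJK ideograph "middle"
    "Arabic": "\u0639",   # ARABIC LETTER AIN
}


def utf8_bytes(ch):
    """Encode a single character as UTF-8."""
    return ch.encode("utf-8")


for script, ch in samples.items():
    encoded = utf8_bytes(ch)
    # Every byte of a non-ASCII character has its high bit set, so an
    # 8-bit-clean channel never mistakes it for an ASCII control code.
    high_bit_only = all(b >= 0x80 for b in encoded)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"{script:8} U+{ord(ch):04X} -> {hex_bytes}  high-bit only: {high_bit_only}")
```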

The issue of Far Eastern display direction does not make scripts such as Chinese or Mongolian complex. Unlike Hebrew or Arabic, they can be written left to right, top to bottom instead of in their default (pre-computer) direction without affecting the meaning. Even in their default direction there are no changes of direction within a single line, so the unusual direction can be implemented with a simple instruction to the display mechanism. For Hebrew text such an instruction would cause the letters to be displayed correctly but the digits (and any English text) in reverse order.
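The failure of a single "reverse the line" instruction for Hebrew can be shown with a toy Python sketch. It is deliberately simplistic and is not the Unicode Bidirectional Algorithm: it keeps only runs of digits in left-to-right order and ignores Latin text, punctuation and nesting. The function names and the sample string (a Hebrew word followed by a year) are assumptions made for this example.

```python
import unicodedata


def reverse_whole_line(logical):
    """The 'simple instruction': flip the entire line for right-to-left display."""
    return logical[::-1]


def reverse_keeping_digits(logical):
    """Flip the line, but keep each run of digits in left-to-right order.

    Toy approximation only; real bidirectional layout handles far more cases.
    """
    out, digit_run = [], []
    for ch in reversed(logical):
        if unicodedata.bidirectional(ch) in ("EN", "AN"):  # European/Arabic number
            digit_run.append(ch)
        else:
            out.extend(reversed(digit_run))  # restore the digits' own order
            digit_run = []
            out.append(ch)
    out.extend(reversed(digit_run))
    return "".join(out)


# Logical (typing) order: shin, nun, tav, space, "1948".
logical = "\u05e9\u05e0\u05ea 1948"

# ascii() shows the visual order left to right without the terminal
# applying its own bidirectional reordering to the output.
print(ascii(reverse_whole_line(logical)))      # digits come out reversed
print(ascii(reverse_keeping_digits(logical)))  # digits stay readable
```

The first function displays the Hebrew letters in the correct visual order but turns the year into 8491, which is the defect described above. The second preserves the digits only because it inspects each character's directional class, and that per-character inspection is what a dedicated display mechanism has to perform.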

Complex layout engines may be integrated into many products with embedded systems, such as cell phones and other electronic equipment. Complex layout engines are available as software components from Monotype Imaging, Bitstream and DCC.