Draft:Digital alphabet

In information theory, telecommunication, and computer science, a digital alphabet is any finite, discrete set of symbols chosen to represent information in machine-readable form. Typical examples range from the binary alphabet {0, 1} that underpins modern electronics to large character repertoires such as Unicode. Digital alphabets make it possible to encode, transmit, store, and decode messages reliably because every symbol is unambiguously distinguishable from every other.[1][2]
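
The role of a digital alphabet can be illustrated with a toy block code. The following Python sketch is illustrative only; the four-symbol alphabet and its 2-bit codewords are hypothetical, not drawn from the cited sources. It shows how fixed-length codewords over {0, 1} make encoding and decoding unambiguous:

    # Hypothetical four-symbol alphabet mapped to fixed-length binary codewords.
    CODEBOOK = {"A": "00", "B": "01", "C": "10", "D": "11"}
    DECODE = {bits: sym for sym, bits in CODEBOOK.items()}

    def encode(message: str) -> str:
        """Replace each symbol with its 2-bit codeword."""
        return "".join(CODEBOOK[s] for s in message)

    def decode(bitstream: str) -> str:
        """Fixed codeword length makes parsing the stream unambiguous."""
        return "".join(DECODE[bitstream[i:i + 2]]
                       for i in range(0, len(bitstream), 2))

    assert decode(encode("BADCAB")) == "BADCAB"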

Definition and scope

A digital alphabet can be formalised as the finite alphabet used by a discrete source. Claude Shannon’s 1948 paper defined a noiseless source as one that “chooses successively from a set of symbols” that constitute an alphabet.[3]

Although the two-symbol binary alphabet is the simplest and today the most widespread, historical and contemporary systems employ alphabets of many sizes: the 5-bit Baudot code’s 32 symbols, ASCII’s original 128, or Unicode’s more than 149 000 characters.[4][5][6]
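
The relationship between alphabet size and codeword length can be sketched numerically (an illustration, not taken from the cited sources): the shortest fixed-length binary codeword able to distinguish all n symbols is ceil(log2 n) bits, which reproduces the 5-bit and 7-bit figures above.

    import math

    def bits_per_symbol(alphabet_size: int) -> int:
        # Smallest fixed number of binary digits that can index every symbol.
        return math.ceil(math.log2(alphabet_size))

    print(bits_per_symbol(2))        # 1  -- binary alphabet {0, 1}
    print(bits_per_symbol(32))       # 5  -- Baudot / ITA 2
    print(bits_per_symbol(128))      # 7  -- original ASCII
    print(bits_per_symbol(149_000))  # 18 -- order of Unicode's repertoire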

Historical development

Early telegraphy

  • Baudot code (1870–1874) replaced variable-length Morse symbols with fixed-length five-unit patterns, laying the groundwork for later digital codes.[7][8]
  • The Baudot family evolved into the International Telegraph Alphabet No. 2 (ITA 2) and remained in telex use until the 1960s.[9]

ASCII and the early computer era

In 1963 the American Standards Association adopted ASCII, a 7-bit, 128-symbol digital alphabet designed for English-language data exchange between computers and peripherals.[10] ASCII’s fixed code length and the spare eighth bit, commonly used for parity, made it attractive for early serial links, but its small repertoire proved inadequate for multilingual computing.[5]
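
The even-parity convention often used with 7-bit ASCII on serial links can be sketched as follows (a minimal illustration, assuming even parity stored in the eighth bit; details varied between systems):

    def add_even_parity(char: str) -> int:
        """Extend a 7-bit ASCII code so the total number of 1-bits is even."""
        code = ord(char)
        assert code < 128, "original ASCII is 7-bit"
        parity = bin(code).count("1") % 2   # 1 if the data bits have odd weight
        return (parity << 7) | code

    def parity_ok(byte: int) -> bool:
        """Any single flipped bit makes the overall 1-bit count odd."""
        return bin(byte).count("1") % 2 == 0

    byte = add_even_parity("A")               # 0x41 has two 1-bits, parity bit 0
    assert parity_ok(byte)
    assert not parity_ok(byte ^ 0b0000_0100)  # a one-bit error is detected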

Unicode and universal character sets

Unicode (first published in 1991) provides a single digital alphabet intended to encode every writing system. As of version 16.0 it contains more than 149 000 characters across 168 scripts, plus numerous symbols and emoji.[11] The UTF-8, UTF-16, and UTF-32 encoding forms map this large alphabet onto code-unit sequences; the variable-length UTF-8 form additionally preserves backward compatibility with ASCII.[6]
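
The backward-compatibility property is easy to observe: pure-ASCII text yields identical bytes under UTF-8, while other characters expand to multi-byte sequences. A small Python illustration (not drawn from the cited sources):

    ascii_text = "Baudot"
    assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

    for ch in ("A", "é", "€", "😀"):
        encoded = ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
    # U+0041 -> 1 byte(s): 41
    # U+00E9 -> 2 byte(s): c3 a9
    # U+20AC -> 3 byte(s): e2 82 ac
    # U+1F600 -> 4 byte(s): f0 9f 98 80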

Theoretical properties

Alphabet size and information content

Shannon showed that the maximum information conveyed per symbol (the entropy H = −Σ pᵢ log₂ pᵢ) depends on both the symbol probabilities and the alphabet size. For a binary alphabet the maximum entropy is 1 bit per symbol; generalising to an n-ary alphabet yields log₂ n bits per perfectly random symbol.[12]
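
A short numerical illustration of the entropy bound (a sketch, not taken from the cited lecture notes):

    import math

    def entropy(probs):
        # H = -sum(p * log2(p)); terms with p = 0 contribute nothing.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # 1.0 bit   -- fair binary source
    print(entropy([0.9, 0.1]))   # ~0.469    -- biased binary source
    print(entropy([0.25] * 4))   # 2.0 bits  -- uniform 4-ary source = log2(4)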

Binary alphabet

Research highlights the “curious case of the binary alphabet”: certain coding-theory and privacy results that hold for alphabets of size n ≥ 3 take different forms when n = 2.[13]

Redundancy and error control

By adding check symbols from the same alphabet (parity bits, CRCs), a code can detect or correct errors introduced in a noisy channel, trading redundancy for reliability—an idea central to modern channel coding.[12]
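
A minimal sketch of this trade-off (illustrative; real protocols differ in framing details) appends a CRC-32 check value, four extra symbols drawn from the same byte alphabet, so the receiver can detect corruption:

    import zlib

    def frame(payload: bytes) -> bytes:
        """Append a 4-byte CRC-32 check value to the payload."""
        return payload + zlib.crc32(payload).to_bytes(4, "big")

    def check(received: bytes) -> bool:
        """Recompute the CRC over the payload and compare with the trailer."""
        payload, crc = received[:-4], received[-4:]
        return zlib.crc32(payload).to_bytes(4, "big") == crc

    sent = frame(b"digital alphabet")
    assert check(sent)
    corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit in transit
    assert not check(corrupted)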

Applications

Domain                | Role of the digital alphabet                             | Typical alphabet                             | Example
Telecommunications    | Serial-line encoding, character framing                  | ASCII, ITA 2                                 | Telex, RS-232
Data storage          | Magnetic or electrical patterns representing bytes       | Binary (8-bit bytes)                         | HDD, SSD
Internet protocols    | Packet payload text                                      | UTF-8 (Unicode)                              | HTTP headers, JSON
Optical & radio links | Modulation symbols (QAM, PSK)                            | Binary, quaternary, 256-QAM symbols          | Wi-Fi, LTE
Synthetic biology     | Expanded DNA alphabets for data storage or biotechnology | A, C, G, T plus artificial bases (e.g. P, Z) | Six-letter DNA aptamers[14]

References

  1. ^ Selbsterklärende Codes: Papier und Digitale Codierung [Self-explanatory codes: paper and digital coding] (Report). Universität Heidelberg. 2016. Retrieved 2025-05-13.
  2. ^ Cairncross, Frances (2001). The Death of Distance 2.0. Harvard Business School Press. ISBN 978-1-591-39098-6.
  3. ^ Shannon, Claude E. (1948). "A Mathematical Theory of Communication". Bell System Technical Journal. 27 (3): 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.
  4. ^ "Baudot Code". Encyclopædia Britannica (online). 2025. Retrieved 2025-05-13.
  5. ^ a b "American Standard Code for Information Interchange (ASCII)". Investopedia. 2024-02-15. Retrieved 2025-05-13.
  6. ^ a b "The Unicode Standard – Technical Introduction". Unicode Consortium. 2023-09-12. Retrieved 2025-05-13.
  7. ^ "Émile Baudot Invents the Baudot Code". History of Information. 2024-04-11. Retrieved 2025-05-13.
  8. ^ Weisberger, Richard (2017-09-07). "The Roots of Computer Code Lie in Telegraph Code". Smithsonian Magazine. Retrieved 2025-05-13.
  9. ^ "International Telegraph Alphabet No. 2 (ITA 2)". International Telecommunication Union. Retrieved 2025-05-13.
  10. ^ "Breaking the Language Barrier". Wired. 1993-10-15. Retrieved 2025-05-13.
  11. ^ "Emoji Counts v16.0". Unicode Consortium. 2024-09-12. Retrieved 2025-05-13.
  12. ^ a b "Information Theory Lecture Notes" (PDF). University of Auckland. 2004. Retrieved 2025-05-13.
  13. ^ Jiao, Jiantao; et al. (2015). "Information Measures: The Curious Case of the Binary Alphabet". IEEE Transactions on Information Theory. 61 (2): 779–800. arXiv:1401.6060. doi:10.1109/TIT.2014.2368555.
  14. ^ "Chemists Invent New Letters for Nature's Genetic Alphabet". Wired. 2015-04-07. Retrieved 2025-05-13.