Error-correcting code
In information theory and coding theory, an error-correcting code (ECC) is a code in which each data signal conforms to specific rules of construction, so that departures from this construction in the received signal can generally be detected and corrected automatically. ECCs are used in computer data storage, for example in dynamic RAM, and in data transmission. Examples include Hamming codes, Reed-Solomon codes, Reed-Muller codes, the binary Golay code, convolutional codes and turbo codes. The simplest error-correcting codes can correct single-bit errors (single error correction) and detect double-bit errors (double error detection). Other codes can detect or correct multi-bit errors.
Shannon's theorem is an important result in error correction; it describes the maximum attainable efficiency of an error-correcting scheme as a function of the expected level of noise.
Principle
In general, these methods add redundancy to the data stream in a way that follows certain algebraic or geometric relations, so that if the decoded stream is erroneous, it can be corrected with the help of the redundancy built into the scheme.
In essence, the benefit of a coding scheme is measured by its coding gain: the difference between the SNR required by the uncoded system and the SNR required by the coded system to reach the same BER.
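As a small worked example, coding gain is simply a difference of SNR values expressed in decibels; the figures below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical SNR requirements (dB) at a fixed target BER of 1e-5;
# the specific values are illustrative, not measurements.
snr_uncoded_db = 9.6  # SNR the uncoded system needs
snr_coded_db = 4.2    # SNR the coded system needs for the same BER
coding_gain_db = snr_uncoded_db - snr_coded_db
print(f"coding gain: {coding_gain_db:.1f} dB")  # coding gain: 5.4 dB
```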
Information theory and error detection and correction
Information theory tells us that whatever the probability of error in transmission or storage, it is possible to construct error-correcting codes in which the likelihood of failure is arbitrarily low, although this requires adding increasing amounts of redundant data to the original, which might not be practical when the error probability is very high. Shannon's theorem sets an upper bound on the error correction rate that can be achieved (and thus on the level of noise that can be tolerated) using a fixed amount of redundancy, but it does not tell us how to construct such an optimal encoder.
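As a concrete instance of this bound, the capacity of a binary symmetric channel with crossover probability p is C = 1 - H(p), where H is the binary entropy function; code rates below C are achievable with arbitrarily low failure probability. A minimal sketch:

```python
from math import log2

def binary_entropy(p):
    # H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    # Shannon capacity of a binary symmetric channel, in bits per channel use
    return 1 - binary_entropy(p)

print(f"{bsc_capacity(0.01):.3f}")  # ~0.919: the best achievable code rate
```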
Types
Error-correcting codes can be divided into block codes and convolutional codes. Block codes, such as Reed-Solomon codes, transform a chunk of bits into a (longer) chunk of bits in such a way that errors up to some threshold in each block can be detected and corrected.
Burst Errors
However, in practice errors often occur in bursts rather than at random. This is often compensated for by shuffling (interleaving) the bits in the message after coding. Then any burst of bit-errors is broken up into a set of scattered single-bit errors when the bits of the message are unshuffled (de-interleaved) before being decoded.
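A minimal sketch of a block interleaver, assuming a row-by-row write and column-by-column read; the function names and fixed depth are illustrative.

```python
def interleave(bits, rows):
    # Write row by row, read column by column (pad to fill the matrix).
    cols = -(-len(bits) // rows)
    padded = bits + [0] * (rows * cols - len(bits))
    return [padded[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows):
    # Invert the column-by-column read, then flatten row by row.
    cols = len(bits) // rows
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(12))       # stand-in for 3 codewords of 4 symbols each
tx = interleave(data, rows=3)
# A burst corrupting tx[0:3] hits symbols 0, 4 and 8 after de-interleaving:
# one error per codeword instead of three errors in a single codeword.
print(deinterleave(tx, rows=3) == data)  # True
```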
Bounded Distance Decoding (BDD)
In this scheme, if the number of errors is less than or equal to the maximum correctable threshold of the code, all errors will be corrected; for a code with minimum Hamming distance d, that threshold is t = floor((d - 1)/2) errors (see the sketch after the list below).
- Error-correcting codes require more signal elements than are necessary to convey the basic information.
- The two main classes of error-correcting codes are block codes and convolutional codes.
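A minimal sketch of bounded distance decoding over an explicit codebook; the 6-bit repetition code below is illustrative, with minimum distance d = 6 and threshold t = floor((6 - 1)/2) = 2.

```python
CODEBOOK = ["000000", "111111"]  # 6-fold repetition code, illustrative
T = 2                            # correctable threshold t = (6 - 1) // 2

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def bdd_decode(received):
    for codeword in CODEBOOK:
        if hamming(received, codeword) <= T:
            return codeword      # at most one codeword can be this close
    return None                  # more than t errors: declare a failure

print(bdd_decode("010001"))  # '000000': 2 errors, within the threshold
print(bdd_decode("010101"))  # None: 3 errors away from both codewords
```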
Typical schemes
Several schemes exist to achieve error detection, and most are quite simple.
Repetition schemes
Given a stream of data to be sent, the data is broken up into blocks of bits, and each block is sent some predetermined number of times. For example, to send "1011", we may repeat the block three times. Variations on this theme exist.
Suppose we send "1011 1011 1011", and this is received as "1010 1011 1011". As one group differs from the other two, we can determine that an error has occurred. This scheme is not very efficient, and it is susceptible to problems if the error occurs in exactly the same place in each group (e.g. "1010 1010 1010" in the example above would be accepted as correct).
The scheme however is extremely simple, and is in fact used in some transmissions of numbers stations.
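A minimal sketch of the scheme, assuming an odd number of copies so a per-bit majority vote is well defined; the function names are illustrative.

```python
def encode(block, copies=3):
    return block * copies

def decode(received, block_len, copies=3):
    groups = [received[i * block_len:(i + 1) * block_len] for i in range(copies)]
    # Majority vote per bit position; as noted above, this fails when the
    # same position is corrupted in most of the copies.
    return "".join(max("01", key=[g[i] for g in groups].count)
                   for i in range(block_len))

print(encode("1011"))             # '101110111011'
print(decode("101010111011", 4))  # '1011': the flipped bit is outvoted
```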
Parity schemes
- Main article: Parity bit
Given a stream of data that is to be sent, the data is broken up into blocks of bits, and the number of 1 bits is counted. Then, a "parity bit" appended to the block is set or cleared according to whether the number of one bits is odd or even. If the tested blocks overlap, the parity bits can be used to isolate the error, and even correct it if the error is confined to one bit: this is the principle of the Hamming code.
There is a limitation to parity schemes. A parity bit is only guaranteed to detect an odd number of bit errors (one, three, five, and so on). If an even number of bits (two, four, six, and so on) are in error, the parity bit appears correct even though the data is corrupt.
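A minimal sketch of even parity over a single block; the function names and the even-parity convention are illustrative assumptions.

```python
def add_parity(bits):
    # Append a bit so the total number of ones is even.
    return bits + [sum(bits) % 2]

def check_parity(word):
    # True if the word (data plus parity bit) has an even number of ones.
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1])  # -> [1, 0, 1, 1, 1]
word[2] ^= 1                     # single-bit error
print(check_parity(word))        # False: the odd number of errors is detected
word[0] ^= 1                     # a second error in the same word
print(check_parity(word))        # True: two errors cancel out, as noted above
```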
Cyclic redundancy checks
- Main article: Cyclic redundancy check
Many more complex error detection (and correction) methods make use of the properties of finite fields and polynomials over such fields.
The cyclic redundancy check considers a block of data as the coefficients of a polynomial and then divides it by a fixed, predetermined polynomial. The coefficients of the remainder of this division are taken as the redundant data bits, the CRC.
- Checking the received data can be achieved by dividing the received block (the payload with the CRC appended) by the same predetermined polynomial.
- If the remainder is zero, the data is assumed to have been received without error.
- Alternatively, one can recompute the CRC from the payload bits and compare the CRC with the CRC that has been received.
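A minimal sketch of such a division over GF(2), assuming the common CRC-8 generator polynomial x^8 + x^2 + x + 1 (0x07); the width and polynomial are illustrative choices.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Long division bit by bit: shift left, and subtract (XOR)
            # the generator whenever the top bit falls off.
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

msg = b"hello"
received_crc = crc8(msg)          # sender computes and appends this
print(crc8(msg) == received_crc)  # receiver recomputes and compares
```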
Hamming distance based checks
If we want to detect d bit errors in an n bit word, we can map every n bit word into a bigger n+d+1 bit word so that the minimum Hamming distance between valid mappings is d+1. This way, if one receives an n+d+1 bit word that doesn't match any word in the mapping (i.e. with a Hamming distance 1 <= x <= d from some word in the mapping), it can successfully be detected as an errored word. Moreover, d or fewer errors will never transform a valid word into another, because the Hamming distance between any two valid words is at least d+1, so such errors only lead to invalid words, which are detected correctly. Given a stream of m*n bits, we can detect x <= d bit errors successfully using the above method on every n bit word. In fact, we can detect a maximum of m*d errors if every n bit word is transmitted with at most d errors.
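A minimal sketch of detection by codebook membership; the 5-bit repetition code below is an illustrative mapping with minimum distance 5, so up to 4 bit errors per word are detectable.

```python
CODEBOOK = {"00000", "11111"}  # minimum Hamming distance 5 (detects d = 4)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def is_valid(word):
    return word in CODEBOOK

received = "00011"        # "00000" with 2 bits flipped
print(is_valid(received))  # False: the corruption is detected
print(min(hamming(received, c) for c in CODEBOOK))  # 2: still d or fewer errors
```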
Error correction
The above methods are sufficient to determine whether some data has been received in error. But often this is not enough. Consider an application such as simplex teletype over radio (SITOR), where a message needs to be received quickly and needs to be completely without error. Merely knowing where the errors occurred does not satisfy the second condition, since the message will be incomplete. And if the receiver waits for the message to be repeated (the link being simplex), the first condition is not satisfied either, since the receiver may have to wait a long time for the repetition to fill the gaps left by the errors.
It would be advantageous if the receiver could somehow determine what the error was and thus correct it. Is this even possible? Yes: consider the NATO phonetic alphabet. If a sender were to send the word "WIKI" as "WHISKEY INDIA KILO INDIA" and this were received (with * signifying letters received in error) as "W***KEY I**I* **LO **DI*", it would be possible to correct all the errors, since there is only one word in the NATO phonetic alphabet which starts with "W" and ends in "KEY", and similarly for the other words. This idea is also present in some error-correcting codes (ECC).
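A minimal sketch of this idea: correct a garbled word by finding the unique codebook entry consistent with the characters that survived. The word list is a small illustrative subset of the NATO alphabet, and '*' marks a letter received in error.

```python
WORDS = ["WHISKEY", "INDIA", "KILO"]  # illustrative subset of the codebook

def matches(received, word):
    return len(received) == len(word) and all(
        r == "*" or r == w for r, w in zip(received, word))

def correct(received):
    candidates = [w for w in WORDS if matches(received, w)]
    # Correction succeeds only when exactly one codeword remains possible.
    return candidates[0] if len(candidates) == 1 else None

print([correct(w) for w in ["W***KEY", "I**I*", "**LO"]])
# ['WHISKEY', 'INDIA', 'KILO']
```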
Error-correcting schemes also have their limitations. Some can correct a certain number of bit errors and merely detect a larger number of bit errors. Codes which can correct one error are termed single error correcting (SEC), and those which can detect two are termed double error detecting (DED). There are codes which can correct and detect more errors than these.
Applications
The Internet
In a typical TCP/IP stack, error detection is performed at multiple levels:
- Each Ethernet frame carries a CRC-32 checksum. The receiver discards frames if their checksums don't match.
- The IPv4 header contains a header checksum of the contents of the header (excluding the checksum field). Packets with checksums that don't match are discarded.
- The checksum was omitted from the IPv6 header, because most current link layer protocols have error detection.
- UDP has an optional checksum. Packets with wrong checksums are discarded.
- TCP has a checksum of the payload, the TCP header (excluding the checksum field) and the source and destination addresses of the IP header. Packets found to have incorrect checksums are discarded and are eventually retransmitted when the sender receives three duplicate acknowledgements or a timeout occurs.
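The IPv4, UDP and TCP checksums above are all the same 16-bit one's-complement sum, described in RFC 1071. A minimal sketch, omitting the pseudo-header that TCP and UDP prepend before summing:

```python
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add 16-bit words
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF            # one's complement of the running sum

# Illustrative 20-byte IPv4 header with the checksum field zeroed out.
header = bytes.fromhex("4500001c00000000401100007f0000017f000001")
print(hex(internet_checksum(header)))  # value to place in the checksum field
```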
Deep Space Telecommunications
NASA has used many different error correcting codes. For missions between 1969 and 1977 the Mariner spacecraft used a Reed-Muller code. The noise these spacecraft were subject to was well approximated by a "bell-curve" (normal distribution), so the Reed-Muller codes were well suited to the situation.
The Voyager 1 & Voyager 2 spacecraft transmitted color pictures of Jupiter and Saturn in 1979 and 1980.
- Color image transmission required 3 times the amount of data, so the Golay (24,12,8) code was used.
- This Golay code is only 3-error correcting, but it could be transmitted at a much higher data rate.
- Voyager 2 went on to Uranus and Neptune, and the code was switched to a concatenated Reed-Solomon / convolutional code for its substantially more powerful error-correcting capability.
- Current DSN error correction is done with dedicated hardware.
- For some NASA deep space craft such as those in the Voyager program, Cassini-Huygens (Saturn), New Horizons (Pluto) and Deep Space 1 -- the use of hardware ECC may not be feasible for the full duration of the mission.
The different kinds of deep space and orbital missions that are conducted suggest that trying to find a "one size fits all" error correction system will be an ongoing problem for some time to come.
- For missions close to the Earth, the nature of the "noise" is different from that experienced by a spacecraft headed towards the outer planets.
- In particular, if a transmitter on a distant spacecraft is operating at low power, the problem of correcting for noise grows with distance from the Earth.
Satellite Broadcasting (DVB)
The demand for satellite transponder bandwidth continues to grow, fueled by the desire to deliver television (including new channels and High Definition TV) and IP data. Transponder availability and bandwidth constraints have limited this growth, because transponder capacity is determined by the selected modulation scheme and Forward Error Correction (FEC) rate.
Scientific-Atlanta (now part of Cisco Systems) has been evaluating and developing products based on turbo codes concatenated with minimal-complexity Reed-Solomon codes in its laboratories in Atlanta, Georgia and Toronto, Canada.
Overview
- QPSK coupled with traditional Reed-Solomon and Viterbi codes has been used for nearly 20 years for the delivery of digital satellite TV.
- Higher-order modulation schemes such as 8PSK, 16QAM and 32QAM have enabled the satellite industry to substantially increase transponder efficiency.
- This increase in the information rate in a transponder comes at the expense of an increase in the carrier power to meet the threshold requirement for existing antennas.
- Tests conducted using the latest chipsets demonstrate that the performance achieved by using Turbo Codes may be even lower than the 0.8 dB figure assumed in early designs.
List of error-correction, error-detection methods
- Check bit
- Check digit
- Chipkill, an application of ECC techniques to volatile system memory.
- Convolutional codes are usually decoded with the Viterbi algorithm
- Differential space–time codes, related to space–time block codes.
- Erasure codes are a superset of Fountain codes
- Forward error correction
- Group code
- Golay code, of which the binary Golay codes are the most commonly used
- Goppa code that is used to create the McEliece cryptosystem
- Hagelbarger code
- Hamming code
- Longitudinal redundancy check
- Low-density parity-check code
- LT codes are near optimal rateless erasure correcting codes.
- Online codes are an example of rateless erasure codes.
- Parity bit
- Raptor codes are high speed (near real time) fountain codes.
- Reed-Solomon error correction
- Reed-Muller code
- Sparse graph code
- Space–time code
- Space–time trellis code
- Tornado codes are near-optimal erasure codes
- Turbo code
- Viterbi algorithm
- Walsh code used in cellular telephony for its high noise immunity, not just its ECC capabilities
See also
- Hamming code
- Reed-Solomon code
- sparse graph codes for state-of-the-art codes developed from 1993 to 2003
- low-density parity-check codes
- turbo codes
- digital fountain codes
Error Correction Standardization
Research Conferences on Error Correction
- 4th International Symposium on Turbo Codes
- Website http://www-turbo.enst-bretagne.fr/
- Website http://www.turbo-coding-2006.org/
External links
- The on-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, contains chapters on elementary error-correcting codes; on the theoretical limits of error-correction; and on the latest state-of-the-art error-correcting codes, including low-density parity-check codes, turbo codes, and fountain codes.
- Article: Memory errors and SECDED