Sardinas–Patterson algorithm

In coding theory, the Sardinas–Patterson algorithm is a classical algorithm for determining whether a given variable-length code is uniquely decodable. The algorithm carries out a systematic search for a string which admits two different decompositions into codewords. As Knuth reports, the algorithm was rediscovered about ten years later in 1963 by Floyd, despite the fact that it was at the time already well known in coding theory.^[1]

Idea of the algorithm

Consider the code $\{\,a\mapsto 1,b\mapsto 011,c\mapsto 01110,d\mapsto 1110,e\mapsto 10011\,\}$ . This code, which is based on an example by Berstel,^[2] is an example of a code which is not uniquely decodable, since the string

011101110011

can be interpreted as the sequence of codewords

01110 – 1110 – 011,

but also as the sequence of codewords

011 – 1 – 011 – 10011.

Two possible decodings of this encoded string are thus given by cdb and babe.

In general, a codeword can be found by the following idea: In the first round, we choose two codewords $x_{1}$ and $y_{1}$ such that $x_{1}$ is a prefix of $y_{1}$ , that is, $x_{1}w=y_{1}$ for some "dangling suffix" $w$ . If one tries first $x_{1}=011$ and $y_{1}=01110$ , the dangling suffix is $w=10$ . If we manage to find two sequences $x_{2},\ldots ,x_{p}$ and $y_{2},\ldots ,y_{q}$ of codewords such that $x_{2}\cdots x_{p}=wy_{2}\cdots y_{q}$ , then we are finished: For then the string $x=x_{1}x_{2}\cdots x_{p}$ can alternatively be decomposed as $y_{1}y_{2}\cdots y_{q}$ , and we have found the desired string having at least two different decompositions into codewords. In the second round, we try out two different approaches: the first, perhaps more obvious trial is to look for a codeword that has w as prefix. Then we obtain a new dangling suffix w', with which we can continue our search. If we eventually encounter a dangling suffix that is itself a codeword (or even better: the empty word), then the search will terminate, as we know there exists a string with two decompositions. The second, and less maybe less obvious trial is to seek for a codeword that is itself a prefix of w. In our example, we have $w=10$ , and the sequence 1 is a codeword. We can thus also continue with w'=0 as the new dangling suffix.

Precise description of the algorithm

The algorithm is described most conveniently using quotients of formal languages. In general, for two sets of strings D and N, the (left) quotient $N^{-1}D$ is defined as the residual words obtained from D by removing some prefix in N. Formally, $N^{-1}D=\{\,y\mid xy\in D~{\textrm {and}}~x\in N\,\}$ . Now let $C$ denote the (finite) set of codewords in the given code.

The algorithm proceeds in rounds, where we maintain in each round not only one dangling suffix as described above, but the (finite) set of all potential dangling suffixes. Starting with round $i=1$ , the set of potential dangling suffixes will be denoted by $S_{i}$ . The sets $S_{i}$ are defined inductively as follows:

$S_{1}=C^{-1}C\setminus \{\varepsilon \}$ . Here, the symbol $\varepsilon$ denotes the empty word.

$S_{i+1}=C^{-1}S_{i}\cup S_{i}^{-1}C$ , for all $i\geq 1$ .

The algorithm computes the sets $S_{i}$ in increasing order of $i$ . As soon as one of the $S_{i}$ contains a word from C or the empty word, then the algorithm terminates and answers that the given code is not uniquely decodable. Otherwise, once a set $S_{i}$ equals a previously encountered set $S_{j}$ with $j<i$ , then the algorithm would enter in principle an endless loop. Instead of continuing endlessly, it answers that the given code is uniquely decodable.

Termination and correctness of the algorithm

Since all sets $S_{i}$ are sets of suffixes of a finite set of codewords, there are only finitely many different candidates for $S_{i}$ . Since visiting one of the sets for the second time will cause the algorithm to stop, the algorithm cannot continue endlessly and thus must always terminate. A proof that the algorithm is correct, i.e. that it always gives the correct answer, is found in the textbooks by Salomaa^[3] and by Berstel et al.^[4]

Notes

^ Knuth (2003), p. 2
^ Berstel et al. (2009), Example 2.3.1 p. 63
^ Salomaa (1981)
^ Berstel et al. (2009), Chapter 2.3

References

Arto Salomaa: Jewels of Formal Language Theory. Pitman Publishing Ltd., 1981.
Donald E. Knuth: Robert W Floyd, In Memoriam. SIGACT News 34(4):3–13, December 2003.
Jean Berstel, Dominique Perrin and Christophe Reutenauer: Codes and Automata. Cambridge University Press, to appear (estimated Nov. 2009). Draft available online

Further reading

August Albert Sardinas and George W. Patterson: A necessary and sufﬁcient condition for the unique decomposition of coded messages. Convention Record of the I.R.E., 1953 National Convention, Part 8: Information Theory, pp. 104–108, 1953.
Robert G. Gallager: Information Theory and Reliable Communication. Wiley, 1968

[1] Knuth (2003), p. 2

[2] Berstel et al. (2009), Example 2.3.1 p. 63

[3] Salomaa (1981)

[4] Berstel et al. (2009), Chapter 2.3

[1]

[2]

[3]

[4]

Idea of the algorithm

Precise description of the algorithm

Termination and correctness of the algorithm

See also

Notes

References