BCJR algorithm
The BCJR algorithm is an algorithm for maximum a posteriori decoding of error-correcting codes defined on trellises (principally convolutional codes). The algorithm is named after its inventors: Bahl, Cocke, Jelinek and Raviv [1]. This algorithm is critical to modern iteratively decoded error-correcting codes, including turbo codes and low-density parity-check codes.
References
- ^ L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate", IEEE Transactions on Information Theory, vol. IT-20, no. 2, pp. 284–287, March 1974.
The detector that minimizes the probability of error for each symbol decision is the maximum a posteriori probability (MAP) detector. For example, to make a decision about the zeroth symbol a0, this detector calculates the a posteriori probability Pr[a0 | r] for each possible a0 ∈ A, and decides on the a0 that maximizes Pr[a0 | r]. This process is then repeated for all L transmitted symbols. In the context of the trellis diagram, these probabilities are easily computed once the a posteriori state transition probabilities Pr[Ψk = p; Ψk+1 = q | r] are known for each state transition (or branch) in the trellis. The BCJR algorithm provides a computationally efficient method for finding these state transition probabilities.
The key to the BCJR algorithm is to decompose the a posteriori transition probability for a transition at time k into three separable factors: the first depending only on the "past" observations rl<k = {rl : l < k}, the second depending only on the "present" observation rk, and the third depending only on the "future" observations rl>k = {rl : l > k}. We can accomplish this through the following series of straightforward equalities:
Pr[Ψk = p; Ψk+1 = q | r] = p(Ψk = p; Ψk+1 = q; r) ⁄ p(r)
= p(Ψk = p; Ψk+1 = q; rl<k; rk; rl>k) ⁄ p(r)
= p(rl>k | Ψk = p; Ψk+1 = q; rl<k; rk) · p(Ψk = p; Ψk+1 = q; rl<k; rk) ⁄ p(r). (5)
Because of the Markov property of the finite-state machine model for the channel, knowledge of the state at time k + 1 supersedes knowledge of the state at time k, and it also supersedes knowledge of rk and rl<k, so that (5) reduces to:
Pr[Ψk = p; Ψk+1 = q | r] = p(rl>k | Ψk+1 = q) · p(Ψk = p; Ψk+1 = q; rl<k; rk) ⁄ p(r)
= p(rl>k | Ψk+1 = q) · p(Ψk+1 = q; rk | Ψk = p; rl<k) · p(Ψk = p; rl<k) ⁄ p(r). (6)
Again, exploiting the Markov property, this simplifies (with a reordering of terms) to:
Pr[Ψk = p; Ψk+1 = q | r] = αk(p) · γk(p, q) · βk+1(q) ⁄ p(r), (7)
where αk(p) = p(Ψk = p; rl<k), γk(p, q) = p(Ψk+1 = q; rk | Ψk = p), and βk+1(q) = p(rl>k | Ψk+1 = q).
Observe that αk(p) is a probability measure for state p at time k that depends only on the past observations rl<k. On the other hand, βk+1(q) is a probability measure for state q at time k + 1 that depends only on the future observations rl>k. Finally, γk(p, q) is a probability measure connecting state p at time k to state q at time k + 1, and it depends only on the present (kth) observation rk.
The ultimate goal is to calculate the a posteriori probabilities for the symbols ak, not for the transitions. Fortunately, it is easy to translate from one to the other. Specifically, let Sa denote the set of integer pairs (p, q) for which a state transition from p to q corresponds to an input symbol of a. Then the a posteriori probability for the kth symbol ak is related to the a posteriori transition probabilities of (7) by:
Pr[ak = a | r] = Σ(p, q)∈Sa Pr[Ψk = p; Ψk+1 = q | r] = (1 ⁄ p(r)) Σ(p, q)∈Sa αk(p) γk(p, q) βk+1(q). (8)
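As a concrete illustration of (8), here is a minimal Python sketch, assuming the transition posteriors of (7) have already been computed (the recursions for doing so are derived in Chapter 3); the names `trans_post` and `S` are ours: `trans_post[p][q]` holds the a posteriori transition probability at time k, and `S` maps each symbol a to its set Sa of (p, q) state pairs.

```python
def symbol_posteriors(trans_post, S):
    """Sum the transition posteriors over S_a for each symbol a, per (8)."""
    return {a: sum(trans_post[p][q] for (p, q) in pairs)
            for a, pairs in S.items()}

# Toy example: a 2-state trellis with binary input, where transitions into
# state 0 are assumed to correspond to a = 0 and transitions into state 1 to a = 1.
trans_post = [[0.1, 0.3],
              [0.2, 0.4]]
S = {0: [(0, 0), (1, 0)], 1: [(0, 1), (1, 1)]}
print(symbol_posteriors(trans_post, S))  # {0: 0.3, 1: 0.7} (up to rounding)
```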
CHAPTER 3
DERIVATIONS AND CALCULATIONS
3.1. CALCULATION OF BRANCH METRIC
The BCJR algorithm described below can be viewed as a generalization of the Viterbi algorithm for ML sequence detection. Rather than making one pass through the trellis from start to finish, as is done in the Viterbi algorithm, the BCJR algorithm makes two passes: one forward pass from start to finish, and then a second backward pass from finish to start. Furthermore, whereas the Viterbi algorithm uses the branch metric of (3), the BCJR algorithm uses the branch metric γk(p, q), where from (7):
γk(p, q) = p(Ψk+1 = q; rk | Ψk = p) (9)
= p(rk | Ψk = p; Ψk+1 = q) · Pr[Ψk+1 = q | Ψk = p]. (10)
We now calculate this branch metric for the special case of the AWGN channel
rk = sk + nk,
where the real and imaginary components of nk are independent and identically distributed zero-mean Gaussian random variables with variance σ². Let a(p, q) and s(p, q) denote the unique input symbol and expected channel output, respectively, associated with a transition from state p to state q. Then the first term in (10) reduces to:
p(rk | Ψk = p; Ψk+1 = q) = exp(−|rk − s(p, q)|² ⁄ (2σ²)) ⁄ (2πσ²). (11)
On the other hand, the second term in (10) reduces to:
Pr[Ψk+1 = q | Ψk = p] = Pr[ak = a(p, q); Ψk = p] ⁄ Pr[Ψk = p]
= Pr[Ψk = p | ak = a(p, q)] · Pr[ak = a(p, q)] ⁄ Pr[Ψk = p]
= Pr[Ψk = p] · Pr[ak = a(p, q)] ⁄ Pr[Ψk = p]
= Pr[ak = a(p, q)], (12)
where the third equality holds because the state Ψk is determined by past symbols only and is therefore independent of ak.
Observe that this is simply the a priori probability for the kth symbol. Plugging (11) and (12) into (10) yields:
γk(p, q) = exp(−|rk − s(p, q)|² ⁄ (2σ²)) · Pr[ak = a(p, q)] ⁄ (2πσ²). (13)
Comparing this to (3), we see that the only difference between the BCJR branch metric γk(p, q) and the Viterbi metric of (3) is the extra factor of Pr[ak = a(p, q)] in (13). There are many applications in which all symbols are equally likely, in which case the a priori probability Pr[ak = a] is a constant, independent of a. In such applications, the BCJR metric is equivalent to the Viterbi metric. However, there are important applications in which the symbols are not all equally likely, and exploiting this knowledge can be beneficial.
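As a hedged illustration of (13), the following Python sketch evaluates the branch metric for a single branch of the AWGN trellis. The function name `branch_metric` and its argument names are ours; the received sample is taken to be a Python complex number so that abs() gives the Euclidean norm.

```python
import math

def branch_metric(r_k, s_pq, sigma2, prior_a):
    """BCJR branch metric of (13) for the AWGN channel:
    gamma_k(p, q) = exp(-|r_k - s(p, q)|^2 / (2 sigma^2)) Pr[a_k = a(p, q)] / (2 pi sigma^2).

    r_k     : received sample (complex)
    s_pq    : expected channel output s(p, q) for this branch (complex)
    sigma2  : noise variance per real/imaginary component
    prior_a : a priori probability Pr[a_k = a(p, q)]
    """
    d2 = abs(r_k - s_pq) ** 2  # squared Euclidean distance |r_k - s(p, q)|^2
    return math.exp(-d2 / (2.0 * sigma2)) * prior_a / (2.0 * math.pi * sigma2)
```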
3.2. VECTOR NOTATION
We now introduce vector notation for later use. Define the 1×Q row vector αk by the collection of αk(p) values at time k, one for each of the Q states, p = 0, …, Q − 1:
αk = [αk(0), αk(1), …, αk(Q − 1)]. (14)
Similarly, define the Q×1 column vector βk by the collection of βk(p) values at time k, one for each of the Q states, p = 0, …, Q − 1:
βk = [βk(0), βk(1), …, βk(Q − 1)]ᵀ. (15)
Finally, define the Q×Q matrix Γk for the kth stage of the trellis according to
[Γk]i,j = γk(i, j).
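Reusing the hypothetical `branch_metric` from the sketch above, the matrix Γk for one trellis stage can be assembled as follows; `next_state(p, a)` and `expected_output(p, a)` are hypothetical functions describing the finite-state machine, and entries for state pairs with no connecting branch are left at zero.

```python
def gamma_matrix(Q, symbols, r_k, sigma2, prior, next_state, expected_output):
    """Assemble the Q x Q matrix [Gamma_k]_{i,j} = gamma_k(i, j) for stage k."""
    G = [[0.0] * Q for _ in range(Q)]
    for p in range(Q):
        for a in symbols:
            q = next_state(p, a)  # state reached from p on input symbol a
            G[p][q] = branch_metric(r_k, expected_output(p, a), sigma2, prior[a])
    return G
```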
3.3. RECURSIVE TECHNIQUES FOR α AND β CALCULATION
We first derive a recursive technique for calculating αk. By definition,
αk+1(q) = p(Ψk+1 = q; rl<k+1)
= p(Ψk+1 = q; rk; rl<k)
= Σp p(Ψk+1 = q; rk; Ψk = p; rl<k)
= Σp p(Ψk+1 = q; rk | Ψk = p; rl<k) · p(Ψk = p; rl<k)
= Σp p(Ψk+1 = q; rk | Ψk = p) · p(Ψk = p; rl<k)
= Σp γk(p, q) αk(p). (16)
Observe the similarity between this recursion and the Viterbi recursion of (4): the BCJR recursion can be derived from the Viterbi recursion by simply replacing the maximum operator maxp{·} with the summation operator Σp{·}. This difference is not always significant, however, because the exponential form of the branch metric (13) implies that the maximum is often a good approximation to the sum. In terms of the vector notation, the recursion (16) reduces to:
αk+1 = αk Γk. (17)
We now derive a similar recursion for βk. By definition, we have:
βk(p) = p(rl>k−1 | Ψk = p)
= p(rl>k; rk | Ψk = p)
= Σq p(rl>k; rk; Ψk+1 = q | Ψk = p)
= Σq p(rl>k | rk; Ψk+1 = q; Ψk = p) · p(rk; Ψk+1 = q | Ψk = p)
= Σq p(rl>k | Ψk+1 = q) · p(rk; Ψk+1 = q | Ψk = p)
= Σq βk+1(q) γk(p, q). (18)
Equivalently, in vector notation:
βk = Γk βk+1. (19)
The BCJR algorithm can be summarized in vector form as follows, where μ denotes the memory of the trellis, so that the trellis has L + μ stages.
1. Initialize α0 = [1, 0, 0, …, 0].
2. Iterate the forward recursion αk+1 = αk Γk for stage k = 0 through stage k = L + μ − 1.
3. Initialize βL+μ = [1, 0, 0, …, 0]ᵀ.
4. Iterate the backward recursion βk = Γk βk+1 for stage k = L + μ − 1 through stage k = 0.
5. The a posteriori probability for ak is then:
Pr[ak = a | r] = Σ(p, q)∈Sa αk(p) γk(p, q) βk+1(q) ⁄ p(r). (20)
This quantity is calculated for each a ∈ A and for each input symbol a0 through aL−1; a Python sketch of the complete procedure is given below.
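Putting steps 1 through 5 together, here is a minimal end-to-end sketch of the vector-form recursions (17) and (19) and the posterior (20), in plain Python. It assumes the Γ matrices have already been built (e.g. with the hypothetical `gamma_matrix` above), that the trellis starts and terminates in state 0, and that the set Sa is the same at every stage; the normalization by p(r) is omitted, as discussed in Section 3.4.

```python
def bcjr(Gammas, S, Q, L):
    """Vector-form BCJR.

    Gammas : list of the L + mu branch-metric matrices Gamma_0 ... Gamma_{L+mu-1}
    S      : dict mapping each symbol a to its set S_a of (p, q) state pairs
    Q      : number of trellis states
    L      : number of input symbols
    Returns the unnormalized posteriors Pr[a_k = a | r] * p(r) for k = 0 ... L-1.
    """
    n = len(Gammas)  # n = L + mu trellis stages

    # Steps 1-2: forward recursion alpha_{k+1} = alpha_k Gamma_k, alpha_0 = [1, 0, ..., 0].
    alpha = [[1.0] + [0.0] * (Q - 1)]
    for G in Gammas:
        prev = alpha[-1]
        alpha.append([sum(prev[p] * G[p][q] for p in range(Q)) for q in range(Q)])

    # Steps 3-4: backward recursion beta_k = Gamma_k beta_{k+1}, beta_{L+mu} = [1, 0, ..., 0]^T.
    beta = [None] * (n + 1)
    beta[n] = [1.0] + [0.0] * (Q - 1)
    for k in range(n - 1, -1, -1):
        beta[k] = [sum(Gammas[k][p][q] * beta[k + 1][q] for q in range(Q))
                   for p in range(Q)]

    # Step 5: unnormalized symbol posteriors per (20).
    return [{a: sum(alpha[k][p] * Gammas[k][p][q] * beta[k + 1][q]
                    for (p, q) in pairs)
             for a, pairs in S.items()}
            for k in range(L)]
```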
3.4. CALCULATING p(r)
In practice, the quantity p(r) in the denominator of (20) can be ignored, because it is common to all of the a posteriori probabilities and hence will not affect the maximization procedure. Nevertheless, we briefly point out a simple way to calculate p(r). Specifically, the inner product αk βk is equal to p(r) for any time k. To show this, consider the a posteriori joint probability of (5):
Pr[Ψk = p; Ψk+1 = q | r] = αk(p) γk(p, q) βk+1(q) ⁄ p(r). (21)
This conditional probability must sum to one over all state transitions (p, q), and hence:
1 = Σp Σq αk(p) γk(p, q) βk+1(q) ⁄ p(r), (22)
or equivalently:
p(r) = Σp Σq αk(p) γk(p, q) βk+1(q)
= αk Γk βk+1
= αk βk. (23)
Observe that (23) is valid for any time k, including k = 0 and k = L + μ, in which case (23) reduces to:
p(r) = β0(0) = αL+μ(0). (24)
This can be used in (20). The result is the minimum-probability-of-symbol-error detector. Unlike the Viterbi algorithm, which produces the "most likely sequence", the BCJR algorithm produces the "sequence of most likely symbols".
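As a quick numerical check of (23), the inner product αk · βk can be evaluated at every stage; all of the values should agree up to floating-point rounding, and any one of them can serve as p(r). This sketch assumes the `alpha` and `beta` lists from the hypothetical `bcjr` function above are available.

```python
def p_of_r(alpha, beta):
    """p(r) = alpha_k . beta_k per (23); the value is the same for every k."""
    vals = [sum(a * b for a, b in zip(alpha[k], beta[k]))
            for k in range(len(alpha))]
    assert max(vals) - min(vals) <= 1e-9 * max(vals)  # sanity check across stages
    return vals[0]
```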
External links
- The on-line textbook: Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay, discusses the BCJR algorithm in chapter 25.