This is the user sandbox of Gezzer898. A user sandbox is a subpage of the user's user page. It serves as a testing spot and page development space for the user and is not an encyclopedia article.

Cryptanalysis

The Baum-Welch algorithm is often used to estimate the parameters of hidden Markov models (HMMs) when deciphering hidden or noisy information, and it is consequently often used in cryptanalysis. In data security, an observer would like to extract information from a data stream without knowing all the parameters of the transmission. This can involve reverse engineering a channel encoder.^[1] HMMs, and as a consequence the Baum-Welch algorithm, have also been used to identify spoken phrases in encrypted VoIP calls.^[2] In addition, HMM cryptanalysis is an important tool for automated investigations of cache-timing data. It allows for the automatic discovery of critical algorithm state, for example key values.^[3]

Description

A hidden Markov model describes the joint probability of a collection of 'hidden' and observed discrete random variables. It relies on two assumptions: the $i^{th}$ hidden variable, given the $(i-1)^{th}$ hidden variable, is independent of all earlier hidden variables, and each observation variable depends only on the current hidden state.
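
As a sketch of what these two assumptions imply, the joint probability of a sequence factorizes into one initial term, one transition term per step, and one emission term per observation. The sequence length $T$ and the names $X_{t}$ (hidden variable) and $Y_{t}$ (observed variable) at step $t$ are notation assumed here for illustration:

$P(X_{1},...,X_{T},Y_{1},...,Y_{T})=P(X_{1})P(Y_{1}|X_{1})\prod _{t=2}^{T}P(X_{t}|X_{t-1})P(Y_{t}|X_{t})$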

The Baum-Welch algorithm uses the well-known expectation-maximization (EM) algorithm to find the maximum likelihood estimate of the parameters of a hidden Markov model given a set of observed feature vectors.

Let $X_{t}$ be a discrete hidden random variable with $N$ possible values. We assume that $P(X_{t}|X_{t-1})$ is independent of time $t$ . We can present this information as a time-independent stochastic transition matrix $A=\{a_{ij}\}$ , where $a_{ij}=P(X_{t}=j|X_{t-1}=i)$ .

The initial state distribution (i.e. when $t=1$ ) is given by

$\pi _{i}=P(X_{1}=i)$

The observation variables $Y_{t}$ can take one of $K$ possible values. The probability of a certain observation vector at time $t$ for state $j$ is given by:

$b_{j}(y_{t})=P(Y_{t}=y_{t}|X_{t}=j)$

$B=\{b_{ij}\}$ is a $K$ by $N$ matrix.

An observation sequence of length $T$ is given by $Y=(Y_{1}=y_{1},Y_{2}=y_{2},...,Y_{T}=y_{T})$ .

Thus we can describe a hidden Markov chain by $\theta =(A,B,\pi )$ . The Baum-Welch algorithm finds a local maximum for $\theta ^{*}=\operatorname {arg\,max} _{\theta }P(Y|\theta )$ (i.e. the HMM parameters $\theta$ that maximise the probability of the observation).
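
The following is a minimal Python sketch of how $\theta =(A,B,\pi )$ might be stored and how $P(Y|\theta )$ can be evaluated by brute force for a very short sequence. The numeric values are illustrative guesses (the same ones used in the chicken example of this draft), the function names are hypothetical, and the emission matrix is assumed to be stored with one row per state, i.e. `B[i][k]`, which is the transpose of the $K$ by $N$ layout mentioned above.

```python
from itertools import product

# Illustrative parameters (assumed values, two states and two symbols).
A = [[0.5, 0.5],
     [0.3, 0.7]]   # A[i][j] = P(X_t = j | X_{t-1} = i)
B = [[0.3, 0.7],
     [0.8, 0.2]]   # B[i][k] = P(Y_t = k | X_t = i); 0 = no eggs, 1 = eggs
pi = [0.2, 0.8]    # pi[i] = P(X_1 = i)


def is_stochastic(rows, tol=1e-9):
    """Return True if every row is a valid probability distribution."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in rows
    )


def likelihood_brute_force(A, B, pi, ys):
    """P(Y | theta), summing the joint probability over every hidden path.

    The cost grows as N**T, so this is only practical for tiny sequences.
    """
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(ys)):
        p = pi[path[0]] * B[path[0]][ys[0]]
        for t in range(1, len(ys)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][ys[t]]
        total += p
    return total


print(is_stochastic(A), is_stochastic(B), is_stochastic([pi]))
print(likelihood_brute_force(A, B, pi, [0, 0]))
```

The exponential cost of this enumeration is what the forward and backward recursions in the algorithm avoid.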

Algorithm

Set $\theta =(A,B,\pi )$ with random initial conditions. The initial values can also be set using prior information about the parameters if it is available.
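
One possible way to draw random initial conditions is sketched below in Python. The function names, the `seed` parameter, and the small positive offset that keeps every entry strictly above zero are assumptions made for illustration, not part of the algorithm's definition.

```python
import random


def random_distribution(size, rng):
    """Draw a random probability vector of the given size."""
    # The offset avoids exact zeros, which re-estimation could never leave.
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def random_hmm(n_states, n_symbols, seed=None):
    """Return (A, B, pi) with every row normalised to sum to 1.

    A is n_states x n_states, B is n_states x n_symbols (one row per state),
    and pi has n_states entries.
    """
    rng = random.Random(seed)
    A = [random_distribution(n_states, rng) for _ in range(n_states)]
    B = [random_distribution(n_symbols, rng) for _ in range(n_states)]
    pi = random_distribution(n_states, rng)
    return A, B, pi


A, B, pi = random_hmm(n_states=2, n_symbols=2, seed=0)
print(A, B, pi)
```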

Forward Procedure

Let $\alpha _{i}(t)=P(Y_{1}=y_{1},...,Y_{t}=y_{t},X_{t}=i|\theta )$ , the probability of seeing the observations $y_{1},y_{2},...,y_{t}$ and being in state $i$ at time $t$ . This is found recursively:

$\alpha _{i}(1)=\pi _{i}b_{i}(y_{1})$
$\alpha _{j}(t+1)=b_{j}(y_{t+1})\sum _{i=1}^{N}\alpha _{i}(t)a_{ij}$
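
The forward recursion can be sketched in Python as follows. This is an unscaled version, so it may underflow on long sequences. The function name, the 0-based time index, the row-per-state storage `B[i][k]`, and the illustrative parameter values are assumptions.

```python
def forward(A, B, pi, ys):
    """Return alpha with alpha[t][i] = P(y_1..y_{t+1}, X_{t+1} = i | theta).

    The list index t is 0-based, so alpha[0] holds the base case
    pi_i * b_i(y_1) from the text.
    """
    n = len(pi)
    alpha = [[pi[i] * B[i][ys[0]] for i in range(n)]]
    for t in range(1, len(ys)):
        prev = alpha[-1]
        alpha.append([
            B[j][ys[t]] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


# Illustrative parameters (assumed values); symbol 0 = no eggs, 1 = eggs.
A = [[0.5, 0.5], [0.3, 0.7]]
B = [[0.3, 0.7], [0.8, 0.2]]
pi = [0.2, 0.8]

alpha = forward(A, B, pi, [0, 0])
print(alpha)
print(sum(alpha[-1]))  # summing the last row over states gives P(Y | theta)
```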

Backward Procedure

Let $\beta _{i}(t)=P(Y_{t+1}=y_{t+1},...,Y_{T}=y_{T}|X_{t}=i,\theta )$ , that is, the probability of the ending partial sequence $y_{t+1},...,y_{T}$ given state $i$ at time $t$ . We calculate $\beta _{i}(t)$ as:

$\beta _{i}(T)=1$
$\beta _{i}(t)=\sum _{j=1}^{N}\beta _{j}(t+1)a_{ij}b_{j}(y_{t+1})$
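
A matching Python sketch of the backward recursion is given below, again unscaled and with a 0-based time index. The function name, the row-per-state storage `B[i][k]`, and the illustrative parameter values are assumptions.

```python
def backward(A, B, ys):
    """Return beta with beta[t][i] = P(y_{t+2}..y_T | X_{t+1} = i, theta).

    The list index t is 0-based, so the last row beta[T-1] is the
    base case of all ones.
    """
    n = len(A)
    T = len(ys)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                beta[t + 1][j] * A[i][j] * B[j][ys[t + 1]]
                for j in range(n)
            )
    return beta


# Illustrative parameters (assumed values); symbol 0 = no eggs, 1 = eggs.
A = [[0.5, 0.5], [0.3, 0.7]]
B = [[0.3, 0.7], [0.8, 0.2]]

print(backward(A, B, [0, 0]))
```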

Update

We can now calculate the temporary variables:

$\gamma _{i}(t)=P(X_{t}=i|Y,\theta )={\frac {\alpha _{i}(t)\beta _{i}(t)}{\sum _{j=1}^{N}\alpha _{j}(t)\beta _{j}(t)}}$

which is the probability of being in state $i$ at time $t$ given the observed sequence $Y$ and the parameters $\theta$ .

$\xi _{ij}(t)=P(X_{t}=i,X_{t+1}=j|Y,\theta )={\frac {\alpha _{i}(t)a_{ij}\beta _{j}(t+1)b_{j}(y_{t+1})}{\sum _{k=1}^{N}\sum _{w=1}^{N}\alpha _{k}(t)a_{kw}\beta _{w}(t+1)b_{w}(y_{t+1})}}$

which is the probability of being in states $i$ and $j$ at times $t$ and $t+1$ respectively given the observed sequence $Y$ and parameters $\theta$ .
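
The two temporary variables can be sketched in Python as shown below. The block repeats compact forward and backward helpers so that it runs on its own. All function names, the 0-based time index, and the illustrative parameter values are assumptions.

```python
def forward(A, B, pi, ys):
    n = len(pi)
    alpha = [[pi[i] * B[i][ys[0]] for i in range(n)]]
    for y in ys[1:]:
        prev = alpha[-1]
        alpha.append([B[j][y] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(A, B, ys):
    n = len(A)
    beta = [[1.0] * n for _ in ys]
    for t in range(len(ys) - 2, -1, -1):
        beta[t] = [sum(beta[t + 1][j] * A[i][j] * B[j][ys[t + 1]]
                       for j in range(n)) for i in range(n)]
    return beta


def posteriors(A, B, pi, ys):
    """Return (gamma, xi) for one observation sequence.

    gamma[t][i] is the posterior of state i at step t, and xi[t][i][j]
    is the posterior of the transition i -> j between steps t and t + 1.
    """
    n, T = len(pi), len(ys)
    alpha, beta = forward(A, B, pi, ys), backward(A, B, ys)
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][j] * beta[t][j] for j in range(n))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(n)])
    xi = []
    for t in range(T - 1):
        joint = [[alpha[t][i] * A[i][j] * beta[t + 1][j] * B[j][ys[t + 1]]
                  for j in range(n)] for i in range(n)]
        norm = sum(sum(row) for row in joint)
        xi.append([[v / norm for v in row] for row in joint])
    return gamma, xi


# Illustrative parameters (assumed values); symbol 0 = no eggs, 1 = eggs.
A = [[0.5, 0.5], [0.3, 0.7]]
B = [[0.3, 0.7], [0.8, 0.2]]
pi = [0.2, 0.8]

gamma, xi = posteriors(A, B, pi, [0, 1, 0])
print(gamma)
print(xi)
```

A useful sanity check is that each `gamma[t]` sums to 1 and that summing `xi[t][i][j]` over `j` recovers `gamma[t][i]`.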

$\theta$ can now be updated:

$\pi _{i}^{*}=\gamma _{i}(1)$

which is the expected frequency spent in state $i$ at time $1$ .

$a_{ij}^{*}={\frac {\sum _{t=1}^{T-1}\xi _{ij}(t)}{\sum _{t=1}^{T-1}\gamma _{i}(t)}}$

which is the expected number of transitions from state $i$ to state $j$ compared to the expected total number of transitions away from state $i$ .

$b_{i}^{*}(k)={\frac {\sum _{t=1}^{T}1_{y_{t}=v_{k}}\gamma _{i}(t)}{\sum _{t=1}^{T}\gamma _{i}(t)}}$

where $1_{y_{t}=v_{k}}={\begin{cases}1,&{\text{if }}y_{t}=v_{k}\\0,&{\text{otherwise}}\end{cases}}$ is an indicator function, $v_{k}$ is the $k^{th}$ possible observation value, and $b_{i}^{*}(k)$ is the expected number of times the output observations have been equal to $v_{k}$ while in state $i$ over the expected total number of times in state $i$ .
These steps are now repeated iteratively until a desired level of convergence is reached.
Note: It is possible to over-fit a particular data set, that is, $P(Y|\theta _{final})>P(Y|\theta _{true})$ . The algorithm also does not guarantee a global maximum.
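
Putting the steps together, one re-estimation pass and the surrounding iteration might look like the Python sketch below. It handles a single observation sequence, uses unscaled probabilities, and assumes no state ever has zero posterior mass. The function names, the stopping rule (`tol`, `max_iter`), the starting values, and the observation sequence are all hypothetical choices made for illustration.

```python
def forward(A, B, pi, ys):
    n = len(pi)
    alpha = [[pi[i] * B[i][ys[0]] for i in range(n)]]
    for y in ys[1:]:
        prev = alpha[-1]
        alpha.append([B[j][y] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(A, B, ys):
    n = len(A)
    beta = [[1.0] * n for _ in ys]
    for t in range(len(ys) - 2, -1, -1):
        beta[t] = [sum(beta[t + 1][j] * A[i][j] * B[j][ys[t + 1]]
                       for j in range(n)) for i in range(n)]
    return beta


def baum_welch_step(A, B, pi, ys):
    """One update; returns (A*, B*, pi*, P(Y | theta)) for the OLD theta."""
    n, k, T = len(pi), len(B[0]), len(ys)
    alpha, beta = forward(A, B, pi, ys), backward(A, B, ys)
    # sum_j alpha_j(t) beta_j(t) equals P(Y | theta) for every t.
    lik = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * beta[t + 1][j] * B[j][ys[t + 1]] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    pi_new = list(gamma[0])
    A_new = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    B_new = [[sum(gamma[t][i] for t in range(T) if ys[t] == v)
              / sum(gamma[t][i] for t in range(T))
              for v in range(k)] for i in range(n)]
    return A_new, B_new, pi_new, lik


def baum_welch(A, B, pi, ys, tol=1e-9, max_iter=100):
    """Repeat the update until the likelihood gain drops below tol."""
    history = []
    for _ in range(max_iter):
        A, B, pi, lik = baum_welch_step(A, B, pi, ys)
        converged = bool(history) and lik - history[-1] < tol
        history.append(lik)
        if converged:
            break
    return A, B, pi, history


# Hypothetical starting guess and observations (0 = no eggs, 1 = eggs).
A0 = [[0.5, 0.5], [0.3, 0.7]]
B0 = [[0.3, 0.7], [0.8, 0.2]]
pi0 = [0.2, 0.8]
ys = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]

A1, B1, pi1, history = baum_welch(A0, B0, pi0, ys, max_iter=20)
print(history)  # the likelihood should not decrease between iterations
```

For long sequences a scaled or log-space variant of the recursions is normally needed to avoid numerical underflow; that refinement is omitted here.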

Example

Suppose we have a chicken from which we collect eggs at noon every day. Whether or not the chicken has laid eggs for collection depends on some unknown factors that are hidden. We can, however, assume for simplicity that only two states determine whether the chicken lays eggs. We do not know the state at the initial starting point, the transition probabilities between the two states, or the probability that the chicken lays an egg given a particular state. To start, we guess the transition and emission matrices.

Transition
	State 1	State 2
State 1	0.5	0.5
State 2	0.3	0.7

Emission
	No Eggs	Eggs
State 1	0.3	0.7
State 2	0.8	0.2

Initial
State 1	0.2
State 2	0.8

We then take a set of observations (E = eggs, N = no eggs): NN, NN, NN, NN, NE, EE, EN, NN, NN
The next step is to estimate a new transition matrix.

Observed sequence	Probability of the sequence with hidden states S1 then S2	Highest probability of observing that sequence
NN	0.024	0.3584 S2,S2
NN	0.024	0.3584 S2,S2
NN	0.024	0.3584 S2,S2
NN	0.024	0.3584 S2,S2
NE	0.006	0.1344 S2,S1
EE	0.014	0.049 S1,S1
EN	0.056	0.0896 S2,S2
NN	0.024	0.3584 S2,S2
NN	0.024	0.3584 S2,S2
Total	0.22	2.4234

For example, the probability of observing NN with hidden states S1 then S2 is $0.2\times 0.3\times 0.5\times 0.8=0.024$ (initial probability of S1, emission of N from S1, transition from S1 to S2, emission of N from S2). Thus the new estimate for the S1 to S2 transition is now ${\frac {0.22}{2.4234}}=0.0908$ . We then calculate the S2 to S1, S2 to S2 and S1 to S1 transition probabilities and normalize so they add to 1. This gives us the updated transition matrix.
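
The entries of the table above can be recomputed with a short Python sketch. It enumerates the four possible state pairs for each two-day observation, using the guessed transition, emission and initial values from the example. The function names and the 0/1 encoding of states and symbols are assumptions.

```python
from itertools import product

# Guessed parameters from the example (index 0 = State 1, index 1 = State 2).
A = [[0.5, 0.5], [0.3, 0.7]]
B = [[0.3, 0.7], [0.8, 0.2]]   # columns: No Eggs, Eggs
pi = [0.2, 0.8]
SYMBOL = {"N": 0, "E": 1}


def joint_probability(states, obs):
    """Probability of a two-day observation together with a state pair."""
    s1, s2 = states
    y1, y2 = SYMBOL[obs[0]], SYMBOL[obs[1]]
    return pi[s1] * B[s1][y1] * A[s1][s2] * B[s2][y2]


def best_path(obs):
    """Return (probability, state pair) of the most probable state pair."""
    return max(
        (joint_probability(path, obs), path)
        for path in product(range(2), repeat=2)
    )


observations = ["NN"] * 4 + ["NE", "EE", "EN"] + ["NN"] * 2

total_s1_s2 = sum(joint_probability((0, 1), o) for o in observations)
total_best = sum(best_path(o)[0] for o in observations)

for obs in ["NN", "NE", "EE", "EN"]:
    print(obs, joint_probability((0, 1), obs), best_path(obs))
print(total_s1_s2, total_best, total_s1_s2 / total_best)
```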
Next, we want to estimate a new emission matrix:

Observed sequence	Highest probability of observing that sequence if E is assumed to come from S1	Highest probability of observing that sequence
NE	0.1344 S2,S1	0.1344 S2,S1
EE	0.049 S1,S1	0.049 S1,S1
EN	0.056 S1,S2	0.0896 S2,S2

This allows us to calculate the emission matrix as described above in the algorithm, by adding up the probabilities for the respective observed sequences. We then repeat the calculation for the case where N came from S1 and for the cases where N and E came from S2, and normalize.
To estimate the initial probabilities, we assume all sequences start with the hidden state S1 and calculate the highest probability, and then repeat for S2. Again, we normalize to give an updated initial vector.
Finally, we repeat these steps until the resulting probabilities converge satisfactorily.

[1] Dingel, Janis (24). "Parameter Estimation of a Convolutional Encoder from Noisy Observations". IEEE International Symposium on Information Theory.
[2] Wright, Charles; Ballard, Lucas; Coull, Scott; Monrose, Fabian; Masson, Gerald (2008). "Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations". IEEE International Symposium on Security and Privacy.
[3] Brumley, Billy Bob; Hakala, Risto M. (2009). "Cache-Timing Template Attacks". Advances in Cryptography. Lecture Notes in Computer Science. 5912: 667-684. doi:10.1007/978-3-642-10366-7_39. ISBN 978-3-642-10365-0. Retrieved 21 October 2013.
