In machine learning and statistics, the expectation-maximization (EM) algorithm is a method for handling latent variables, and a Gaussian mixture model (GMM) is a model in which the data are generated from a mixture of Gaussian distributions. In this article, we give an introduction to using the EM algorithm to fit a GMM.
Background
First, let's warm up with a simple scenario. In the picture below, we have the red blood cell hemoglobin concentration and red blood cell volume data of two groups of people: the Anemia group and the Control group (i.e. the group of people without anemia). It is clear that people with anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without anemia.
GMM with labels
To make it simple, let $x$ be a random vector:

$$x = \begin{bmatrix} \text{red blood cell hemoglobin concentration} \\ \text{red blood cell volume} \end{bmatrix}$$

And denote $z$ as the group where $x$ belongs ($z = 0$ when $x$ belongs to the Anemia group and $z = 1$ when $x$ belongs to the Control group). From medical knowledge, we believe that $x$ is normally distributed in each group, i.e. $x \mid z = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. Also, $z \sim \operatorname{Categorical}(\phi)$, where $\phi_j = P(z = j)$ and $\sum_j \phi_j = 1$ (in this scenario, there are $k = 2$ groups).

Now we'd like to estimate the parameters $\phi$, $\mu_j$, and $\Sigma_j$ for $j = 0, 1$.
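To make the setup concrete, here is a minimal sketch of sampling from this labeled two-group model. It is written in Python with NumPy; the parameter values, the sample size `m`, and the variable names are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, chosen only for illustration; in practice
# they are unknown and must be estimated from the medical data.
phi = np.array([0.4, 0.6])                    # P(z = 0), P(z = 1)
mu = np.array([[28.0, 75.0],                  # Anemia group mean
               [34.0, 90.0]])                 # Control group mean
sigma = np.array([[[4.0, 1.0], [1.0, 9.0]],   # Anemia group covariance
                  [[3.0, 0.5], [0.5, 6.0]]])  # Control group covariance

m = 500                                                 # sample size
z = rng.choice(2, size=m, p=phi)                        # group labels
x = np.stack([rng.multivariate_normal(mu[j], sigma[j])  # one draw per label
              for j in z])
```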
We can use maximum likelihood estimation for this problem. The log-likelihood function is

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p\left(x^{(i)}; \phi, \mu, \Sigma\right) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)} \mid z^{(i)}; \mu, \Sigma\right) p\left(z^{(i)}; \phi\right).$$
As we know the $z^{(i)}$ for each $x^{(i)}$, the log-likelihood function can be simplified as below:

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \left[ \log p\left(x^{(i)} \mid z^{(i)}; \mu, \Sigma\right) + \log p\left(z^{(i)}; \phi\right) \right].$$
Now we can maximize the likelihood function by setting the partial derivatives with respect to $\phi$, $\mu$, and $\Sigma$ to zero. Since this step involves only routine algebra, I'll directly show the result:

$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} 1\{z^{(i)} = j\},$$

$$\mu_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}},$$

$$\Sigma_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\} \left(x^{(i)} - \mu_j\right)\left(x^{(i)} - \mu_j\right)^{\mathsf{T}}}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}}.$$
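These closed-form estimates are easy to compute. Continuing the NumPy sketch above (the helper name `fit_labeled_gmm` is ours, not a library function):

```python
import numpy as np

def fit_labeled_gmm(x, z, k=2):
    """Closed-form MLE for a GMM when the labels z are observed."""
    phi = np.array([(z == j).mean() for j in range(k)])
    mu = np.stack([x[z == j].mean(axis=0) for j in range(k)])
    # bias=True gives the maximum-likelihood (1/n) covariance estimate
    sigma = np.stack([np.cov(x[z == j], rowvar=False, bias=True)
                      for j in range(k)])
    return phi, mu, sigma

phi_hat, mu_hat, sigma_hat = fit_labeled_gmm(x, z)
```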
In the example above, we can see that if $z^{(i)}$ is known to us, the estimation of the parameters is quite simple with maximum likelihood estimation. But if $z^{(i)}$ is unknown, it becomes much harder to estimate the parameters.[2]
GMM without labels
In this case, we call $z$ a latent variable (i.e. not observed). In the unlabelled scenario, we need the expectation-maximization algorithm to estimate $z$ as well as the other parameters.
Generally, the problem setting above is called a Gaussian mixture model, since the data in each group are normally distributed.[3]
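For reference, marginalizing out the latent $z$ yields the familiar Gaussian mixture density, which is where the name comes from:

$$p\left(x; \phi, \mu, \Sigma\right) = \sum_{j} P(z = j)\, p\left(x \mid z = j\right) = \sum_{j} \phi_j\, \mathcal{N}\left(x; \mu_j, \Sigma_j\right).$$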
In a general machine learning setting, we can see the latent variable $z$ as some latent pattern lying underneath the data, which we cannot observe directly. We can see $x$ as our data and $\theta$ as the parameters of the model. With the EM algorithm, we may find some underlying pattern $z$ in the data $x$, along with estimates of the parameters. The wide applicability of this setting in machine learning makes the EM algorithm very important.[4]
EM Algorithm in GMM
The EM algorithm consists of two steps: the E-step and the M-step. Firstly, the model parameters and the $z^{(i)}$ can be randomly initialized. In the E-step, the algorithm tries to guess the value of each $z^{(i)}$ based on the current parameters. In the M-step, the algorithm updates the values of the model parameters based on the guesses of $z^{(i)}$ from the E-step. These two steps are repeated until convergence. Let's see the algorithm in GMM first.
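Before turning to the formulas, here is a rough sketch of what the two steps can look like in code for a GMM with unobserved labels. It is a generic NumPy/SciPy implementation under the model assumptions above; the function name `em_gmm`, the initialization scheme, and the fixed iteration count are our own choices for illustration, not part of the algorithm's definition:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(x, k=2, n_iter=100, seed=0):
    """A minimal EM loop for a GMM with unobserved labels."""
    rng = np.random.default_rng(seed)
    m, _ = x.shape
    # Initialize the parameters: uniform weights, k random data points
    # as means, and the overall sample covariance for every component.
    phi = np.full(k, 1.0 / k)
    mu = x[rng.choice(m, size=k, replace=False)]
    sigma = np.stack([np.cov(x, rowvar=False)] * k)
    for _ in range(n_iter):
        # E-step: w[i, j] is the posterior P(z_i = j | x_i) under the
        # current parameters (the "guess" of the latent labels).
        w = np.stack([phi[j] * multivariate_normal.pdf(x, mu[j], sigma[j])
                      for j in range(k)], axis=1)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the soft assignments.
        nj = w.sum(axis=0)                     # effective group sizes
        phi = nj / m
        mu = (w.T @ x) / nj[:, None]
        sigma = np.stack([((x - mu[j]).T * w[:, j]) @ (x - mu[j]) / nj[j]
                          for j in range(k)])
    return phi, mu, sigma
```

Running `em_gmm(x)` on the sample drawn earlier should recover parameters close to the hypothetical ones, up to a permutation of the two components, since the group identities are not identifiable from the unlabelled data alone.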