Maximum Likelihood and Expectation Maximization
Lecture Notes for CMPUT 466/551
Nilanjan Ray
MLE and EM
• Maximum Likelihood Estimation (MLE) and Expectation Maximization (EM) are two very important tools in machine learning.
• Essentially, you use them to estimate probability distributions in a learning algorithm; we have already seen one such example: in logistic regression we used MLE.
• We will revisit MLE here and identify certain difficulties with it.
• Then EM will come to the rescue.
Probability Density Estimation: Quick Points
Two different routes:
Parametric
• Provide a parametrized class of density functions
• Tools:
– Maximum likelihood estimation
– Expectation Maximization
– Sampling techniques
– …
Non-parametric
• Density is modeled by the samples themselves
• Tools:
– Kernel methods
– Sampling techniques
– …
Revisiting Maximum Likelihood
The data come from a probability distribution of known form.
The probability distribution has some parameters that are unknown to you.
Example: the data are distributed as Gaussian, $y_i \sim N(\mu, \sigma^2)$, so the unknown parameters here are $\theta = (\mu, \sigma^2)$.
MLE is a tool that estimates the unknown parameters of the probability distribution from data.
MLE: Recapitulation
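• Assume the observations $y_i$, $i = 1, \dots, N$, are independent
• Form the likelihood: $L(\theta; Z) = \prod_{i=1}^{N} p(y_i; \theta)$, where $Z = \{y_1, \dots, y_N\}$ is the observed data
• Form the log-likelihood: $l(\theta; Z) = \log L(\theta; Z) = \sum_{i=1}^{N} \log p(y_i; \theta)$
• To find the unknown parameter values, maximize the log-likelihood with respect to the unknown parameters: $\hat{\theta} = \arg\max_{\theta} l(\theta; Z)$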
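For the Gaussian example, the maximization can be carried out in closed form: the maximizers are the sample mean and the 1/N sample variance. The following is a minimal sketch under that assumption, using only the Python standard library; the function names and the synthetic data are illustrative, not part of the lecture.

```python
import math
import random

def gaussian_log_likelihood(ys, mu, sigma2):
    """Log-likelihood of independent samples ys under N(mu, sigma2)."""
    n = len(ys)
    squared_error = sum((y - mu) ** 2 for y in ys)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - squared_error / (2.0 * sigma2)

def gaussian_mle(ys):
    """Closed-form maximizer: sample mean and the 1/N (biased) sample variance."""
    n = len(ys)
    mu_hat = sum(ys) / n
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n
    return mu_hat, sigma2_hat

# Synthetic data (an assumption for illustration): N(2.0, 1.5^2), fixed seed.
rng = random.Random(0)
ys = [rng.gauss(2.0, 1.5) for _ in range(1000)]

mu_hat, sigma2_hat = gaussian_mle(ys)
best = gaussian_log_likelihood(ys, mu_hat, sigma2_hat)
```

Evaluating `gaussian_log_likelihood` at nearby parameter values gives a quick sanity check that `best` is not exceeded.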
MLE: A Challenging Example
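Observation data: $y_1, \dots, y_N$ [Figure: histogram of the observed data]
Mixture model: $Y_1 \sim N(\mu_1, \sigma_1^2)$, $Y_2 \sim N(\mu_2, \sigma_2^2)$, $Y = (1 - \Delta) Y_1 + \Delta Y_2$
Indicator variable: $\Delta \in \{0, 1\}$ with $\Pr(\Delta = 1) = \pi$
$\pi$ is the probability with which the observation is chosen from density 2
$(1 - \pi)$ is the probability with which the observation is chosen from density 1
Density of the mixture: $g_Y(y) = (1 - \pi)\,\phi_{\theta_1}(y) + \pi\,\phi_{\theta_2}(y)$, where $\phi_{\theta_j}$ is the $N(\mu_j, \sigma_j^2)$ density
Source: Department of Statistics, CMU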
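MLE: A Challenging Example …
Maximum likelihood fitting for the parameters $\theta = (\pi, \mu_1, \sigma_1, \mu_2, \sigma_2)$: maximize
$l(\theta; Z) = \sum_{i=1}^{N} \log\left[(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)\right]$, with $\phi_{\theta_j}$ the Gaussian density of component $j$
Because the sum sits inside the logarithm, this is challenging to solve numerically (and of course analytically, too)!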
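One well-known source of the difficulty can be seen by simply evaluating the mixture log-likelihood. The sketch below assumes a two-component Gaussian mixture in which `pi` is the weight of the second component; the sample values and parameter settings are made up for illustration. Centring one component on a single observation and shrinking its standard deviation makes the log-likelihood grow without bound, so a naive numerical maximizer has no finite global maximum to find.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(ys, mu1, sigma1, mu2, sigma2, pi):
    """Observed-data log-likelihood: sum_i log[(1-pi)*phi1(y_i) + pi*phi2(y_i)]."""
    return sum(
        math.log((1.0 - pi) * normal_pdf(y, mu1, sigma1) + pi * normal_pdf(y, mu2, sigma2))
        for y in ys
    )

# Illustrative sample values (an assumption, not the lecture's data).
ys = [-0.4, 0.1, 0.5, 0.9, 1.0, 1.7, 1.8, 2.4, 3.3,
      3.7, 4.1, 4.3, 4.6, 4.9, 5.3, 5.5, 6.2]

# A sensible-looking parameter setting gives a finite value.
ll_reasonable = mixture_log_likelihood(ys, 1.0, 1.0, 4.5, 1.0, 0.5)

# Put component 1 exactly on ys[0] and shrink its spread:
# the log-likelihood keeps increasing as sigma1 -> 0.
ll_spiky = [
    mixture_log_likelihood(ys, ys[0], s, 3.0, 2.0, 0.5)
    for s in (1e-2, 1e-4, 1e-6)
]
```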
Expectation Maximization: A Rescuer
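EM augments the data space: it assumes some latent data, here the unobserved indicators $\Delta_i \in \{0, 1\}$ that record which density generated $y_i$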
EM: A Rescuer
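…
Note that we cannot analytically maximize the observed-data log-likelihood $l(\theta; Z) = \sum_{i=1}^{N} \log\left[(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)\right]$
If the latent indicators $\Delta_i$ were known ($\Delta_i = 1$ when $y_i$ comes from density 2), the complete-data log-likelihood would be
$l_0(\theta; Z, \Delta) = \sum_{i=1}^{N} \left[(1 - \Delta_i)\log\phi_{\theta_1}(y_i) + \Delta_i\log\phi_{\theta_2}(y_i)\right] + \sum_{i=1}^{N} \left[(1 - \Delta_i)\log(1 - \pi) + \Delta_i\log\pi\right]$
Maximizing this form of log-likelihood is now tractable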
EM: The Complete Data Likelihood
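By simple differentiation of the complete-data log-likelihood (with indicators $\Delta_i \in \{0, 1\}$) we have:
$\hat{\mu}_1 = \frac{\sum_i (1 - \Delta_i)\, y_i}{\sum_i (1 - \Delta_i)}$, $\hat{\sigma}_1^2 = \frac{\sum_i (1 - \Delta_i)(y_i - \hat{\mu}_1)^2}{\sum_i (1 - \Delta_i)}$
$\hat{\mu}_2 = \frac{\sum_i \Delta_i\, y_i}{\sum_i \Delta_i}$, $\hat{\sigma}_2^2 = \frac{\sum_i \Delta_i (y_i - \hat{\mu}_2)^2}{\sum_i \Delta_i}$
$\hat{\pi} = \frac{1}{N} \sum_i \Delta_i$
So, maximization of the complete-data likelihood is much easier!
How do we get the latent variables?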
Obtaining Latent Variables
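The latent variables are computed as expected values given the data and the current parameters:
$\gamma_i(\theta) = E(\Delta_i \mid \theta, Z) = \Pr(\Delta_i = 1 \mid \theta, Z)$
Apply Bayes' rule:
$\gamma_i(\theta) = \frac{\pi\,\phi_{\theta_2}(y_i)}{(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)}$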
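As a small worked example of this Bayes' rule step, the sketch below computes the posterior probability that an observation came from the second density. It assumes Gaussian components and that `pi` is the prior weight of component 2; the parameter values are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def responsibility(y, mu1, sigma1, mu2, sigma2, pi):
    """Posterior probability that y was drawn from component 2 (Bayes' rule)."""
    p1 = (1.0 - pi) * normal_pdf(y, mu1, sigma1)  # prior times likelihood, component 1
    p2 = pi * normal_pdf(y, mu2, sigma2)          # prior times likelihood, component 2
    return p2 / (p1 + p2)

# Hypothetical parameters: components at 0 and 4, unit spread, equal weights.
gammas = [responsibility(y, 0.0, 1.0, 4.0, 1.0, 0.5) for y in (0.0, 2.0, 4.0)]
```

With these symmetric settings, a point halfway between the two means is split evenly between the components, while points at either mean are assigned almost entirely to the nearer component.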
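EM for Two-component Gaussian Mixture
• Initialize $\mu_1, \sigma_1, \mu_2, \sigma_2, \pi$
• Iterate until convergence
– Expectation of latent variables: $\hat{\gamma}_i = \frac{\pi\,\phi_{\theta_2}(y_i)}{(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)}$, $i = 1, \dots, N$
– Maximization for finding parameters: $\mu_1 = \frac{\sum_i (1 - \hat{\gamma}_i)\, y_i}{\sum_i (1 - \hat{\gamma}_i)}$, $\sigma_1^2 = \frac{\sum_i (1 - \hat{\gamma}_i)(y_i - \mu_1)^2}{\sum_i (1 - \hat{\gamma}_i)}$, $\mu_2 = \frac{\sum_i \hat{\gamma}_i\, y_i}{\sum_i \hat{\gamma}_i}$, $\sigma_2^2 = \frac{\sum_i \hat{\gamma}_i (y_i - \mu_2)^2}{\sum_i \hat{\gamma}_i}$, $\pi = \frac{1}{N}\sum_i \hat{\gamma}_i$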
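The two-step iteration above can be sketched in plain Python as follows. This is a minimal illustration, not a robust implementation: it assumes univariate Gaussian components, takes `pi` to be the weight of the second component, uses synthetic data, and has no safeguard against a component collapsing onto a single point. The function name `em_two_gaussians` and its stopping rule are choices made here for illustration.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(ys, mu1, s1, mu2, s2, pi):
    """Observed-data log-likelihood of the two-component mixture."""
    return sum(
        math.log((1 - pi) * normal_pdf(y, mu1, s1) + pi * normal_pdf(y, mu2, s2))
        for y in ys
    )

def em_two_gaussians(ys, mu1, s1, mu2, s2, pi, max_iter=200, tol=1e-8):
    """Run EM from the given initial values; return parameters and log-likelihood trace."""
    n = len(ys)
    trace = [log_likelihood(ys, mu1, s1, mu2, s2, pi)]
    for _ in range(max_iter):
        # E-step: responsibility of component 2 for each observation.
        g = []
        for y in ys:
            p1 = (1 - pi) * normal_pdf(y, mu1, s1)
            p2 = pi * normal_pdf(y, mu2, s2)
            g.append(p2 / (p1 + p2))
        # M-step: responsibility-weighted means, standard deviations, mixing weight.
        w2 = sum(g)
        w1 = n - w2
        mu1 = sum((1 - gi) * y for gi, y in zip(g, ys)) / w1
        mu2 = sum(gi * y for gi, y in zip(g, ys)) / w2
        s1 = math.sqrt(sum((1 - gi) * (y - mu1) ** 2 for gi, y in zip(g, ys)) / w1)
        s2 = math.sqrt(sum(gi * (y - mu2) ** 2 for gi, y in zip(g, ys)) / w2)
        pi = w2 / n
        trace.append(log_likelihood(ys, mu1, s1, mu2, s2, pi))
        if trace[-1] - trace[-2] < tol:  # stop once the improvement is negligible
            break
    return (mu1, s1, mu2, s2, pi), trace

# Synthetic data (an assumption): equal parts N(0, 1) and N(5, 1), fixed seed.
rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(5.0, 1.0) for _ in range(150)]

params, trace = em_two_gaussians(ys, mu1=min(ys), s1=1.0, mu2=max(ys), s2=1.0, pi=0.5)
```

The `trace` list records the observed-data log-likelihood after each iteration, which makes it easy to check that the value never decreases from one iteration to the next.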
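EM for Mixture of K Gaussians
• Initialize mean vectors, covariance matrices, and mixing probabilities: $\mu_k, \Sigma_k, \pi_k$, $k = 1, 2, \dots, K$.
• Expectation step: compute responsibilities $\gamma_{ik} = \frac{\pi_k\, N(y_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(y_i; \mu_j, \Sigma_j)}$
• Maximization step: update parameters $\mu_k = \frac{\sum_i \gamma_{ik}\, y_i}{\sum_i \gamma_{ik}}$, $\Sigma_k = \frac{\sum_i \gamma_{ik} (y_i - \mu_k)(y_i - \mu_k)^T}{\sum_i \gamma_{ik}}$, $\pi_k = \frac{1}{N} \sum_i \gamma_{ik}$
• Iterate the Expectation and Maximization steps until convergence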
EM Algorithm in General
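$T = (Z, Z^m)$ is the complete data; we only know $Z$, while $Z^m$ is missing.
Since $\Pr(Z^m \mid Z, \theta') = \Pr(Z^m, Z \mid \theta') / \Pr(Z \mid \theta')$, we have $\Pr(Z \mid \theta') = \Pr(T \mid \theta') / \Pr(Z^m \mid Z, \theta')$.
Taking logarithm: $l(\theta'; Z) = l_0(\theta'; T) - l_1(\theta'; Z^m \mid Z)$
Because we have access to the previous parameter values $\theta$, we can do better and take the conditional expectation given $Z$ under $\theta$:
$l(\theta'; Z) = E[l_0(\theta'; T) \mid Z, \theta] - E[l_1(\theta'; Z^m \mid Z) \mid Z, \theta] \equiv Q(\theta', \theta) - R(\theta', \theta)$
Let us now consider the expression:
$l(\theta'; Z) - l(\theta; Z) = [Q(\theta', \theta) - Q(\theta, \theta)] - [R(\theta', \theta) - R(\theta, \theta)]$
It can be shown that $R(\theta', \theta) \le R(\theta, \theta)$. This is actually done by Jensen's inequality.
Thus if $\theta'$ maximizes $Q(\theta', \theta)$, then $l(\theta'; Z) \ge l(\theta; Z)$.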
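The Jensen's inequality step can be sketched as follows. The notation is an assumption where the original formulas are not legible: $R(\theta', \theta)$ denotes the conditional expectation of $\log \Pr(Z^m \mid Z, \theta')$ given $Z$ under the parameters $\theta$, and the integral (a sum in the discrete case) runs over the values of $z^m$ that have positive probability under $\theta$.

```latex
\begin{align*}
R(\theta', \theta) - R(\theta, \theta)
  &= E\!\left[ \log \frac{\Pr(Z^m \mid Z, \theta')}{\Pr(Z^m \mid Z, \theta)} \,\middle|\, Z, \theta \right] \\
  &\le \log E\!\left[ \frac{\Pr(Z^m \mid Z, \theta')}{\Pr(Z^m \mid Z, \theta)} \,\middle|\, Z, \theta \right]
     && \text{(Jensen, since $\log$ is concave)} \\
  &= \log \int \Pr(z^m \mid Z, \theta') \, dz^m \\
  &\le \log 1 = 0 .
\end{align*}
```

Hence the second bracket in the difference of log-likelihoods is never positive, and any increase in the first (Q) term carries over to the observed-data log-likelihood.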
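EM Algorithm in General …
• Start with initial parameter values $\theta^{(0)}$; $t = 1$
• Expectation step: compute $Q(\theta', \theta^{(t-1)}) = E[l_0(\theta'; T) \mid Z, \theta^{(t-1)}]$, where $l_0$ is the complete-data log-likelihood
• Maximization step: $\theta^{(t)} = \arg\max_{\theta'} Q(\theta', \theta^{(t-1)})$
• $t = t + 1$ and iterate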
EM Algorithm: Summary
• Augment the original data space with latent/hidden/missing data
• Frame a suitable probability model for the augmented data space
• In EM iterations, first assume initial values for the parameters
• Iterate the Expectation and the Maximization steps
• In the Expectation step, find the expected values of the latent variables (here you need to use the current parameter values)
• In the Maximization step, first plug the expected values of the latent variables into the log-likelihood of the augmented data. Then maximize this log-likelihood to re-estimate the parameters
• Iterate the last two steps until convergence
Applications of EM
– Mixture models
– HMMs
– PCA
– Latent variable models
– Missing data problems
– Many computer vision problems
– …
References
• The EM Algorithm and Extensions by Geoffrey J. McLachlan and Thriyambakam Krishnan
• For a non-parametric density estimate by EM, look at: http://bioinformatics.uchc.edu/LectureNotes_2006/Tools_EM_SA_2006_files/frame.htm
EM: Important Issues
• Is the convergence of the algorithm guaranteed?
• Does the outcome of EM depend on the initial choice of the parameter values?
• How about the speed of convergence?
• How easy or difficult could it be to compute the expected values of the latent variables?