The Expectation Maximization Algorithm
College of Computing, Georgia Institute of Technology
Technical Report number GIT-GVU-02-20
This note represents my attempt at explaining the EM algorithm (Hartley, 1958; Dempster et al., 1977; McLachlan and Krishnan, 1997). This is just a slight variation on Tom Minka’s tutorial (Minka, 1998), perhaps a little easier (or perhaps not). It includes a graphical example to provide some intuition.

1 Intuitive Explanation of EM
EM is an iterative optimization method to estimate some unknown parameters $\Theta$, given measurement data $U$. However, we are not given some “hidden” nuisance variables $J$, which need to be integrated out. In particular, we want to maximize the posterior probability of the parameters $\Theta$ given the data $U$, marginalizing over $J$:

$$\Theta^* = \operatorname*{argmax}_{\Theta} \sum_{J \in \mathcal{J}} P(\Theta, J \mid U) \qquad (1)$$
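To make the marginalization in (1) concrete, here is a minimal brute-force sketch on a toy problem. The model is a hypothetical one chosen only for illustration and does not come from the report: each measurement is either an inlier centred on the unknown parameter or an outlier from a broad fixed distribution, the prior on the parameter is assumed flat, and the names `gauss`, `joint`, and `marginal` are invented here. The sum runs explicitly over every assignment of the hidden variables, which is feasible only because the example is tiny.

```python
import itertools
import math

def gauss(x, mean, sigma):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical toy model (an assumption, not from the report):
# J_i = 0 means u_i is an inlier drawn from a Gaussian centred on theta
# with standard deviation 1; J_i = 1 means u_i is an outlier drawn from a
# Gaussian centred on 0 with standard deviation 10. Both have prior 0.5.
U = [1.9, 2.1, 2.3, -6.0]

def joint(theta, J):
    """Unnormalized P(theta, J | U), assuming a flat prior on theta."""
    p = 1.0
    for u, j in zip(U, J):
        p *= 0.5 * (gauss(u, theta, 1.0) if j == 0 else gauss(u, 0.0, 10.0))
    return p

def marginal(theta):
    """Sum the joint over every assignment J in {0, 1}^n, as in Eq. (1)."""
    return sum(joint(theta, J) for J in itertools.product((0, 1), repeat=len(U)))

# Brute-force argmax over a grid of candidate theta values.
grid = [i / 100.0 for i in range(-800, 801)]
theta_star = max(grid, key=marginal)
print(theta_star)
```

The explicit sum has 2^n terms for n measurements, so this approach does not scale; that cost is one motivation for an iterative scheme such as EM.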
The intuition behind EM is an old one: alternate between estimating the unknowns $\Theta$ and the hidden variables $J$. However, instead of finding the best $J \in \mathcal{J}$ given an estimate $\Theta$ at each iteration, EM computes a distribution over the space $\mathcal{J}$.

One of the earliest papers on EM is (Hartley, 1958), but the seminal reference that formalized EM and provided a proof of convergence is the “DLR” paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book devoted entirely to EM and its applications is (McLachlan and Krishnan, 1997), whereas (Tanner, 1996) is another popular and very useful reference.

One of the most insightful explanations of EM, which provides a deeper understanding of its operation than the intuition of alternating between variables, is in terms of lower-bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the E-step can be interpreted as constructing a local lower bound to the posterior distribution, whereas the M-step optimizes the bound, thereby improving the estimate for the unknowns. This is demonstrated below for a simple...
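The alternation described in this section can be sketched in a few lines of Python. This is not the example from the report; it is a minimal illustration on a hypothetical inlier/outlier model, with a flat prior on the unknown mean assumed and all function names (`e_step`, `m_step`, `log_marginal`) invented for the sketch. The E-step returns a probability for each hidden label rather than a single best labelling, and the M-step re-estimates the parameter from those probabilities.

```python
import math

def gauss(x, mean, sigma):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical toy model (an assumption, not from the report): each u_i is
# an inlier from a Gaussian centred on theta (std 1) or an outlier from a
# Gaussian centred on 0 (std 10), with equal prior probability.
U = [1.9, 2.1, 2.3, -6.0]

def e_step(theta):
    """Distribution over each hidden label: probability that u_i is an inlier."""
    weights = []
    for u in U:
        inlier = gauss(u, theta, 1.0)
        outlier = gauss(u, 0.0, 10.0)
        weights.append(inlier / (inlier + outlier))
    return weights

def m_step(weights):
    """Maximize the expected log-posterior: the weighted mean of the data."""
    return sum(w * u for w, u in zip(weights, U)) / sum(weights)

def log_marginal(theta):
    """Log of the objective with the hidden labels summed out (up to a constant)."""
    return sum(math.log(0.5 * gauss(u, theta, 1.0) + 0.5 * gauss(u, 0.0, 10.0)) for u in U)

theta = 0.0  # arbitrary starting guess
history = [log_marginal(theta)]
for _ in range(20):
    weights = e_step(theta)   # E-step: soft assignment of each measurement
    theta = m_step(weights)   # M-step: update the parameter estimate
    history.append(log_marginal(theta))
print(theta, weights)
```

In this sketch the values stored in `history` should never decrease from one iteration to the next, which is the behaviour the lower-bound view of EM predicts.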