# The Expectation Maximization Algorithm

**Topics:** Expectation-maximization algorithm, Estimation theory, Arthur P. Dempster


**Published:** October 15, 2008

Frank Dellaert

College of Computing, Georgia Institute of Technology

Technical Report number GIT-GVU-02-20

February 2002

Abstract

This note represents my attempt at explaining the EM algorithm (Hartley, 1958; Dempster et al., 1977; McLachlan and Krishnan, 1997). This is just a slight variation on Tom Minka’s tutorial (Minka, 1998), perhaps a little easier (or perhaps not). It includes a graphical example to provide some intuition.

## 1 Intuitive Explanation of EM

EM is an iterative optimization method to estimate some unknown parameters $\Theta$, given measurement data $U$. However, we are not given some “hidden” nuisance variables $J$, which need to be integrated out. In particular, we want to maximize the posterior probability of the parameters $\Theta$ given the data $U$, marginalizing over $J$:

$$\Theta^* = \operatorname*{argmax}_{\Theta} \sum_{J \in \mathcal{J}^n} P(\Theta, J \mid U) \qquad (1)$$
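To make the sum in Eq. (1) concrete, here is a minimal Python sketch under assumptions that are not in the note: the hidden variables are binary labels $J_i \in \{0, 1\}$, one per measurement $u_i$, each selecting one of two unit-variance Gaussians with means $\Theta = (\mu_0, \mu_1)$ with probability 1/2, and the prior on $\Theta$ is flat, so $P(\Theta, J \mid U)$ is only computed up to a constant. The model, the data, and the helper names (`joint`, `marginal_brute_force`, `marginal_factorized`) are hypothetical illustrations.

```python
import itertools
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint(theta, labels, data):
    """Unnormalized P(theta, J | U) for a flat prior on theta (assumption):
    each label J_i picks mean theta[J_i] with probability 1/2."""
    p = 1.0
    for u, j in zip(data, labels):
        p *= 0.5 * normal_pdf(u, theta[j])
    return p


def marginal_brute_force(theta, data):
    """Sum the joint over every label vector J in {0, 1}^n, as in Eq. (1)."""
    return sum(joint(theta, labels, data)
               for labels in itertools.product((0, 1), repeat=len(data)))


def marginal_factorized(theta, data):
    """Same quantity, using the independence of the labels given theta."""
    p = 1.0
    for u in data:
        p *= sum(0.5 * normal_pdf(u, m) for m in theta)
    return p


data = [-1.2, -0.8, 0.9, 1.1, 1.3]
theta = (-1.0, 1.0)
print(marginal_brute_force(theta, data))
print(marginal_factorized(theta, data))
```

The brute-force loop visits all $2^n$ label vectors. In this toy model the sum happens to factorize per measurement because the labels are independent given $\Theta$; the sketch only shows what the marginalization in Eq. (1) computes, not how EM avoids it.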

The intuition behind EM is an old one: alternate between estimating the unknowns $\Theta$ and the hidden variables $J$. This idea has been around for a long time. However, instead of finding the best $J \in \mathcal{J}$ given an estimate $\Theta$ at each iteration, EM computes a distribution over the space $\mathcal{J}$.

One of the earliest papers on EM is (Hartley, 1958), but the seminal reference that formalized EM and provided a proof of convergence is the “DLR” paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book devoted entirely to EM and applications is (McLachlan and Krishnan, 1997), whereas (Tanner, 1996) is another popular and very useful reference.

One of the most insightful explanations of EM, which provides a deeper understanding of its operation than the intuition of alternating between variables, is in terms of lower-bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the E-step can be interpreted as constructing a local lower bound to the posterior distribution, whereas the M-step optimizes the bound, thereby improving the estimate for the unknowns. This is demonstrated below for a simple...
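The alternation described above can be sketched in Python for a hypothetical toy problem: a two-component 1-D Gaussian mixture with unit variances, equal mixing weights, and a flat prior on $\Theta = (\mu_0, \mu_1)$, so maximizing the posterior in Eq. (1) reduces to maximizing the marginal likelihood. None of this model, data, or the function names (`e_step`, `m_step`, `em`, `log_marginal`) comes from the note. The E-step computes a distribution over each hidden label rather than a single best label, and the M-step re-estimates $\Theta$ from those soft assignments. Because each M-step maximizes a lower bound that touches the objective at the current estimate, the recorded log objective should never decrease, which the `history` list lets you check.

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def log_marginal(theta, data):
    """log sum_J P(theta, J | U), up to an additive constant (flat prior assumed)."""
    return sum(math.log(sum(0.5 * normal_pdf(u, m) for m in theta)) for u in data)


def e_step(theta, data):
    """Posterior over each hidden label J_i given the current theta:
    a soft assignment instead of a single best label."""
    resp = []
    for u in data:
        w = [0.5 * normal_pdf(u, m) for m in theta]
        total = sum(w)
        resp.append([wk / total for wk in w])
    return resp


def m_step(resp, data):
    """Maximize the bound built in the E-step: for this model each mean
    becomes a responsibility-weighted average of the data."""
    new_theta = []
    for k in range(2):
        wsum = sum(r[k] for r in resp)
        new_theta.append(sum(r[k] * u for r, u in zip(resp, data)) / wsum)
    return tuple(new_theta)


def em(theta, data, iters=20):
    """Run a fixed number of EM iterations, recording the log objective."""
    history = [log_marginal(theta, data)]
    for _ in range(iters):
        theta = m_step(e_step(theta, data), data)
        history.append(log_marginal(theta, data))
    return theta, history


data = [-2.1, -1.9, -2.4, -1.6, 1.8, 2.2, 2.0, 2.5]
theta, history = em((-0.5, 0.5), data)
print(theta)
```

Fixing the variances and weights keeps the M-step to a single weighted average; a fuller mixture model would also re-estimate those, with the same E-step/M-step structure.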
