A family of algorithms for approximate Bayesian inference

Thomas Minka
MIT PhD thesis, 2001

One of the major obstacles to using Bayesian methods for pattern recognition has been their computational expense. This thesis presents an approximation technique that can perform Bayesian inference faster and more accurately than previously possible. This method, "Expectation Propagation," unifies and generalizes two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. The unification shows how both of these algorithms can be viewed as approximating the true posterior distribution with a simpler distribution, which is close in the sense of KL-divergence. Expectation Propagation exploits the best of both algorithms: the generality of assumed-density filtering and the accuracy of loopy belief propagation.
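As a quick illustration of the KL criterion (a standard identity, not quoted from the thesis): when the approximating distribution q is taken from an exponential family, minimizing the KL-divergence from the true distribution p reduces to matching expected sufficient statistics, i.e. moment matching:

  \mathrm{KL}(p \,\|\, q) = \mathrm{const} - \lambda^{\top} E_p[\phi(x)] + A(\lambda),
  \qquad q(x) = \exp\!\big(\lambda^{\top}\phi(x) - A(\lambda)\big),

so setting the gradient with respect to \lambda to zero gives E_q[\phi(x)] = \nabla_{\lambda} A(\lambda) = E_p[\phi(x)].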

Loopy belief propagation, because it propagates exact belief states, is useful only for a limited class of belief networks, such as purely discrete networks. Expectation Propagation approximates the belief states with expectations, such as means and variances, giving it much wider scope. Expectation Propagation also extends belief propagation in the opposite direction, propagating richer belief states that incorporate correlations between variables.
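A minimal numerical sketch of what "approximating the belief state with expectations" means (illustrative Python/NumPy, not code from the thesis; the grid, the Gaussian message, and the step factor are all assumptions chosen for the example):

  import numpy as np

  # Illustrative sketch: a belief over one variable, formed by multiplying a
  # Gaussian message by a non-Gaussian step factor, is summarized by its mean
  # and variance, which are all that the approximate belief state carries.
  x = np.linspace(-10.0, 10.0, 20001)
  dx = x[1] - x[0]
  gaussian_msg = np.exp(-0.5 * (x - 1.0) ** 2 / 4.0)   # incoming Gaussian belief
  step_factor = (x > 0.0).astype(float)                # exact, non-Gaussian factor
  tilted = gaussian_msg * step_factor
  tilted /= tilted.sum() * dx                          # normalize on the grid
  mean = (x * tilted).sum() * dx                       # first expectation
  var = ((x - mean) ** 2 * tilted).sum() * dx          # second expectation
  # The propagated belief state is just (mean, var): a Gaussian with these moments.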

This framework is demonstrated in a variety of statistical models using synthetic and real-world data. On Gaussian mixture problems, Expectation Propagation is found, for the same amount of computation, to be convincingly better than rival approximation techniques: Monte Carlo, Laplace's method, and variational Bayes. For pattern recognition, Expectation Propagation provides an algorithm for training Bayes Point Machine classifiers that is faster and more accurate than any previously known. The resulting classifiers outperform Support Vector Machines on several standard datasets, while having comparable training time. Expectation Propagation can also be used to choose an appropriate feature set for classification, via Bayesian model selection.

Postscript (440K) . PDF (990K)

The slides from my defense are also available: PDF (574K)

I'm providing a Matlab toolbox for classification with the Bayes Point Machine.
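For a rough idea of how such a classifier is trained, here is a hedged sketch in Python/NumPy (not the toolbox code, and not guaranteed to match it): a single assumed-density-filtering pass for a linear classifier with a hard step likelihood (epsilon fixed at 0); the function name adf_bpm_sketch and the prior_var parameter are illustrative.

  import numpy as np
  from scipy.stats import norm

  def adf_bpm_sketch(X, y, prior_var=1.0):
      # Hypothetical illustration, not the toolbox: one assumed-density-filtering
      # pass for a linear classifier with a hard step likelihood (epsilon = 0).
      # X has shape (n, d); y holds labels in {-1, +1}.
      d = X.shape[1]
      mu = np.zeros(d)                  # Gaussian belief over the weights: mean
      Sigma = prior_var * np.eye(d)     # ... and covariance
      for xi, yi in zip(X, y):
          m = xi @ mu                   # belief about s = x'w: mean
          v = xi @ Sigma @ xi           # ... and variance
          z = yi * m / np.sqrt(v)
          alpha = norm.pdf(z) / norm.cdf(z)   # moments of the step-tilted Gaussian
          Sx = Sigma @ xi
          mu = mu + Sx * (yi * alpha / np.sqrt(v))                      # match tilted mean
          Sigma = Sigma - np.outer(Sx, Sx) * (alpha * (alpha + z) / v)  # match tilted variance
      return mu, Sigma

Full Expectation Propagation additionally keeps a Gaussian site approximation for each example and revisits the examples until the approximations stop changing, rather than making one filtering pass.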


Expectation Propagation for approximate Bayesian inference

Thomas Minka
UAI'2001, pp. 362-369

This is a short version of the above thesis. It includes the free-energy formulation of EP.

This PDF contains a correction to the published version, in the updates for the Bayes Point Machine. This doesn't affect the experimental results, because epsilon was 0; however, it does make a difference for epsilon > 0. Thanks to Yuan Qi for finding this.

PDF (213K) (Doubly-corrected version, 1/30/04)

The slides from my UAI talk are also available, in two parts: part 1 . part 2


The EP energy function and minimization schemes

This is a short note which discusses the EP energy function in more detail than I could include in the UAI paper:
PDF (62K)

A roadmap to research on EP


Last modified: Wed Aug 17 18:09:51 GMT 2005