# Bayesian inference in dynamic models -- an overview

### by Tom Minka

The following algorithms all try to infer the hidden state of a dynamic model from measurements. The input is a dynamic model and a measurement sequence and the output is an approximate posterior distribution over the hidden state at one or many times. Only discrete-time models are discussed here.

Inferring only the most recent hidden state is known as filtering; inferring past states is known as smoothing. Most filtering methods are on-line, which means they process each measurement exactly once, after which it can be discarded. This allows them to operate with a fixed amount of memory. The opposite of on-line is off-line or batch. There are standard ways to turn an on-line filtering algorithm into a batch filtering or smoothing algorithm. Therefore, the overview is divided into two parts: on-line filtering and batch filtering/smoothing.

Some of these algorithms are general algorithms for approximate Bayesian inference and others are specialized for dynamic models. With the description of each algorithm is a partial list of references. I've included more references for algorithms which are less well-known.

Some related pages on the web:

## On-line filtering algorithms

The algorithms are grouped according to how they represent the posterior distribution over the hidden state (their belief).

## Gaussian belief

The following algorithms use a multivariate Gaussian for their belief. In fact, most of them are more general than this---they could use any exponential family as their belief.

Kalman filter
The Kalman filter only applies to models with Gaussian noise, linear state equations, and linear measurement equations, i.e.
```s_t = A s_(t-1) + noise
x_t = C s_t + noise
```
For these models the state posterior really is Gaussian, and the Kalman filter is exact.

Extended Kalman filter
The Extended Kalman filter applies to models with Gaussian noise. The state and measurement equations are linearized by a Taylor expansion about the current state estimate. The noise variance in the equations is not changed, i.e. the additional error due to linearization is not modeled. After linearization, the Kalman filter is applied.
• "Stochastic models, estimation and control", Peter S. Maybeck, Volume 2, Chapter 12, 1982.
• "A linear approximation method for probabilistic inference", Ross Shachter, UAI'1990.

Bottleneck filter
This algorithm applies to any type of measurement equation. The measurement equation is rewritten in terms of an intermediate bottleneck variable b_t, such that p(x_t|b_t) is simple while p(b_t|s_t) may be complicated. At each time step, the Gaussian belief on s_t is converted into a Gaussian belief on b_t (usually involving approximations), b_t is updated exactly from x_t (since p(x_t|b_t) is simple), and the new Gaussian belief on b_t is converted back to a Gaussian belief on s_t. ("Bottleneck" is my own terminology. In the West paper below, they used Gamma distributions.)
• "Dynamic Generalized Linear Models and Bayesian Forecasting," M. West, P. J. Harrison, & H. S. Migon, J Am Stat Assoc 80:73-97, 1985.

Linear-update filter a.k.a. linear-regression filter or "statistical linearization" filter
This algorithm applies to any type of measurement equation. The measurement equation is converted into a linear-Gaussian equation by regressing the observation onto the state. The result is a Kalman filter whose Kalman gain is cov(state,measurement)/var(measurement). Note that the regression is done without reference to the actual measurement. I call it "linear-update" because the update to the state is always a linear function of the measurement. A variety of approximations have been proposed when cov(state,measurement) is not available analytically. The unscented filter, central difference filter, and divided difference filter are filters of this type.

Assumed-density filter a.k.a. moment matching
This algorithm chooses the Gaussian belief which is "closest", in some sense, to the exact state posterior given previous beliefs. Usually this amounts to matching the moments of the exact posterior. This is the most general approximation technique proposed to date---it utilizes not only the form of the measurement equation but also the measurement itself. The assumed-density filter requires analytic or numerical solution of integrals over the measurement distribution. For this, one could use Monte Carlo, quadrature, or Laplace's method.

## Mixture of Gaussian belief

A natural choice in moving beyond a single Gaussian is to use a mixture of Gaussian belief. Unfortunately, an algorithm for general dynamic models has proven elusive. Instead, existing algorithms assume that the dynamic model is a mixture of linear-Gaussian models, i.e. it switches randomly between different linear-Gaussian state/measurement equations. This can be understood as having discrete state variables in addition to the continuous ones. For these models, the true state posterior is a mixture of Gaussians, but a very big one. The algorithms focus on reducing the size of this mixture, in an on-line way. Most of them generalize beyond Gaussian to any exponential family.

Assumed-density filter a.k.a. "pseudo-Bayes"
Same as assumed-density filtering for a single Gaussian, but now the belief representation is categorical for the discrete state component and conditional Gaussian for the continuous state component, producing a mixture of Gaussian marginal for the continuous state component. For each setting of the discrete state component, this algorithm chooses the Gaussian which is "closest" to the exact state posterior given previous beliefs. A drawback of this algorithm is that the size of the mixture is predetermined by the cardinality of the discrete state component. However, it does allow the state/measurement equations, conditional on the discrete state, to be non-Gaussian.

Gaussian-sum filter
This is a family of algorithms which propagate the exact posterior for one step, giving a large Gaussian mixture, and then reduce the mixture as needed. Methods for reducing the mixture include:
Drop mixture components with low weight
Sample mixture components according to weight
Used in Rao-Blackwellised particle filters:
Repeatedly merge most similar pair of mixture components
Minimize divergence between the large and small mixture by nonlinear optimization

## Nonparametric belief

Histogram filter
Quantize the state space and represent the belief by a histogram. The filtering equations then match a hidden Markov model.
• "A general filter for measurements with any probability distribution", Y. Rosenberg and M. Werman, CVPR'97, 654--659.
• "Non-Gaussian state-space modeling of nonstationary time series", G. Kitagawa, J Am Stat Assoc 82:1032--1063, 1987.
• "Recursive Bayesian estimation using piecewise constant approximations", Kramer and Sorenson, Automatica 24(6):789--801, 1988.

Particle filter
Represent the state posterior by a large set of samples drawn from the distribution. The particles are updated on-line to always represent the current state posterior. The most common way to do this is to pick a particle for the previous state, sample a particle for the current state using the state equation, and weight the particle by the probability of the measurement. Sample the particles according to weight to get a set of unweighted particles.

Particle filter with interpolation
A particle filter where you increase diversity by fitting a density to the particles and resampling from that density.

Particle filter with MCMC steps
A particle filter where you increase diversity by including MCMC steps.
• "Following a moving target---Bayesian inference for dynamic Bayesian models" W. Gilks and C. Berzuini, J Royal Stat Soc B 63(1):127--146, 2001.
• "Particle filters for state estimation of jump Markov linear systems"
• "Sequential importance sampling for nonparametric Bayes models: The next generation", S. N. MacEachern, M. Clyde, J. S. Liu, Canadian J of Statistics 27(2):251-267, 1999.

MCMC filtering
MCMC can be specially designed to provide efficient, bounded-memory filtering, for example via randomized Gibbs sampling.

## Batch filtering/smoothing algorithms

Kalman smoothing
Used with the Kalman filter, or any filter which linearizes the state equations, e.g. EKF, UF, LUF, ADF.

Expectation Propagation
Provides batch filtering and smoothing. Can be used with any method for linearizing the measurement equations, e.g. EKF, UF, LUF, ADF. Unlike Kalman smoothing, the measurement equations are re-linearized until a globally-stable solution is reached, giving better results.

Variational lower bounds
Provides batch filtering and smoothing. Meshes well with parameter estimation.

Two-filter smoothing
Run filtering forward and independently in reverse and combine the results. Useful for Gaussian-sum filters.
• "The two-filter formula for smoothing and an implementation of the Gaussian-sum smoother", G. Kitagawa, Annals Institute of Statistical Mathematics 46(4):605-623, 1994

Particle smoothing by sampling
Reweight and resample the particles at time t, based on a sampled particle from time t+1.

Particle smoothing by interpolation
Reweight and resample the particles at time t, based on a fitted density to the particles at time t+1.

MCMC
Gibbs sampling or Metropolis-Hastings.

Markov Random Field algorithms
Relaxation, etc.