Stochastic rigidity: Image registration for nowhere-static scenes
Andrew W. Fitzgibbon
Proceedings, International Conference on Computer Vision (ICCV) 2001,
Vancouver, Canada. Volume 1, Pages 662-670.



This paper looks at the registration of sequences of images where the observed scene is entirely non-rigid; for example, a camera flying over water, a panning shot of a field of sunflowers in the wind, or footage of a crowd applauding at a sports event. In these cases, it is not possible to impose the constraint that world points have similar colour in successive views, so existing registration techniques (Glazer et al 1983, Black and Anandan 1993, Irani et al 1994, Capel and Zisserman 1998) cannot be applied. Indeed, the relationship between a point's colours in successive frames is essentially a random process.

However, by treating the sequence of images as a set of samples from a multidimensional stochastic time-series, we can learn a stochastic model---such as an AR model (Papoulis 1991, Shumway 2000)---of the random process which generated the sequence of images. With a static camera, this stochastic model can be used to extend the sequence arbitrarily in time: driving the model with random noise results in an infinitely varying sequence of images which always looks like the short input sequence. In this way, we can create "videotextures" (Szummer and Picard 1995, Schodl et al 2000) which can play forever without repetition.
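As a rough illustration of the static-camera case (a sketch, not the paper's implementation; the function names are my own), one can fit a first-order AR model to per-frame state vectors such as PCA coefficients, then drive it with Gaussian noise to synthesize new frames indefinitely:

```python
import numpy as np

def fit_ar1(y):
    """Least-squares fit of a first-order AR model: y[t+1] ~ A y[t] + noise.

    y : (T, d) array of per-frame state vectors (e.g. PCA coefficients).
    Returns the (d, d) transition matrix A and the residual covariance.
    """
    Y0, Y1 = y[:-1], y[1:]
    A_ls, *_ = np.linalg.lstsq(Y0, Y1, rcond=None)
    A = A_ls.T                        # so that y[t+1] is approx A @ y[t]
    resid = Y1 - Y0 @ A.T
    return A, np.cov(resid.T)

def synthesize(A, cov, y0, n_frames, rng=None):
    """Extend the sequence by driving the AR model with random noise."""
    rng = np.random.default_rng(rng)
    d = len(y0)
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(d))  # shape the noise
    out = [np.asarray(y0, float)]
    for _ in range(n_frames - 1):
        out.append(A @ out[-1] + L @ rng.standard_normal(d))
    return np.array(out)
```

Each synthesized state vector is then mapped back through the PCA basis to produce an image; because the driving noise never repeats, neither does the output sequence.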

With a moving camera, the image generation process comprises two components---a stochastic component generated by the videotexture, and a parametric component due to the camera motion. For example, a camera rotation induces a relationship between successive images which is modelled by a plane projective transformation, or homography, determined by four point correspondences. Human observers can easily separate the camera motion from the stochastic element.
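For reference, a homography can be recovered from four point correspondences by the standard direct linear transform (DLT); this is textbook material rather than anything specific to the paper, and the function names below are illustrative:

```python
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of image points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to inhomogeneous

def homography_from_points(src, dst):
    """Direct linear transform: estimate H from >= 4 correspondences.

    Each correspondence (x, y) -> (u, v) gives two linear constraints
    on the 9 entries of H; the solution is the SVD null vector."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```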

The key observation for an automatic implementation is that without image registration, the time-series analysis must work harder to model the combined stochastic and parametric image generation. Specifically, the learned model will require more components, or more coefficients, to achieve the same expressive power as for the static scene. With the correct registration the model will be more compact. Therefore, by searching for the registration parameters which result in the most parsimonious stochastic model, we can register sequences where there is only stochastic rigidity. The paper describes an implementation of this scheme and shows results on a number of example sequences.
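The parsimony search can be caricatured in a few lines. This toy sketch is not the paper's implementation: residual power of a first-order AR fit stands in for model compactness, the "camera motion" is a single per-frame horizontal shift, and the search is brute force over integer candidates:

```python
import numpy as np

def ar_residual_score(frames):
    """Score a registered sequence by how well one first-order AR model
    fits it: flatten each frame, fit y[t+1] ~ A y[t] by least squares,
    and return the mean squared residual. A lower score indicates a more
    parsimonious stochastic model."""
    y = frames.reshape(len(frames), -1).astype(float)
    Y0, Y1 = y[:-1], y[1:]
    A, *_ = np.linalg.lstsq(Y0, Y1, rcond=None)
    return float(np.mean((Y1 - Y0 @ A) ** 2))

def register_translation(frames, shifts):
    """Brute-force search over a constant per-frame horizontal shift:
    undo each candidate and keep the one whose stabilized sequence
    admits the best (lowest-residual) AR model."""
    return min(shifts, key=lambda s: ar_residual_score(
        np.stack([np.roll(f, -s * t, axis=1) for t, f in enumerate(frames)])))
```

Only when the candidate shift matches the true camera motion does the stabilized sequence become a stationary stochastic process that a single compact linear model can explain; any misregistration leaves time-varying structure that inflates the residual.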


When fitting the AR model to small amounts of data, you'll need priors on the coefficient matrices A_i. Examples are that they are diagonal, tridiagonal, or strongly diagonally dominant. The diagonal case is easiest -- just run Yule-Walker on each channel independently. Because the PCA has already orthogonalized the data, this often works just fine.
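The diagonal case above amounts to treating each PCA channel as an independent scalar AR(p) process. A minimal sketch (my own naming, assuming zero-mean channels and sample autocovariances):

```python
import numpy as np

def yule_walker_diag(y, order=2):
    """Fit a diagonal AR(p) model channel-by-channel via Yule-Walker.

    y : (T, d) array of PCA coefficients; each column is treated as an
    independent scalar AR(p) process, so each matrix A_i is diagonal.
    Returns coeffs of shape (order, d): coeffs[i, j] is the (j, j)
    entry of A_i."""
    T, d = y.shape
    y = y - y.mean(axis=0)
    coeffs = np.empty((order, d))
    for j in range(d):
        c = y[:, j]
        # Sample autocovariances r[0..p] of this channel.
        r = np.array([c[: T - k] @ c[k:] / T for k in range(order + 1)])
        # Solve the Toeplitz Yule-Walker system R a = r[1:].
        R = np.array([[r[abs(i - k)] for k in range(order)]
                      for i in range(order)])
        coeffs[:, j] = np.linalg.solve(R, r[1:])
    return coeffs
```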

Image sequences:

Some image sequences from the paper are available below:

1. Flowers

Input sequence; Stabilized sequence (2.2M)

2. Water

Input sequence (29.8M); Augmented sequence (870K)

3. More Water

Input sequence (670K); Videotexture (2.9M)