However, by treating the sequence of images as a set of samples from a multidimensional stochastic time-series, we can learn a stochastic model---such as an AR model (Papoulis 1991, Shumway 2000)---of the random process which generated the sequence of images. With a static camera, this stochastic model can be used to extend the sequence arbitrarily in time: driving the model with random noise results in an infinitely varying sequence of images which always looks like the short input sequence. In this way, we can create "videotextures" (Szummer and Picard 1995, Schödl et al. 2000) which can play forever without repetition.
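The fit-then-synthesize idea above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: a first-order AR model x_t = A x_{t-1} + noise is fit by least squares to a made-up multichannel time-series (standing in for PCA coefficients of the frames), and then driven with random noise to extend the sequence indefinitely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multichannel time-series: T observations of a d-dimensional state,
# standing in for PCA projections of the video frames.
T, d = 200, 3
true_A = np.diag([0.9, 0.7, 0.5])
x = np.zeros((T, d))
for t in range(1, T):
    x[t] = true_A @ x[t - 1] + 0.1 * rng.standard_normal(d)

# Least-squares estimate of A from consecutive pairs: minimise ||X1 - X0 A^T||.
X0, X1 = x[:-1], x[1:]
A_hat, *_ = np.linalg.lstsq(X0, X1, rcond=None)
A_hat = A_hat.T

# The residuals give the noise level with which to drive the synthesis.
resid = X1 - X0 @ A_hat.T
sigma = resid.std(axis=0)

# Synthesize as many new frames as desired: x_t = A_hat x_{t-1} + noise.
y = x[-1].copy()
synth = []
for _ in range(50):
    y = A_hat @ y + sigma * rng.standard_normal(d)
    synth.append(y.copy())
```

In a real videotexture the synthesized coefficient vectors would be mapped back through the PCA basis to produce images.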
With a moving camera, the image generation process comprises two components---a stochastic component generated by the videotexture, and a parametric component due to the camera motion. For example, a camera rotation induces a relationship between successive images which is modelled by a 4-point perspective transformation, or homography. Human observers can easily separate the camera motion from the stochastic element.
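As a reminder of the parametric component, the homography relating pixels in successive frames under a camera rotation can be applied as follows. The matrix below is a made-up example (a small roll plus a shift), not taken from any sequence in the paper.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # re-normalise

# Illustrative homography: 2-degree rotation about the optical axis
# plus a 5-pixel horizontal shift.
theta = np.deg2rad(2.0)
H = np.array([[np.cos(theta), -np.sin(theta), 5.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

corners = np.array([[0.0, 0.0], [640.0, 0.0], [640.0, 480.0], [0.0, 480.0]])
warped = warp_points(H, corners)
```

Four point correspondences suffice to determine H, hence the "4-point perspective transformation".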
The key observation for an automatic implementation is that without image registration, the time-series analysis must work harder to model the combined stochastic and parametric image generation. Specifically, the learned model will require more components, or more coefficients, to achieve the same expressive power as for the static scene. With the correct registration the model will be more compact. Therefore, by searching for the registration parameters which result in the most parsimonious stochastic model, we can register sequences where there is only stochastic rigidity. The paper describes an implementation of this scheme and shows results on a number of example sequences.
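The parsimony criterion can be demonstrated on a toy problem. Here the "registration parameter" is a 1-D integer shift (a stand-in for the homography parameters): each candidate shift is undone, a fixed-order AR model is fit, and the candidate is scored by residual variance. The correctly registered sequence admits the most compact fit. This is illustrative only, not the paper's search procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 100, 8
true_shift = 2

# Latent stochastic texture: each of d "pixels" follows an AR(1) process.
frames = np.zeros((T, d))
for t in range(1, T):
    frames[t] = 0.9 * frames[t - 1] + 0.3 * rng.standard_normal(d)

# Camera motion: frame t is rolled by t * true_shift pixels.
observed = np.array([np.roll(frames[t], t * true_shift) for t in range(T)])

def ar1_residual(x):
    """Residual variance of a per-channel AR(1) least-squares fit."""
    X0, X1 = x[:-1], x[1:]
    a = (X0 * X1).sum(axis=0) / (X0 * X0).sum(axis=0)
    return ((X1 - a * X0) ** 2).mean()

# Score every candidate registration; the true shift minimizes the residual,
# i.e. yields the most parsimonious stochastic model.
scores = {}
for s in range(d):
    unrolled = np.array([np.roll(observed[t], -t * s) for t in range(T)])
    scores[s] = ar1_residual(unrolled)

best = min(scores, key=scores.get)
```

Any wrong shift misaligns consecutive frames, so the AR fit degenerates toward modelling the full signal variance rather than the small innovation noise.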
When fitting the AR model to small amounts of data, priors on the coefficients of the matrices A_i are needed. Examples are constraining them to be diagonal, tridiagonal, or strongly diagonally dominant. The diagonal case is the easiest: apply Yule-Walker to each channel independently. Because the PCA has already orthogonalized the data, this often works well in practice.
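The diagonal case can be sketched directly: with the A_i constrained to be diagonal, each channel decouples into a scalar AR(p) model, which is fit by solving the Yule-Walker equations on that channel's autocovariances. The process parameters below are illustrative.

```python
import numpy as np

def yule_walker(x, order):
    """Solve the Yule-Walker equations for a scalar AR(order) model."""
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimates r[0..order].
    r = np.array([(x[:n - k] * x[k:]).sum() / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])        # AR coefficients a_1 .. a_p

rng = np.random.default_rng(2)
T, d, p = 2000, 3, 2

# Independent channels (as after PCA), each a known AR(2) process.
true_a = np.array([0.5, 0.3])
x = np.zeros((T, d))
for t in range(2, T):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] \
         + 0.1 * rng.standard_normal(d)

# Fit each channel independently; stacking the scalar coefficients per lag
# gives the diagonal entries of A_1 .. A_p.
coeffs = np.array([yule_walker(x[:, j], p) for j in range(d)])
```

Each row of `coeffs` should recover approximately (0.5, 0.3), the diagonal entries of A_1 and A_2 for that channel.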
1. Flowers: Input sequence | Stabilized sequence (2.2M)
2. Water: Input sequence (29.8M) | Augmented sequence (870K)
3. More Water: Input sequence (670K) | Videotexture (2.9M)