The Joint Manifold Model

Ram Navaratnam, Andrew Fitzgibbon, Roberto Cipolla



Many computer vision tasks may be expressed as the problem of learning a mapping between image space and a parameter space. For example, in human body pose estimation, recent research has directly modelled the mapping from image features (z) to joint angles (θ). Fitting such models requires training data in the form of labelled (z, θ) pairs, from which the conditional densities p(θ | z) are learned. Inference is then simple: given test image features z, the conditional p(θ | z) is immediately computed. However, large amounts of training data are required to fit these models, particularly when the spaces are high-dimensional.
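To make the setup concrete, the following is a minimal sketch, not the model used in the paper: it fits a single joint Gaussian to labelled (z, θ) pairs and then conditions on test features, so that p(θ | z) is available in closed form. All function names and the toy data are illustrative assumptions.

    import numpy as np

    def fit_joint_gaussian(Z, Theta):
        """Fit one Gaussian to stacked labelled (z, theta) pairs."""
        X = np.hstack([Z, Theta])              # each row is one labelled pair
        return X.mean(axis=0), np.cov(X, rowvar=False)

    def condition(mu, Sigma, z, dz):
        """Mean and covariance of p(theta | z) under the joint Gaussian."""
        mu_z, mu_t = mu[:dz], mu[dz:]
        S_zz, S_zt = Sigma[:dz, :dz], Sigma[:dz, dz:]
        S_tz, S_tt = Sigma[dz:, :dz], Sigma[dz:, dz:]
        K = S_tz @ np.linalg.inv(S_zz)         # regression coefficients
        return mu_t + K @ (z - mu_z), S_tt - K @ S_zt

    # Toy usage: 2-D image features, 1-D "joint angle"
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(500, 2))
    Theta = Z @ np.array([[0.5], [-1.0]]) + 0.1 * rng.normal(size=(500, 1))
    mu, Sigma = fit_joint_gaussian(Z, Theta)
    mean, cov = condition(mu, Sigma, z=np.array([1.0, 0.3]), dz=2)

A single Gaussian cannot represent the one-to-many mappings discussed below; it serves only to show how inference reduces to evaluating a learned conditional.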

We show how unlabelled data, that is, samples from the marginal distributions p(z) and p(θ), may be used to improve fitting. This is valuable because it is often significantly easier to obtain unlabelled samples than labelled ones. We use a Gaussian process latent variable model to learn the mappings from a shared low-dimensional latent manifold to the feature and parameter spaces. This extends existing approaches to (a) use unlabelled data, and (b) represent one-to-many mappings.
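For illustration only, the sketch below replaces the Gaussian process mappings with linear maps fitted by alternating least squares; it is not the paper's model, but it shows how labelled pairs share a single latent point while z-only and θ-only samples constrain only their own mapping. All names and the optimisation scheme are assumptions made for this sketch.

    import numpy as np

    def fit_shared_latent(Z_lab, T_lab, Z_unlab, T_unlab, d, iters=200):
        """Learn maps W_z, W_t from a shared d-dim latent space to the
        feature (z) and parameter (theta) spaces, using labelled pairs
        plus unlabelled samples from each space."""
        rng = np.random.default_rng(0)
        X_l = rng.normal(size=(len(Z_lab), d))    # one latent per labelled pair
        X_z = rng.normal(size=(len(Z_unlab), d))  # latents for z-only samples
        X_t = rng.normal(size=(len(T_unlab), d))  # latents for theta-only samples
        for _ in range(iters):
            # Update each mapping from all latents that observe its space.
            Xz_all, Z_all = np.vstack([X_l, X_z]), np.vstack([Z_lab, Z_unlab])
            Xt_all, T_all = np.vstack([X_l, X_t]), np.vstack([T_lab, T_unlab])
            W_z = np.linalg.lstsq(Xz_all, Z_all, rcond=None)[0]
            W_t = np.linalg.lstsq(Xt_all, T_all, rcond=None)[0]
            # Update latents: labelled latents see both spaces, unlabelled one.
            W_both = np.hstack([W_z, W_t])
            X_l = np.linalg.lstsq(W_both.T, np.hstack([Z_lab, T_lab]).T, rcond=None)[0].T
            X_z = np.linalg.lstsq(W_z.T, Z_unlab.T, rcond=None)[0].T
            X_t = np.linalg.lstsq(W_t.T, T_unlab.T, rcond=None)[0].T
        return W_z, W_t, X_l

Given the fitted maps, an estimate of θ for a new z can be obtained by finding a latent point whose image under W_z matches z and pushing it through W_t; nonlinear GP mappings play this role in the model described above.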

Experiments on synthetic and real problems demonstrate that the use of unlabelled data improves performance over existing techniques. Our comparisons include existing approaches that are explicitly semi-supervised as well as those that implicitly make use of unlabelled examples.