LOCUS: Learning Object Classes with Unsupervised Segmentation

J. Winn and N. Jojic


Project description

LOCUS addresses the problem of learning a model of an object class (e.g. horses, cars) from a 'bucket' of images, each containing an object of that class. LOCUS does not require any human annotation of the images: it discovers the location and pose of the object in each image and also produces a segmentation of each object. The motivation is that, by avoiding the need for human labelling, we can quickly scale up to the large number of object classes required for a practical object recognition system.

For example, given a bucket of 20 images of horses, LOCUS learns a class shape model and infers segmentations as shown below for four of the 20 images:
Figure 1: The class model and example segmentations when LOCUS is applied to 20 images of horses.

The accuracy of LOCUS's automatic segmentations rivals that of state-of-the-art methods that require hand-segmented training data (see Section 4 of the paper for details).

How it works

LOCUS uses a hierarchical generative model of the set of images, as shown in the diagram below. Class-specific information is contained in the class mask and edge probability models, which govern the broad shape and typical edge locations for instances of that class in a neutral position. The position, size, deformation and appearance of individual object instances are allowed to vary from image to image. During inference, the class shape and edge models are learned simultaneously with the variables associated with each image.
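
The generative direction of such a model can be illustrated with a toy sampler. This is only a minimal sketch under strong simplifying assumptions, not the model from the paper: it uses grayscale pixels, reduces the per-image transformation to an integer shift, and omits deformation and the edge model entirely. The names `sample_image`, `class_mask_prob`, `shift`, `obj_mean` and `bg_mean` are hypothetical.

```python
import random


def sample_image(class_mask_prob, shift, obj_mean, bg_mean, noise=0.05, seed=0):
    """Sample (mask, image) from a toy, translation-only generative model.

    class_mask_prob: 2D list, class-level probability that a pixel is object.
    shift: (dy, dx) integer offset of this instance (stands in for position).
    obj_mean, bg_mean: per-image grayscale appearance of object and background.
    """
    rng = random.Random(seed)
    height, width = len(class_mask_prob), len(class_mask_prob[0])
    dy, dx = shift
    mask, image = [], []
    for y in range(height):
        mask_row, image_row = [], []
        for x in range(width):
            # Shared class shape: read the prior at the un-shifted location.
            sy, sx = y - dy, x - dx
            inside = 0 <= sy < height and 0 <= sx < width
            p_obj = class_mask_prob[sy][sx] if inside else 0.0
            # Per-image mask pixel drawn from the shifted class prior.
            is_obj = 1 if rng.random() < p_obj else 0
            # Per-image appearance: one intensity model for each region.
            mean = obj_mean if is_obj else bg_mean
            mask_row.append(is_obj)
            image_row.append(mean + rng.gauss(0.0, noise))
        mask.append(mask_row)
        image.append(image_row)
    return mask, image


# Assumed toy class model: a single certain object pixel in the centre.
prior = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
# Same class shape, different position and appearance in each image.
mask_a, image_a = sample_image(prior, shift=(0, 1), obj_mean=0.9, bg_mean=0.1)
mask_b, image_b = sample_image(prior, shift=(1, 0), obj_mean=0.2, bg_mean=0.7)
```

The two calls share one class-level shape but differ in position and appearance, which mirrors the split between class variables and per-image variables described above.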

 

The key to LOCUS's ability to cope with varying object appearance and illumination is the use of a separate appearance model for the object and for the background in each image. Hence, the object is defined primarily by its shape and edge map, and by the self-similarity of its appearance (color or texture) within a single image, rather than by a strong global appearance model.
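
To make the role of per-image appearance concrete, here is a minimal sketch that segments a grayscale image by alternating between two steps: re-estimating that image's own object and background intensities, and updating a soft mask that combines a shared shape prior with those intensities. It is an illustration under stated assumptions (a fixed, already aligned prior; one Gaussian per region; an EM-style update), not the inference procedure of the paper. The function `segment_with_shape_prior` and its parameters are hypothetical.

```python
import math


def segment_with_shape_prior(image, prior, iters=10, sigma=0.1):
    """Return (mask, obj_mean, bg_mean) for one grayscale image.

    prior: 2D list of class-level object probabilities, shared across images.
    The two means are re-estimated for every image, so nothing about
    appearance is carried over from one image to the next.
    """
    width = len(image[0])
    pixels = [v for row in image for v in row]
    pri = [min(max(p, 1e-6), 1.0 - 1e-6) for row in prior for p in row]
    post = list(pri)  # start from the shape prior alone
    mu_obj = mu_bg = 0.0
    for _ in range(iters):
        # Appearance step: weighted means under the current soft mask.
        w_obj = sum(post)
        w_bg = len(post) - w_obj
        mu_obj = sum(q * v for q, v in zip(post, pixels)) / max(w_obj, 1e-12)
        mu_bg = sum((1.0 - q) * v for q, v in zip(post, pixels)) / max(w_bg, 1e-12)
        # Mask step: combine shape prior with per-image Gaussian likelihoods.
        post = []
        for p, v in zip(pri, pixels):
            log_obj = math.log(p) - (v - mu_obj) ** 2 / (2 * sigma ** 2)
            log_bg = math.log(1.0 - p) - (v - mu_bg) ** 2 / (2 * sigma ** 2)
            diff = max(-50.0, min(50.0, log_obj - log_bg))
            post.append(1.0 / (1.0 + math.exp(-diff)))
    flat = [1 if q > 0.5 else 0 for q in post]
    mask = [flat[i:i + width] for i in range(0, len(flat), width)]
    return mask, mu_obj, mu_bg


# Assumed toy prior: a weakly preferred 2x2 object in the centre of a 4x4 grid.
prior = [[0.7 if 1 <= y <= 2 and 1 <= x <= 2 else 0.3 for x in range(4)]
         for y in range(4)]
bright_on_dark = [[0.9 if p > 0.5 else 0.1 for p in row] for row in prior]
dark_on_bright = [[0.1 if p > 0.5 else 0.9 for p in row] for row in prior]

mask_1, obj_1, bg_1 = segment_with_shape_prior(bright_on_dark, prior)
mask_2, obj_2, bg_2 = segment_with_shape_prior(dark_on_bright, prior)
```

Both toy images yield the same mask even though the object is bright in one and dark in the other, because only the shape prior is shared and the intensities are fitted per image.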

 

Figure 2: The Bayesian network used in LOCUS as a generative model of a set of images.  The images indicate the inferred state of each variable given a set of 20 face images.

Object registration

LOCUS learns a dense registration from each object instance to the class model. Hence, we can illustrate the accuracy of the automatic registration/segmentation by showing each instance 'morphing' into the next, interpolating between the deformation fields and appearances of the two objects. The three videos below were created automatically from three image sets using exactly the same algorithm and parameter settings.
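
The interpolation idea can be sketched as follows, assuming each instance's registration is available as a list of image positions, one per class-model pixel, together with the grayscale value found at each position. This is a hypothetical simplification for illustration (linear blending and nearest-pixel splatting), not the code used to produce the videos; `morph_frame` and `render` are made-up names.

```python
import math


def morph_frame(points_a, points_b, colors_a, colors_b, t):
    """Blend two registered instances at time t in [0, 1].

    points_a[i] and points_b[i] are the (y, x) image positions that class-model
    pixel i maps to in each instance; colors_a[i] and colors_b[i] are the
    grayscale values found there.
    """
    frame = []
    for (ya, xa), (yb, xb), ca, cb in zip(points_a, points_b, colors_a, colors_b):
        y = (1.0 - t) * ya + t * yb  # interpolate the deformation
        x = (1.0 - t) * xa + t * xb
        c = (1.0 - t) * ca + t * cb  # interpolate the appearance
        frame.append(((y, x), c))
    return frame


def render(frame, height, width, background=0.0):
    """Splat each blended point onto its nearest pixel (no hole filling)."""
    canvas = [[background] * width for _ in range(height)]
    for (y, x), c in frame:
        row, col = math.floor(y + 0.5), math.floor(x + 0.5)
        if 0 <= row < height and 0 <= col < width:
            canvas[row][col] = c
    return canvas


# Two class-model pixels registered to two assumed toy instances.
points_a, colors_a = [(0, 0), (0, 1)], [0.2, 0.4]
points_b, colors_b = [(2, 2), (2, 3)], [0.6, 0.8]
halfway = morph_frame(points_a, points_b, colors_a, colors_b, t=0.5)
canvas = render(halfway, height=3, width=4)
```

Sweeping `t` from 0 to 1 gives the frames of a morph; because corresponding points are matched through the class model, a poor registration would show up as visible tearing or ghosting in the intermediate frames.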
Car side morph [Download: AVI, WMV]
Horse morph [Download: AVI, WMV]
Car rear morph [Download: AVI, WMV]
Figure 3: Videos showing the automatic registration and segmentation of objects from three image sets: cars (side), horses, cars (rear).

Learning object parts

We can extend the LOCUS model so that, rather than being binary, the masks are multi-valued.  This allows us to learn which parts of an object are self-similar.  The class shape model then becomes a deformable probabilistic index map (PIM).
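
A minimal sketch of the multi-valued case, assuming the index map is already aligned with the image and that each part is described in a given image by a single mean intensity (a per-image 'palette'). The function `infer_parts`, its parameters and the EM-style updates are illustrative assumptions rather than the paper's algorithm.

```python
import math


def infer_parts(pixels, index_prior, iters=10, sigma=0.1):
    """Return (labels, palette) for one image given a probabilistic index map.

    pixels: flat list of grayscale values.
    index_prior[i][k]: class-level probability that pixel i belongs to part k.
    palette[k]: this image's own mean intensity for part k.
    """
    num_parts = len(index_prior[0])
    resp = [list(row) for row in index_prior]  # start from the index map alone
    palette = [0.0] * num_parts
    for _ in range(iters):
        # Palette step: per-image mean intensity of every part.
        for k in range(num_parts):
            weight = sum(r[k] for r in resp)
            total = sum(r[k] * v for r, v in zip(resp, pixels))
            palette[k] = total / max(weight, 1e-12)
        # Label step: index-map prior times per-image Gaussian likelihood.
        resp = []
        for prior_row, v in zip(index_prior, pixels):
            logs = [math.log(max(p, 1e-6)) - (v - mu) ** 2 / (2 * sigma ** 2)
                    for p, mu in zip(prior_row, palette)]
            top = max(logs)  # subtract the maximum for numerical stability
            unnorm = [math.exp(value - top) for value in logs]
            norm = sum(unnorm)
            resp.append([u / norm for u in unnorm])
    labels = [max(range(num_parts), key=lambda k: r[k]) for r in resp]
    return labels, palette


# Assumed toy index map over six pixels and three parts.
index_prior = ([[0.6, 0.2, 0.2]] * 2 + [[0.2, 0.6, 0.2]] * 2
               + [[0.2, 0.2, 0.6]] * 2)
labels_a, palette_a = infer_parts([0.1, 0.1, 0.5, 0.5, 0.9, 0.9], index_prior)
labels_b, palette_b = infer_parts([0.8, 0.8, 0.2, 0.2, 0.5, 0.5], index_prior)
```

The two toy images receive the same part labels while ending up with different palettes, which is the sense in which the index map captures which regions are self-similar without fixing what they look like.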

Figure 4: Deformable probabilistic index maps (top image in each group) along with label images for three example images for each of four object classes: cars, horses, faces, planes. In many cases, the learned parts are semantically meaningful, e.g. car window, wheel, hair, eyes, wings.

 

Scientific publications
  1. J. Winn and N. Jojic. LOCUS: Learning Object Classes with Unsupervised Segmentation. In Proc. IEEE Intl. Conf. on Computer Vision (ICCV), Beijing, China, 2005.


Web site designed and maintained by A. Criminisi