Learning Appearance and Shape Epitomes

We can use the epitomic representation for object appearances and shapes in multiple image layers. This yields a generative model that composites appearance and shape epitomes to describe the image as a combination of sprites, as in the video examples of Jojic and Frey (CVPR 2001). However, thanks to the compression ability of the epitome, the layer structure can be discovered even from a single image.

Figure 1. The generative model for a two-layer description. See also the video illustration of the learning process.
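One way to write the two-layer composition is sketched below. It assumes the standard layered (sprite) form with Gaussian pixel noise. The symbols are our own labels, not notation taken from Figure 1: x_i is an observed pixel, e is the appearance epitome, s is the shape (mask) epitome, and T_1, T_2 map the front and back patches into the epitomes.

```latex
x_i = m_i\, f_i + (1 - m_i)\, b_i + \varepsilon_i, \qquad
f_i = e_{T_1(i)}, \quad b_i = e_{T_2(i)}, \quad m_i = s_{T_1(i)}, \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

Where the mask m_i is close to 1, the back-layer value b_i is hidden in the image but still predicted by the epitome. This is the mechanism that lets a layered model of this form fill in occluded texture.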


In addition to segmenting the image, the model can fill in occluded regions with similar appearance. It can do so because it learns the continuity of the texture and uses that continuity to explain the occluded parts of the patches.

Comparison to Normalized Cuts (Shi & Malik, 2000) and Mean Shift (Comaniciu & Meer, 2002):

Here we compare our epitome-based segmentation with two popular methods, normalized cuts and mean shift.

Both methods have a large number of parameters to tune. In contrast, our approach has only one parameter: the size of the epitome. To compare segmentation performance with our approach, we present results for various parameter settings. We will also post a quantitative study soon.

Segmentation Results


Using the epitome in a layered generative model offers several advantages for segmentation. The model implicitly uses breaks in texture as a segmentation cue. It also tends to refine the segmentation by continuing the occluded texture, which improves the mask estimate.

However, the basic epitomic representation can be used in virtually any other algorithm. For example, using the software provided here, we can compute the epitome of the image on which the mean shift algorithm failed. We then color each pixel according to its most likely coordinate in the epitome. In other words, the epitome is replaced by a color-coded image of the same size, and the input image is reconstructed from it. The color code can be generated by the following piece of code:

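% Color-code image of epitome size (N x N), where colors correspond to the position.
% Assumed coding: red encodes the row, green encodes the column, and blue is
% constant, so the L1 distance between two colors is proportional to the
% city-block distance between the corresponding positions.
function C = positionColorCode(N)
C = zeros(N, N, 3);
s = max(N - 1, 1);              % normalizer; avoids division by zero when N = 1
for i = 1:N
  for j = 1:N
    C(i, j, 1) = (i - 1) / s;   % row    -> red
    C(i, j, 2) = (j - 1) / s;   % column -> green
    C(i, j, 3) = 0.5;           % constant blue
  end
end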

The preprocessing is done by, for example,


(The software is available from www.research.microsoft.com/~jojic.epitome.htm)

This newly obtained image (shown below) is a much better input to the mean shift algorithm. Differences in color now correspond to city-block distances in the epitome, which captures the image texture.
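The sketch below shows the recoloring step in MATLAB. It assumes the epitome learner has already returned the most likely epitome coordinate for each input pixel. The arrays rowIdx and colIdx are hypothetical stand-ins for that output, not the released software's format. The sketch uses one possible position code (red for row, green for column, constant blue) and checks the property the paragraph relies on.

```matlab
% Hypothetical sizes: a 4 x 4 epitome and a 2 x 3 input image.
N = 4;

% Assumed position color code: red = epitome row, green = epitome column,
% blue constant. rowGrid(r,c) = r and colGrid(r,c) = c.
[colGrid, rowGrid] = meshgrid(1:N, 1:N);
C = cat(3, (rowGrid - 1) / (N - 1), (colGrid - 1) / (N - 1), 0.5 * ones(N));

% Hypothetical learner output: most likely epitome (row, column) per pixel.
rowIdx = [1 1 4; 2 3 4];
colIdx = [1 2 4; 1 1 3];

% Recolor: every pixel takes the code color of its epitome position.
lin = sub2ind([N N], rowIdx, colIdx);   % 2 x 3 linear indices into one channel
R = C(:, :, 1); G = C(:, :, 2); B = C(:, :, 3);
X = cat(3, R(lin), G(lin), B(lin));     % 2 x 3 x 3 recolored image

% Compare the L1 color distance of two pixels with the city-block
% distance between their epitome positions (scaled by 1/(N-1)).
p = squeeze(X(1, 1, :));
q = squeeze(X(2, 3, :));
colorDist = sum(abs(p - q));
cityBlock = abs(rowIdx(1, 1) - rowIdx(2, 3)) + abs(colIdx(1, 1) - colIdx(2, 3));
fprintf('color L1 = %.4f, city block / (N-1) = %.4f\n', colorDist, cityBlock / (N - 1));
```

With this coding, the L1 distance between two code colors equals the city-block distance between their epitome positions divided by N-1. Here both printed values are 5/3. The recolored image X, not the original pixels, would then be passed to mean shift.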