N. Jojic, J. Winn, and L. Zitnick
The embedded videos below may take a long time to load. Alternatively, use the link beneath each video to download it.
Segmentation boundaries defined by g [download video 3.8MB]
The video above shows the segmentation boundaries in the posterior on the fine segmentation described by the variables g in the paper. The segmentation tends to be consistent for at least several frames due to local constraints in the model (equations 12-16), but not consistent enough to lead directly to foreground-background segmentation. The matching variables h describe motion tracks consisting of matched segments. About 30% of these tracks are longer than 20 frames.
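The track-length statistic quoted above can be computed from per-frame matches roughly as follows. This is a minimal sketch, not the paper's code: the data layout (`segments`, `matches`) and both function names are assumptions made for illustration, with the matching variables h reduced to a hard frame-to-frame assignment.

```python
from typing import Dict, List


def track_lengths(segments: List[List[int]],
                  matches: List[Dict[int, int]]) -> List[int]:
    """Return the length, in frames, of every track.

    Hypothetical layout (an assumption, not the paper's representation):
      segments[t] : ids of the segments present in frame t
      matches[t]  : maps a segment id in frame t to its matched segment id
                    in frame t + 1; a missing key means the track ends.
    """
    lengths = []
    for t, ids in enumerate(segments):
        # Segments that were matched from frame t - 1 continue an older track.
        continued = set(matches[t - 1].values()) if t > 0 else set()
        for s in ids:
            if s in continued:
                continue
            # s starts a new track: follow the matches until they run out.
            length, frame, cur = 1, t, s
            while frame < len(matches) and cur in matches[frame]:
                cur = matches[frame][cur]
                frame += 1
                length += 1
            lengths.append(length)
    return lengths


def fraction_longer_than(lengths: List[int], min_frames: int) -> float:
    """Fraction of tracks strictly longer than min_frames."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > min_frames) / len(lengths)


# Toy example: two segments in frames 0 and 1, one segment in frame 2.
toy_segments = [[0, 1], [0, 1], [0]]
toy_matches = [{0: 0, 1: 1}, {0: 0}]
toy_lengths = track_lengths(toy_segments, toy_matches)
```

With real output from the model, `fraction_longer_than(lengths, 20)` would give the kind of figure reported above.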
Foreground-background segmentation [download video 660K]
This second video shows the extracted foreground object in color and the background in black and white. The result relies on hierarchical model switching, detailed in Section 4. The model can switch between the over-segmented representation above and the more global shape model used in flexible sprites (equations 5 and 12). The over-segmentation helps carry the foreground-background segmentation (mask) through the frames of the video, even when the global appearance and shape models have trouble explaining some of the frames. Beyond these two major ways of explaining the data (segment tracking and the global shape/appearance model), virtually every variable in the switching model can be generated in several ways. The mask defining the shape, for example, is allowed to switch between an MRF model, a global blob shape model, a global per-pixel model, a model conditioned on the previous frame, and a model conditioned on the segmentation variables g (illustrated in the previous video). Here, in addition to avoiding local minima, as described in Section 3, the hierarchical switching makes the overall model more expressive, since no single configuration of switch variables, if fixed for all frames, would explain the data well.
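The last point, that per-frame switching can explain the data better than any single fixed configuration, can be illustrated with a toy calculation. This is only a sketch under strong assumptions: the log-likelihood values are made up for illustration, the function name is hypothetical, and a hard per-frame maximum stands in for the paper's actual inference, which also couples frames over time.

```python
from typing import List, Tuple

# The five mask models named in the text; the ordering is arbitrary.
MASK_MODELS = ["mrf", "blob", "per_pixel", "previous_frame", "segments_g"]


def best_fixed_and_switched(
    loglik: List[List[float]],
) -> Tuple[float, float, List[int]]:
    """Compare one fixed mask model against per-frame switching.

    loglik[t][k] is the (hypothetical) log-likelihood of frame t under
    mask model k. Returns (best fixed total, switched total, per-frame choice).
    """
    num_models = len(loglik[0])
    # Total score when the same model k is used for every frame.
    fixed_totals = [sum(frame[k] for frame in loglik) for k in range(num_models)]
    best_fixed = max(fixed_totals)
    # Total score when each frame picks its own best-scoring model.
    switches = [max(range(num_models), key=lambda k: frame[k]) for frame in loglik]
    switched = sum(frame[k] for frame, k in zip(loglik, switches))
    return best_fixed, switched, switches


# Illustrative numbers only, not measurements from the video.
toy_loglik = [
    [-5, -2, -6, -7, -4],  # frame 0: the blob model fits best
    [-3, -8, -2, -4, -5],  # frame 1: the per-pixel model fits best
    [-9, -9, -8, -3, -2],  # frame 2: the g-conditioned model fits best
]
best_fixed, switched, switches = best_fixed_and_switched(toy_loglik)
chosen = [MASK_MODELS[k] for k in switches]
```

In this toy setting the switched total can never be lower than the best fixed total, and it is strictly higher whenever different frames prefer different models.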
Note that this video is hard for automatic analysis because of the changes in appearance due to shirt folding and changing angle of illumination, on top of rigid and non-rigid body motion. In addition, the background is complex both in terms of its context and motion.