Florian Schroff, C. Lawrence Zitnick, and Simon Baker
We propose an algorithm to cluster video shots by the location in which they were captured. Each shot is represented as a set of keyframes and each keyframe is represented by a histogram of textons. Clustering is performed using an energy-based formulation. We propose an energy function for the clusters that matches the expected distribution of viewpoints in any one location and use the chi-squared distance to measure the similarity of two shots. We also add a temporal prior to model the fact that temporally neighboring shots are more likely to have been captured in the same location. We test our algorithm on both home videos and professionally edited footage (sitcoms). We present quantitative results that justify each choice made in the design of our algorithm, together with comparisons against k-means, connected components, and spectral clustering.
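As an illustration of the chi-squared distance mentioned above, the following minimal sketch compares two normalized histograms and then lifts that comparison to shots. It is not the authors' implementation. The function names, the 1/2 normalization factor, and the rule of taking the minimum over all keyframe pairs are assumptions made for this example only; the abstract does not specify how keyframe distances are combined into a shot distance.

```python
def chi2_distance(p, q, eps=1e-12):
    """Chi-squared distance between two histograms of equal length.

    Uses the common convention 0.5 * sum((p_i - q_i)^2 / (p_i + q_i)).
    The 0.5 factor is an assumption; other conventions omit it.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(p, q):
        s = a + b
        # Skip bins that are empty in both histograms to avoid dividing by zero.
        if s > eps:
            total += (a - b) ** 2 / s
    return 0.5 * total


def shot_distance(shot_a, shot_b):
    """Hypothetical shot-to-shot distance.

    Each shot is a list of keyframe histograms. Taking the minimum over all
    keyframe pairs is an illustrative assumption, not the paper's stated rule.
    """
    return min(chi2_distance(p, q) for p in shot_a for q in shot_b)


if __name__ == "__main__":
    # Toy two-bin "texton histograms", each normalized to sum to 1.
    shot_a = [[1.0, 0.0], [0.5, 0.5]]
    shot_b = [[0.5, 0.5]]
    print(chi2_distance([1.0, 0.0], [0.0, 1.0]))
    print(shot_distance(shot_a, shot_b))
```

With this convention, two normalized histograms with no overlapping mass are at distance 1 and identical histograms are at distance 0, which gives a bounded dissimilarity that a clustering energy could consume.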
In Proceedings of the British Machine Vision Conference
© 2008 Microsoft Corporation. All rights reserved.