Gang Yu, Junsong Yuan, and Zicheng Liu
28 November 2011
Many existing techniques in content-based video retrieval treat a video sequence as a whole, either to match it against a query video or to assign it a text label. Such an approach has serious limitations when applied to human action retrieval, because an action may occur only in a sub-region and last for only a small portion of the video. In such situations, we essentially need to match subvolumes of the video sequences against the query video. A naive exhaustive search is impractical due to the large number of possible subvolumes in each video sequence. In this paper, we propose a novel framework for action retrieval that performs pattern matching at the subvolume level and handles a large corpus of videos efficiently. We construct an unsupervised random forest to index the video database, generate a score volume with Hough voting, and then employ a max sub-path strategy to quickly search for the temporal and spatial positions of the action in all the video sequences in the database. We present action search experiments on challenging datasets to validate the efficiency and effectiveness of our system.
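To make the last step of the pipeline more concrete, the following is a minimal sketch of a max sub-path search over a score volume. It is an illustration under stated assumptions, not the authors' implementation: it assumes the Hough voting stage has already produced a dense volume `score[t][y][x]` in which positive values support the queried action and negative values oppose it, and that an action's location moves by at most `radius` pixels between consecutive frames. The function name `max_subpath` and the `radius` parameter are hypothetical. The search is a Kadane-style dynamic program: each cell either extends the best positive path ending in its neighbourhood in the previous frame or starts a new path.

```python
def max_subpath(score, radius=1):
    """Return (best_total, path) for a score volume score[t][y][x].

    path is a list of (t, y, x) cells, one per consecutive frame, whose
    summed score is maximal. Assumes a non-empty, rectangular volume.
    """
    T, H, W = len(score), len(score[0]), len(score[0][0])
    best_total, best_end = float("-inf"), None
    prev_acc = None   # accumulated path scores for frame t - 1
    back = []         # back[t][y][x] = (py, px) predecessor, or None if a path starts here

    for t in range(T):
        acc = [[0.0] * W for _ in range(H)]
        bp = [[None] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                # Extend only a strictly positive path; otherwise start afresh.
                best_prev, best_pos = 0.0, None
                if prev_acc is not None:
                    for py in range(max(0, y - radius), min(H, y + radius + 1)):
                        for px in range(max(0, x - radius), min(W, x + radius + 1)):
                            if prev_acc[py][px] > best_prev:
                                best_prev, best_pos = prev_acc[py][px], (py, px)
                acc[y][x] = score[t][y][x] + best_prev
                bp[y][x] = best_pos
                if acc[y][x] > best_total:
                    best_total, best_end = acc[y][x], (t, y, x)
        back.append(bp)
        prev_acc = acc

    # Walk the back-pointers from the best end cell to the start of its path.
    path = []
    t, y, x = best_end
    while True:
        path.append((t, y, x))
        pred = back[t][y][x]
        if pred is None:
            break
        t, (y, x) = t - 1, pred
    path.reverse()
    return best_total, path
```

The first and last entries of the returned path give the temporal extent of the detection, and the intermediate entries give its spatial position in each frame. This sketch visits every cell once and inspects a fixed-size neighbourhood, so its cost grows linearly with the number of cells in the volume, in contrast to enumerating all subvolumes.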
Published in: Multimedia (ACMMM)
© 2012 ACM. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ACM.