Liangliang Cao, Zicheng Liu, and Thomas Huang
13 June 2010
In recent years, much research has been carried out to recognize human actions from video clips. To learn an effective action classifier, most previous approaches rely on a sufficient number of training labels. When required to recognize actions in a different dataset, these approaches have to re-train the model using new labels. However, labeling video sequences is a very tedious and time-consuming task, especially when detailed spatial locations and time durations are required. In this paper, we propose an adaptive action detection approach which reduces the requirement for training labels and is able to handle the task of cross-dataset action detection with few or no extra training labels. Our approach combines model adaptation and action detection into a Maximum a Posteriori (MAP) estimation framework, which explores the spatio-temporal coherence of actions and makes good use of prior information that can be obtained without supervision. Our approach obtains state-of-the-art results on the KTH action dataset using only 50% of the training labels required by traditional approaches. Furthermore, we show that our approach is effective for cross-dataset detection, adapting the model trained on KTH to two other challenging datasets.
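To illustrate how a MAP estimate can reuse a model trained on a source dataset when only a few target labels are available, here is a minimal Python sketch. It assumes a single Gaussian component whose mean has a conjugate prior centered on the source-trained mean, and it uses the standard relevance-MAP mean update (familiar from GMM adaptation). This is an illustrative stand-in, not necessarily the paper's exact formulation; `map_adapt_mean` and the prior-strength parameter `relevance` are hypothetical names introduced here.

```python
import numpy as np


def map_adapt_mean(prior_mean, target_samples, relevance=16.0):
    """MAP estimate of a Gaussian mean under a conjugate prior (illustrative sketch).

    prior_mean: mean learned on the source dataset; serves as the prior center.
    target_samples: (n, d) array of feature vectors from the new dataset.
    relevance: hypothetical prior strength; larger values trust the source model more.
    """
    prior = np.asarray(prior_mean, dtype=float)
    x = np.asarray(target_samples, dtype=float)
    n = x.shape[0]
    if n == 0:
        # No target data: the MAP estimate is just the prior (source) mean.
        return prior
    # Weight on the target data grows with the number of target samples.
    alpha = n / (n + relevance)
    return alpha * x.mean(axis=0) + (1.0 - alpha) * prior


# Usage: a source mean at the origin, adapted with four target samples at (1, 1).
print(map_adapt_mean(np.zeros(2), np.ones((4, 2)), relevance=4.0))  # → [0.5 0.5]
```

With no target samples the estimate falls back to the source mean, and as more target samples arrive the data term dominates; this interpolation is what lets a prior from one dataset reduce the number of labels needed on another.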
Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
© 2012 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Type: Proceedings