Cross-dataset Action Detection

Liangliang Cao, Zicheng Liu, and Thomas Huang


In recent years, much research has been carried out on recognizing
human actions from video clips. To learn an effective action
classifier, most previous approaches rely on a sufficient number of
training labels. When required to recognize actions in a different
dataset, these approaches have to re-train the model using new labels.
However, labeling video sequences is a tedious and time-consuming
task, especially when detailed spatial locations and time durations
are required. In this paper, we propose an adaptive action detection
approach that reduces the requirement for training labels and can
handle cross-dataset action detection with few or no extra training
labels. Our approach combines model adaptation and action detection
into a Maximum a Posteriori (MAP) estimation framework, which explores
the spatial-temporal coherence of actions and makes good use of prior
information that can be obtained without supervision. Our approach
obtains state-of-the-art results on the KTH action dataset using only
50% of the training labels used by traditional approaches.
Furthermore, we show that our approach is effective for cross-dataset
detection, adapting the model trained on KTH to two other challenging
datasets.


Publication type: Proceedings
Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)