Yingli Tian, Liangliang Cao, Zicheng Liu, and Zhengyou Zhang
May 2012
Action recognition with cluttered and moving backgrounds
is a challenging problem. One main difficulty lies in the
fact that the motion field in an action region is contaminated by
the background motions. We propose a hierarchical filtered motion
(HFM) method to recognize actions in crowded videos, using the
motion history image (MHI) as the basic representation of motion
because of its robustness and efficiency. First, we detect interest
points as the two-dimensional Harris corners with recent motion,
i.e., locations with high intensities in the MHI. Then, a global spatial
motion smoothing filter is applied to the gradients of the MHI
to eliminate isolated unreliable or noisy motions. At each interest
point, a local motion field filter is applied to the smoothed gradients
of the MHI by computing structure proximity between any
pixel in the local region and the interest point. Thus, the motion at
a pixel is enhanced or weakened based on its structure proximity
to the interest point. To validate the method's effectiveness, we characterize
the spatial and temporal features by histograms of oriented
gradients in the intensity image and the MHI, respectively, and use
a Gaussian-mixture-model-based classifier for action recognition.
The proposed approach achieves state-of-the-art results on the KTH
dataset, which has a clean background.
More importantly, we perform cross-dataset action classification
and detection experiments, where the KTH dataset is used for
training, while the Microsoft Research (MSR) Action Dataset II, which
consists of crowded videos with people moving in the background,
is used for testing. Our experiments show that the proposed HFM
method significantly outperforms existing techniques.
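
For readers unfamiliar with the MHI on which the method builds, the following is a minimal sketch of the commonly used MHI update rule: a pixel that changed between consecutive frames is set to a maximum value, and every other pixel decays by one per frame. It is an illustration only, not the authors' implementation. The function names, the duration `tau`, and the threshold `diff_threshold` are assumptions introduced here, and the interest-point detection and motion-filtering steps of HFM are not shown.

```python
def update_mhi(mhi, prev_frame, frame, tau=10, diff_threshold=20):
    """Return a new MHI after observing `frame` (hypothetical helper).

    Frames and the MHI are lists of rows of grayscale integers.
    Pixels whose intensity changed by more than `diff_threshold`
    are set to `tau`; all other pixels decay by 1, floored at 0.
    """
    rows, cols = len(frame), len(frame[0])
    new_mhi = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            moved = abs(frame[r][c] - prev_frame[r][c]) > diff_threshold
            if moved:
                new_mhi[r][c] = tau  # most recent motion is brightest
            else:
                new_mhi[r][c] = max(0, mhi[r][c] - 1)  # older motion fades
    return new_mhi


def motion_history(frames, tau=10, diff_threshold=20):
    """Accumulate an MHI over a sequence of frames (hypothetical helper)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    mhi = [[0] * cols for _ in range(rows)]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, diff_threshold)
    return mhi


# A bright pixel moving left to right across a 1x4 image.
frames = [
    [[255, 0, 0, 0]],
    [[0, 255, 0, 0]],
    [[0, 0, 255, 0]],
    [[0, 0, 0, 255]],
]
print(motion_history(frames, tau=3, diff_threshold=20))
```

In this toy sequence the resulting MHI is `[[1, 2, 3, 3]]`: intensity rises toward the most recent position of the moving pixel. That gradient from older to newer motion is the kind of signal the MHI gradients described above are computed from.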
In IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C
| Type | Article |