Hierarchical Filtered Motion for Action Recognition in Crowded Videos

Yingli Tian, Liangliang Cao, Zicheng Liu, and Zhengyou Zhang

Abstract

Action recognition against cluttered and moving backgrounds is a challenging problem. One main difficulty is that the motion field in an action region is contaminated by background motions. We propose a hierarchical filtered motion (HFM) method to recognize actions in crowded videos, using the motion history image (MHI) as the basic representation of motion because of its robustness and efficiency. First, we detect interest points as the two-dimensional Harris corners with recent motion, e.g., locations with high intensities in the MHI. Then, a global spatial motion smoothing filter is applied to the gradients of the MHI to eliminate isolated unreliable or noisy motions. At each interest point, a local motion field filter is applied to the smoothed gradients of the MHI by computing the structure proximity between each pixel in the local region and the interest point. Thus, the motion at a pixel is enhanced or weakened based on its structure proximity to the interest point. To validate the method's effectiveness, we characterize the spatial and temporal features by histograms of oriented gradients in the intensity image and the MHI, respectively, and use a Gaussian-mixture-model-based classifier for action recognition. The proposed approach achieves state-of-the-art results on the KTH dataset, which has clean backgrounds. More importantly, we perform cross-dataset action classification and detection experiments, where the KTH dataset is used for training, while the Microsoft Research (MSR) action dataset II, which consists of crowded videos with people moving in the background, is used for testing. Our experiments show that the proposed HFM method significantly outperforms existing techniques.
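
The abstract builds on the motion history image but does not say how it is computed. As a rough illustration only, the sketch below uses the commonly used MHI update rule: a pixel with motion in the current frame is set to a maximum value, and every other pixel decays by one per frame, so brighter values mean more recent motion. Detecting motion by thresholded frame differencing, the function name `update_mhi`, and the parameters `tau` and `diff_threshold` are all assumptions made for this example, not details taken from the paper.

```python
def update_mhi(mhi, prev_frame, curr_frame, tau=10, diff_threshold=30):
    """Return the next motion history image (MHI).

    All three inputs are grayscale images stored as lists of rows.
    tau and diff_threshold are illustrative values, not the paper's settings.
    """
    next_mhi = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev_frame, curr_frame):
        new_row = []
        for history, prev_pixel, curr_pixel in zip(mhi_row, prev_row, curr_row):
            # Assumed motion detector: simple thresholded frame difference.
            if abs(curr_pixel - prev_pixel) >= diff_threshold:
                new_row.append(tau)  # motion now: stamp with the maximum value
            else:
                new_row.append(max(0, history - 1))  # no motion: decay toward 0
        next_mhi.append(new_row)
    return next_mhi


# Tiny 1x3 example: motion at the first pixel, older motion at the second.
mhi = [[0, 5, 0]]
prev_frame = [[0, 0, 0]]
curr_frame = [[100, 0, 0]]
print(update_mhi(mhi, prev_frame, curr_frame))
```

Under this formulation, "locations with high intensities in the MHI" are simply the pixels whose value is still close to `tau`, which is one way to read the abstract's notion of recent motion.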

Details

Publication type: Article
Published in: IEEE Transactions on Systems, Man, and Cybernetics, Part C