Mining Actionlet Ensemble for Action Recognition with Depth Cameras

Jiang Wang, Zicheng Liu, Ying Wu, and Junsong Yuan


Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities for dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy, and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.


Publication type: Proceedings
Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)