Mining Actionlet Ensemble for Action Recognition with Depth Cameras

Jiang Wang, Zicheng Liu, Ying Wu, and Junsong Yuan


Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy, and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.


Publication type: Proceedings
Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)