Mining Actionlet Ensemble for Action Recognition with Depth Cameras

Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy, and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learned to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.

Publisher: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
© 2012 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
