This dataset is used for human action-detection experiments. It consists of video sequences that we recorded.
Note: By installing, copying, or otherwise using this software, you agree to be bound by the terms of its license. Read the license.
The MSR Action dataset contains 16 video sequences with a total of 63 actions: 14 hand clapping, 24 hand waving, and 25 boxing, performed by 10 subjects. Each sequence contains multiple types of actions, and some sequences contain actions performed by different people. There are both indoor and outdoor scenes. All of the video sequences were captured with cluttered and moving backgrounds. Each video has a low resolution of 320 x 240 and a frame rate of 15 frames per second. The sequences are between 32 and 76 seconds long.
To evaluate performance, we manually labeled a spatio-temporal bounding box for each action. The ground truth labeling can be found in the groundtruth.txt file. The ground truth format of each labeled action is "X width Y height T length".
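As an illustration of working with the "X width Y height T length" format, the following Python sketch parses one labeled action and computes the overlap volume of two spatio-temporal boxes. It rests on assumptions not stated above: that the six fields are whitespace-separated integers, that X and Y are pixel coordinates of the box origin, and that T and length are measured in frames. The actual layout of groundtruth.txt may differ, the names ActionBox, parse_box, and overlap_volume are hypothetical, and the sample values are made up.

```python
from typing import NamedTuple

FPS = 15  # frame rate given in the dataset description


class ActionBox(NamedTuple):
    """One labeled action, in the field order "X width Y height T length"."""
    x: int
    width: int
    y: int
    height: int
    t: int
    length: int

    def duration_seconds(self) -> float:
        # Assumes `length` counts frames (an assumption, not stated in the source).
        return self.length / FPS


def parse_box(fields: str) -> ActionBox:
    """Parse six whitespace-separated integers into an ActionBox."""
    values = [int(v) for v in fields.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 fields, got {len(values)}: {fields!r}")
    return ActionBox(*values)


def _span(start_a: int, len_a: int, start_b: int, len_b: int) -> int:
    """Length of the intersection of two 1-D intervals (0 if disjoint)."""
    return max(0, min(start_a + len_a, start_b + len_b) - max(start_a, start_b))


def overlap_volume(a: ActionBox, b: ActionBox) -> int:
    """Intersection volume of two boxes along x, y, and t."""
    return (
        _span(a.x, a.width, b.x, b.width)
        * _span(a.y, a.height, b.y, b.height)
        * _span(a.t, a.length, b.t, b.length)
    )


# Made-up example values, not taken from groundtruth.txt.
truth = parse_box("50 40 60 100 30 45")
detection = parse_box("60 40 70 100 45 45")
print(truth.duration_seconds(), overlap_volume(truth, detection))
```

An overlap volume like this is one common ingredient for scoring a detection against a labeled box. It is not necessarily the evaluation criterion used in the cited paper.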
See the project page for more information, including sample images.
If you use this dataset, please cite the following paper:
Junsong Yuan, Zicheng Liu and Ying Wu, Discriminative Subvolume Search for Efficient Action Detection, IEEE Conf. on Computer Vision and Pattern Recognition, 2009.