Adaptive Pooling over Multiple Trajectory Attributes for Action Recognition

Advanced Video- and Signal-based Surveillance (AVSS)


We present a new approach to feature pooling for human action recognition. Instead of partitioning videos into predefined uniform intervals of a spatio-temporal volume, as in spatial pyramid matching, our method adaptively partitions a pooling attribute space defined by multiple trajectory-based cues. The pooling attributes include the individual spatial and temporal coordinates of a trajectory, as well as its motion saliency, curvature, and scale. To determine partitions of the attribute space adaptively, we use KD-trees that separate trajectories according to their distribution within the attribute space. The resulting pooling volumes are jointly utilized for action recognition via SVM weights learned through Multiple Kernel Learning. Extensive experiments on major benchmarks show that this adaptive pooling over multiple trajectory attributes yields significant improvements in recognition performance.
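To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of median-split KD-tree partitioning over trajectory pooling attributes, followed by per-cell average pooling of trajectory descriptors. The attribute dimensions, split rule (round-robin over dimensions with median thresholds), tree depth, and average pooling are illustrative assumptions; the paper's exact choices may differ.

```python
import numpy as np

def kdtree_cells(attrs, idx=None, depth=0, max_depth=3):
    """Recursively split trajectory indices on the median of one
    pooling attribute per level (round-robin over dimensions).
    attrs: (N, D) array of pooling attributes per trajectory,
    e.g. columns for x, y, t, saliency, curvature, scale (assumed).
    Returns a list of index arrays, one per leaf cell."""
    if idx is None:
        idx = np.arange(attrs.shape[0])
    if depth == max_depth or len(idx) <= 1:
        return [idx]
    d = depth % attrs.shape[1]            # round-robin split dimension
    med = np.median(attrs[idx, d])        # data-adaptive threshold
    left = idx[attrs[idx, d] <= med]
    right = idx[attrs[idx, d] > med]
    if len(left) == 0 or len(right) == 0: # degenerate split: stop here
        return [idx]
    return (kdtree_cells(attrs, left, depth + 1, max_depth)
            + kdtree_cells(attrs, right, depth + 1, max_depth))

def pool_per_cell(features, cells):
    """Average-pool trajectory descriptors inside each adaptive cell."""
    return np.stack([features[c].mean(axis=0) for c in cells])

# Demo: 200 trajectories, 5 pooling attributes, 16-dim descriptors
rng = np.random.default_rng(0)
attrs = rng.random((200, 5))
feats = rng.random((200, 16))
cells = kdtree_cells(attrs, max_depth=3)
pooled = pool_per_cell(feats, cells)
print(pooled.shape)
```

Because the splits follow the empirical medians, cells are dense where trajectories cluster in the attribute space, unlike the fixed grid of spatial pyramid matching. In the full method, one such pooled representation per attribute (or attribute subset) would feed a kernel, with kernel weights learned by Multiple Kernel Learning.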