Simon Fothergill, Helena M. Mentis, Sebastian Nowozin, and Pushmeet Kohli
Entertainment and gaming systems such as the Wii and Xbox Kinect have brought touchless, body-movement-based interfaces to the masses. Systems like these enable the estimation of the movements of various body parts from raw inertial motion or depth sensor data. However, the interface developer is still left with the challenging task of assigning meaning to these body movements. The machine learning (ML) approach to this problem requires the collection of data sets that contain the relevant body movements and their associated semantic labels. These data sets directly impact the accuracy and performance of the gesture recognition system, and should ideally contain all natural variations of the movements associated with a gesture. This paper addresses the problem of collecting such gesture data sets. In particular, we investigate which semiotic modality of instructions is most appropriate for conveying to human subjects the movements the system developer needs them to perform. The results of our qualitative and quantitative analysis indicate that the choice of semiotic modality for the instructions has a significant impact on the performance of the learnt gesture recognition system. We hope that our findings on collecting high-quality movement data from human subjects will help developers build better gesture recognition systems in the future.
The data set is available through a separate web page:
Publisher: ACM Conference on Computer-Human Interaction