The relation of eye gaze and face pose: Potential impact on speech recognition
- Malcolm Slaney
- Andreas Stolcke
- Dilek Hakkani-Tür
Proc. International Conference on Multimodal Interaction
Published by ACM - Association for Computing Machinery
We are interested in using context to improve speech recognition and speech understanding. Knowing what the user is attending to visually helps us predict their utterances and thus makes speech recognition easier. Eye gaze is one way to access this signal, but it is often unavailable (or expensive to gather) at longer distances. In this paper we look at joint eye-gaze and facial-pose information while users perform a speech reading task. We hypothesize, and verify experimentally, that the eyes lead and the face follows. Face pose might not be as fast or as accurate a signal of visual attention as eye gaze, but based on experiments correlating eye gaze with speech recognition, we conclude that face pose provides useful information to bias a recognizer toward higher accuracy.
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.