The relation of eye gaze and face pose: Potential impact on speech recognition

  • Malcolm Slaney,
  • Andreas Stolcke,
  • Dilek Hakkani-Tür

Proc. International Conference on Multimodal Interaction

Published by ACM - Association for Computing Machinery

We are interested in using context to improve speech recognition and speech understanding. Knowing what the user is attending to visually helps us predict their utterances and thus makes speech recognition easier. Eye gaze is one way to access this signal, but it is often unavailable (or expensive to gather) at longer distances. In this paper we examine joint eye-gaze and face-pose information while users perform a speech reading task. We hypothesize, and verify experimentally, that the eyes lead and the face follows. Face pose might not be as fast or as accurate a signal of visual attention as eye gaze, but based on experiments correlating eye gaze with speech recognition, we conclude that face pose provides useful information for biasing a recognizer toward higher accuracy.
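
The claim that the eyes lead and the face follows can be pictured as a time lag between two signals. The abstract does not say how the lag was measured, so the following is only an illustrative sketch: it assumes two equally sampled one-dimensional angle traces and uses a plain cross-correlation search, which is one common way to estimate such a lag. The function name `estimate_lag`, the synthetic traces, and the five-sample delay are all hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_lag(leader, follower, max_lag):
    """Return the shift, in samples, at which `follower` best matches `leader`.

    A positive result means `follower` trails `leader` by that many samples.
    The match is scored with the Pearson correlation of the overlapping parts.
    """
    n = len(leader)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = leader[:n - lag], follower[lag:]
        else:
            x, y = leader[-lag:], follower[:n + lag]
        score = np.corrcoef(x, y)[0, 1]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic data (an assumption for illustration, not measurements):
# a random-walk "gaze" trace and a "face" trace that copies it
# five samples later with a little sensor noise added.
rng = np.random.default_rng(0)
gaze = np.cumsum(rng.normal(size=600))
delay = 5
face = np.concatenate([np.full(delay, gaze[0]), gaze[:-delay]])
face = face + rng.normal(scale=0.1, size=face.size)

lag = estimate_lag(gaze, face, max_lag=20)
print("face trails gaze by", lag, "samples")
```

On real recordings the two traces would first need to be resampled to a common rate and time-aligned, and the lag in samples would be divided by that rate to give seconds.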