A Speech-Centric Perspective for Human-Computer Interface

  • Li Deng
  • Alex Acero
  • Ye-Yi Wang
  • Kuansan Wang
  • Hsiao-Wuen Hon
  • Jasha Droppo
  • Milind Mahajan
  • Xuedong Huang

Proc. of the IEEE Fifth Workshop on Multimedia Signal Processing

Published by the Institute of Electrical and Electronics Engineers, Inc.

Speech technology has been playing a central role in enhancing human-machine interactions, especially for small devices for which the GUI has obvious limitations. The speech-centric perspective for human-computer interface advanced in this paper derives from the view that speech is the only natural and expressive modality that enables people to access information from, and to interact with, any device. In this paper, we describe the work conducted at Microsoft Research, in the project codenamed Dr. Who, aimed at developing enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present MiPad as the first Dr. Who application, one that specifically addresses the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the prevailing problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs and smart phones.