MIPAD: A Multimodal Interactive Prototype

  • Xuedong Huang ,
  • Alex Acero ,
  • C. Chelba ,
  • Li Deng ,
  • Jasha Droppo ,
  • D. Duchene ,
  • J. Goodman ,
  • Hsiao-Wuen Hon ,
  • D. Jacoby ,
  • L. Jiang ,
  • Milind Mahajan ,
  • P. Mau ,
  • S. Meredith ,
  • S. Mughal ,
  • S. Neto ,
  • M. Plumpe ,
  • K. Steury ,
  • Gina Venolia ,
  • Kuansan Wang ,
  • Ye-Yi Wang

International Conference on Acoustics, Speech, and Signal Processing

Published by Institute of Electrical and Electronics Engineers, Inc.

Dr. Who is a Microsoft research project that aims to create a speech-centric multimodal interaction framework, which serves as the foundation for the .NET natural user interface. MiPad is an application prototype that demonstrates compelling user advantages for wireless Personal Digital Assistant (PDA) devices. MiPad fully integrates continuous speech recognition (CSR) and spoken language understanding (SLU) to let users accomplish many common tasks through a multimodal interface and wireless technologies. It addresses the problem of pecking with tiny styluses or typing on minuscule keyboards on today’s PDAs. Unlike a cellular phone, MiPad avoids speech-only interaction. It incorporates a built-in microphone that activates whenever a field is selected. As a user taps the screen or uses a built-in roller to navigate, the tapping action narrows the number of possible instructions for spoken understanding. MiPad currently runs on a Windows CE Pocket PC connected to a Windows 2000 machine on which speech recognition is performed. The Dr. Who CSR engine uses a unified CFG and n-gram language model. The Dr. Who SLU engine is based on a robust chart parser and a plan-based dialog manager. This paper discusses MiPad’s design, its implementation work in progress, and a preliminary user study comparing it to the existing pen-based PDA interface.
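
To make the tap-to-narrow idea concrete, here is a minimal sketch of how selecting a field can restrict the active grammar before any speech arrives. This is not MiPad's actual code; the names (`GRAMMARS`, `Recognizer`, `on_field_tapped`) and the field-to-rule mapping are hypothetical illustrations of the mechanism the abstract describes.

```python
# Hypothetical sketch of "tap narrows the grammar": tapping a form field
# selects the subset of CFG rules relevant to that field, shrinking the
# space of utterances the CSR/SLU components must consider.

GRAMMARS = {
    # Each field maps to the (hypothetical) CFG rules relevant to it.
    "to":      ["<contact_name>"],
    "subject": ["<free_text>"],             # would fall back to the n-gram model
    "date":    ["<date_expr>", "<time_expr>"],
}

class Recognizer:
    def __init__(self) -> None:
        # With no field selected, all rules (plus the n-gram model) are active.
        self.active_rules = [r for rules in GRAMMARS.values() for r in rules]

    def on_field_tapped(self, field: str) -> None:
        # The tap constrains the grammar before the user speaks, which is
        # what makes field-level recognition tractable on a small device.
        self.active_rules = GRAMMARS[field]

    def recognize(self, audio: bytes) -> str:
        # Placeholder: a real system would decode audio against the unified
        # CFG + n-gram language model restricted to self.active_rules.
        raise NotImplementedError

recognizer = Recognizer()
recognizer.on_field_tapped("date")   # user taps the "date" field
print(recognizer.active_rules)       # only date/time rules remain active
```

In the actual system the heavy decoding would happen on the server (the Windows 2000 machine), with the device sending audio over the wireless link; the sketch only illustrates how a tap event can prune the recognition grammar.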