Microsoft Research
Computational User Experiences

A New Speech Interaction Model for Dictation
on Touchscreen Devices

We are exploring a multi-modal interface that helps people type with speech by providing real-time feedback while they are speaking (phrase by phrase, as chunks) and by allowing them to correct any mistakes with both speech and touch.



Voice Typing


Dictation using speech recognition could potentially serve as an efficient input method for touchscreen devices. However, dictation systems today follow a mentally disruptive speech interaction model: users must first formulate utterances and then produce them, as they would with a voice recorder. Because utterances do not get transcribed until users have finished speaking, the entire output appears at once and users must break their train of thought to verify and correct it. In this paper, we introduce Voice Typing, a new speech interaction model in which users’ utterances are transcribed as they produce them, enabling real-time error identification. For fast correction, users leverage a marking menu operated with touch gestures. Voice Typing aspires to create an experience akin to having a secretary type for you while you monitor and correct the text. In a user study where participants composed emails using both Voice Typing and traditional dictation, they not only reported lower cognitive demand for Voice Typing but also exhibited a 29% relative reduction in user corrections. Overall, participants preferred Voice Typing.
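The core difference between the two interaction models can be illustrated with a small sketch. The code below is purely hypothetical (it is not from the paper and uses no real recognizer): it simulates how traditional dictation surfaces the entire transcript only after the user stops speaking, whereas a Voice Typing-style model surfaces each recognized phrase immediately, so errors can be spotted and corrected as they occur.

```python
def dictation(phrases):
    """Traditional model: nothing is shown until the user finishes speaking,
    so the UI receives a single update containing the whole utterance."""
    return [" ".join(phrases)]

def voice_typing(phrases):
    """Incremental model: each phrase is surfaced as soon as it is
    recognized, producing one UI update per phrase."""
    updates = []
    partial = []
    for phrase in phrases:
        partial.append(phrase)
        updates.append(" ".join(partial))  # UI refreshes after every phrase
    return updates

# Toy utterance, pre-segmented into phrases for illustration.
utterance = ["please send the report", "by friday morning", "thanks"]

print(dictation(utterance))     # one update, after the whole utterance
print(voice_typing(utterance))  # three progressive updates
```

Both models end with the same final text; the difference is that the incremental model gives the user a chance to intervene after every phrase rather than only at the end.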



Voice Typing: A New Speech Interaction Model for Dictation on Touchscreen Devices

Anuj Kumar, Tim Paek, Bongshin Lee

Proceedings of ACM CHI 2012, May 2012
