Seungyeop Han, Matthai Philipose, and Yun-Cheng Ju
December 2012
This paper presents the design and implementation of a programming system that
enables third-party developers to add spoken natural language (SNL) interfaces
to mobile applications. Existing systems either restrict SNL capabilities to
first-party applications or limit developer-defined spoken interactions to
keyphrases rather than broad natural language. An examination of expert workflows
reveals that the primary challenges lie in gathering comprehensive sets of
paraphrases for each command and in selecting and tuning the corresponding
statistical models for speech and language processing. We address the former
problem by integrating automated statistical machine paraphrasing and web-scale
crowdsourcing into the developer workflow. We address the latter by developing a
classifier architecture designed to be robust across app domains. We have fully
implemented our design as an extension to the Visual Studio IDE. Using a new
benchmark dataset of 3,500 spoken instances of 27 commands from 20 subjects,
together with a small developer study, we establish the promise of our approach
and evaluate the impact of various design choices.
Type: TechReport
Number: MSR-TR-2012-128