Seungyeop Han, Matthai Philipose, and Yun-Cheng Ju
This paper presents the design and implementation of a programming system that enables third-party developers to add spoken natural language (SNL) interfaces to mobile applications. Existing systems either restrict SNL capabilities to first-party applications or limit developer-defined spoken interactions to keyphrases rather than broad natural language. An examination of expert workflows reveals two primary challenges: gathering comprehensive sets of paraphrases for each command, and selecting and tuning the corresponding statistical models for speech and language processing. We address the former by integrating automated statistical machine paraphrasing and web-scale crowdsourcing into the developer workflow. We address the latter by developing a classifier architecture designed to be robust across app domains. We have fully implemented our design as an extension to the Visual Studio IDE. Using a new benchmark dataset of 3,500 spoken instances of 27 commands from 20 subjects, together with a small developer study, we demonstrate the promise of our approach and evaluate the impact of various design choices.