Gokhan Tur, Anoop Deoras, and Dilek Hakkani-Tur
Conversational understanding systems, especially virtual personal assistants (VPAs), perform “targeted” natural language understanding, assuming their users stay within the walled gardens of covered domains, and back off to generic web search otherwise. However, users are usually unaware of the concept of domains and sometimes simply do not distinguish the system from simple voice search. Hence, identifying the rejected out-of-domain utterances that are actually intended for the VPA becomes an important problem. This paper presents a study tackling this new task, showing that how one utters a request is more important for this task than what is uttered, which makes the task resemble addressee detection or dialog act tagging. To this end, syntactic and semantic parse “structure” features are extracted in addition to lexical features to train a binary SVM classifier using a large number of random web search queries and VPA utterances from multiple domains. We present controlled experiments that leave one domain out and check the precision of the model when that domain's utterances are combined with unseen queries. Our results indicate that such structured features result in higher precision, especially when the test domain bears little resemblance to the existing domains.
|Published in|Proceedings of Interspeech|
|Publisher|ISCA - International Speech Communication Association|
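
The leave-one-domain-out evaluation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the toy utterances are invented, a plain perceptron stands in for the paper's SVM, and the first-token feature is only a crude proxy for the syntactic and semantic parse features the paper uses.

```python
from collections import defaultdict


def featurize(text):
    """Toy features: lexical unigrams plus a crude 'how it is phrased' cue.

    The first-token feature is an assumed stand-in for parse-structure
    features; it is not the paper's feature set.
    """
    tokens = text.lower().split()
    feats = {"w=" + t for t in tokens}
    if tokens:
        feats.add("first=" + tokens[0])
    return feats


def train_perceptron(examples, epochs=5):
    """Train a binary perceptron (assumed stand-in for the SVM).

    examples: list of (feature_set, label), label +1 (VPA) or -1 (web).
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(weights[f] for f in feats)
            if label * score <= 0:  # misclassified or on the boundary
                for f in feats:
                    weights[f] += label
    return weights


def predict(weights, feats):
    """Return +1 (intended for the VPA) or -1 (web search query)."""
    return 1 if sum(weights.get(f, 0.0) for f in feats) > 0 else -1


def precision(y_true, y_pred):
    """Fraction of items predicted as VPA-directed that truly are."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not predicted_pos:
        return 0.0
    return sum(1 for t in predicted_pos if t == 1) / len(predicted_pos)


def leave_one_domain_out(vpa_by_domain, web_train, web_test):
    """Hold out each domain in turn and score precision on it.

    Training uses the remaining domains as positives and web_train as
    negatives; testing mixes the held-out domain with unseen web queries.
    """
    results = {}
    for held_out in vpa_by_domain:
        train = [(featurize(u), 1)
                 for d, utts in vpa_by_domain.items() if d != held_out
                 for u in utts]
        train += [(featurize(q), -1) for q in web_train]
        weights = train_perceptron(train)

        test = [(u, 1) for u in vpa_by_domain[held_out]]
        test += [(q, -1) for q in web_test]
        y_true = [y for _, y in test]
        y_pred = [predict(weights, featurize(x)) for x, _ in test]
        results[held_out] = precision(y_true, y_pred)
    return results


# Invented toy data, for illustration only.
vpa_by_domain = {
    "weather": ["what is the weather tomorrow", "will it rain today"],
    "calendar": ["schedule a meeting at noon", "what is on my calendar"],
}
web_train = ["cheap flights", "python tutorial"]
web_test = ["pizza recipes"]

results = leave_one_domain_out(vpa_by_domain, web_train, web_test)
for domain, p in results.items():
    print(domain, p)
```

With data this small the printed precisions carry no meaning; the sketch is only meant to show how each domain is withheld from training and then scored against queries the model has not seen.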