Dilek Hakkani-Tur, Gokhan Tur, Larry Heck, and Elizabeth Shriberg
Domain detection in spoken dialog systems is usually treated as a multi-class, multi-label classification problem, and training of domain classifiers requires collection and manual annotation of example utterances. In order to extend a dialog system to new domains in a way that is seamless for users, domain detection should be able to handle utterances from the new domain as soon as it is introduced. In this work, we propose using web search query logs, which include queries entered by users and the links they subsequently click on, to bootstrap domain detection for new domains. While sampling user queries from the query click logs to train new domain classifiers, we introduce two types of measures based on the behavior of the users who entered a query and the form of the query. We show that both types of measures result in reductions in the error rate as compared to randomly sampling training queries. In controlled experiments over five domains, we achieve the best gain from the combination of the two types of sampling criteria.