Larry Heck, Dilek Hakkani-Tur, and Gokhan Tur
The past decade has seen the emergence of web-scale structured and linked semantic knowledge resources (e.g., Freebase, DBPedia). These semantic knowledge graphs provide a scalable “schema for the web”, representing a significant opportunity for the spoken language understanding (SLU) research community. This paper leverages these resources to bootstrap a web-scale semantic parser with no requirement for semantic schema design, no data collection, and no manual annotations. Our approach is based on an iterative graph crawl algorithm. From an initial seed node (entity-type), the method learns the related entity-types from the graph structure, and automatically annotates documents that can be linked to the node (e.g., Wikipedia articles, web search documents). Following the branches, the graph is crawled and the procedure is repeated. The resulting collection of annotated documents is used to bootstrap web-scale conditional random field (CRF) semantic parsers. Finally, we use a maximum-a-posteriori (MAP) unsupervised adaptation technique on sample data from a specific domain to refine the parsers. The scale of the unsupervised parsers is on the order of thousands of domains and entity-types, millions of entities, and hundreds of millions of relations. The precision-recall of the semantic parsers trained with our unsupervised method approaches those trained with supervised annotations.
|Published in||Proceedings of Interspeech|
|Publisher||International Speech Communication Association|