Murat Akbacak, Dilek Hakkani-Tur, and Gokhan Tur
For domain-specific speech recognition tasks, the statistical language model component is best trained on text data that matches the target domain of the application in both content and style. For state-of-the-art language modeling techniques that can be used in real time within speech recognition engines during first-pass decoding (e.g., N-gram models), the training data must satisfy these constraints. However, collecting such data, even through crowdsourcing, is expensive and time-consuming, and the result may still not be representative of how a much larger user population would interact with the recognition system. In this paper, we address this problem by employing several semantic web sources that already contain the domain-specific knowledge, such as query click logs and knowledge graphs. We build statistical language models that meet the requirements listed above for domain-specific recognition tasks where natural language is used and the user queries are about named entities in a specific domain. As a case study in the movies domain, where users’ voice queries are movie related, we show that a language model trained with these resources yields significant perplexity and word-error-rate improvements over a generic web language model, and that the approach allows such language models to be developed rapidly for other domains.
|Published in||Proceedings of Interspeech|
|Publisher||ISCA - International Speech Communication Association|