Xiao Li, Y.-C. Ju, Geoffrey Zweig, and Alex Acero
This paper presents a novel approach to language modeling for voice search based on the ideas and methods of statistical machine translation. We propose an n-gram based translation model for listing-to-query translation, and we then use the query forms translated from listings to improve language modeling. The translation model is trained in an unsupervised manner using a set of transcribed voice search queries. Experiments show that the translation approach yields drastic perplexity reductions compared with a baseline language model to which no translation is applied.
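To make the evaluation criterion concrete, the following is a minimal sketch of how perplexity on user queries can drop when query forms are added to the language model training data. It is not the model described in this paper: it assumes a unigram model with add-one smoothing in place of the n-gram models, and the listings, query forms, and test query are invented examples, with the query forms written by hand rather than produced by a translation model.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word occurrences over whitespace-tokenized sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    return counts, sum(counts.values())


def perplexity(counts, total, vocab_size, sentences):
    """Perplexity of an add-one smoothed unigram model on the given sentences."""
    log_prob = 0.0
    n_words = 0
    for s in sentences:
        for word in s.split():
            # Add-one smoothing: unseen words still get non-zero probability.
            p = (counts[word] + 1) / (total + vocab_size)
            log_prob += math.log(p)
            n_words += 1
    return math.exp(-log_prob / n_words)


# Hypothetical data, invented for illustration only.
listings = ["kung ho cuisine of china", "microsoft corporation"]
# Hand-written stand-ins for query forms that a translation model might output.
query_forms = ["kung ho chinese restaurant", "microsoft"]
test_queries = ["kung ho chinese restaurant"]

# Shared, fixed vocabulary so that both models are compared on equal terms.
vocab = {w for s in listings + query_forms + test_queries for w in s.split()}

# Baseline: language model trained on the listings alone.
base_counts, base_total = train_unigram(listings)
baseline_ppl = perplexity(base_counts, base_total, len(vocab), test_queries)

# Augmented: listings plus the query forms derived from them.
aug_counts, aug_total = train_unigram(listings + query_forms)
augmented_ppl = perplexity(aug_counts, aug_total, len(vocab), test_queries)

print(f"baseline perplexity:  {baseline_ppl:.2f}")
print(f"augmented perplexity: {augmented_ppl:.2f}")
```

In this toy setting the augmented model assigns higher probability to words such as "chinese" and "restaurant", which users say but the listing does not contain, so its perplexity on the test query is lower than the baseline's.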