Improved Topic-Dependent Language Modeling Using Information Retrieval Techniques

N-gram language models are frequently used by speech recognition systems to constrain and guide the search. An N-gram model uses only the last N-1 words to predict the next word, and typical values of N range from 2 to 4. N-gram language models therefore lack long-term context information. We show that the predictive power of N-gram language models can be improved by using long-term context information about the topic of discussion. We use information retrieval techniques to generalize the available context information for topic-dependent language modeling. We demonstrate the effectiveness of this technique through experiments on the Wall Street Journal text corpus, which is a relatively difficult task for topic-dependent language modeling because the text is relatively homogeneous. The proposed method can reduce the perplexity of the baseline language model by 37%, indicating the predictive power of the topic-dependent language model.
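
To make the terms in the abstract concrete, here is a minimal sketch of a bigram (N = 2) language model and a perplexity computation. It is an illustration only, not the paper's method: the add-one smoothing, the toy sentences, and the function names (`train_bigram`, `bigram_prob`, `perplexity`) are all assumptions chosen for brevity, and the topic-dependent adaptation itself is not reproduced.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        vocab.update(padded[1:])  # every symbol that can be predicted
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """P(word | prev) with add-one smoothing (an assumption of this sketch).

    Only the single previous word is used as context, which is exactly
    the N-1 word history limitation for N = 2.
    """
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(sentences, context_counts, bigram_counts, vocab):
    """exp of the negative average log-probability per predicted token."""
    log_prob = 0.0
    n_tokens = 0
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        for prev, word in zip(padded, padded[1:]):
            p = bigram_prob(prev, word, context_counts, bigram_counts, vocab)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Toy training data, invented for illustration only.
train = [
    ["stocks", "rose", "today"],
    ["stocks", "fell", "today"],
    ["bonds", "rose", "today"],
]
counts = train_bigram(train)

pp_seen = perplexity([["stocks", "rose", "today"]], *counts)
pp_shuffled = perplexity([["today", "rose", "stocks"]], *counts)
print(pp_seen < pp_shuffled)  # word order seen in training is less "surprising"
```

Lower perplexity means the model assigns higher probability to the test text, so a relative reduction in perplexity over a baseline, as reported in the abstract, indicates a model that predicts the next word better.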

PDF file: 1999-milindm-icassp.pdf

In: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing

Details

Type: Inproceedings