Variational Approximation of Long-Span Language Models for LVCSR

Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive search space of sentence hypotheses. Instead, an N-best list of hypotheses is created using tractable n-gram models, and rescored using the long-span models. This paper shows that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first-pass decoding. They not only result in better first-pass output, but also produce a lattice with a lower oracle word error rate, and rescoring the N-best lists from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition, and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims.
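The pipeline the abstract describes can be sketched in code, under the assumption (consistent with the usual formulation of this technique) that the variational approximation is obtained by sampling text from the long-span model and fitting an n-gram model to the samples, which minimizes the KL divergence from the long-span model within the n-gram family. The sketch below is a minimal illustration, not the paper's implementation; `ToyLongSpanLM`, `fit_bigram_from_samples`, and all other names are hypothetical stand-ins, and the toy "long-span" model is a deliberately crude whole-history dependency.

```python
import math
import random
from collections import Counter

BOS, EOS = "<s>", "</s>"


class ToyLongSpanLM:
    """Stand-in for a long-span model: the next-word distribution depends
    on the entire history (here, crudely, on its length parity). A real
    system would use a syntactic or recurrent-network language model."""

    VOCAB = ["a", "b", "c", EOS]  # words the model can emit

    def next_word_probs(self, history):
        if len(history) % 2 == 0:
            return {"a": 0.5, "b": 0.2, "c": 0.1, EOS: 0.2}
        return {"a": 0.2, "b": 0.5, "c": 0.1, EOS: 0.2}

    def sample_sentence(self, max_len=20):
        history = [BOS]
        while history[-1] != EOS and len(history) < max_len:
            probs = self.next_word_probs(history)
            words = list(probs)
            weights = [probs[w] for w in words]
            history.append(random.choices(words, weights=weights)[0])
        if history[-1] != EOS:
            history.append(EOS)
        return history

    def logprob(self, sentence):
        """Log-probability of a complete sentence under the long-span model."""
        return sum(
            math.log(self.next_word_probs(sentence[:i])[sentence[i]])
            for i in range(1, len(sentence))
        )


def fit_bigram_from_samples(lm, num_samples=20000):
    """Variational n-gram surrogate: maximum-likelihood bigram counts on
    text sampled from the long-span model, which (as num_samples grows)
    minimizes KL(long-span || bigram) over the bigram family."""
    bigrams, history_counts = Counter(), Counter()
    for _ in range(num_samples):
        s = lm.sample_sentence()
        for u, v in zip(s, s[1:]):
            bigrams[(u, v)] += 1
            history_counts[u] += 1
    return bigrams, history_counts


def bigram_logprob(sentence, bigrams, history_counts, vocab_size):
    """Add-one-smoothed bigram score: the tractable first-pass model."""
    return sum(
        math.log((bigrams[(u, v)] + 1) / (history_counts[u] + vocab_size))
        for u, v in zip(sentence, sentence[1:])
    )


if __name__ == "__main__":
    lm = ToyLongSpanLM()
    bigrams, history_counts = fit_bigram_from_samples(lm)
    vocab_size = len(ToyLongSpanLM.VOCAB)

    # A toy "N-best list"; in an LVCSR system these hypotheses would come
    # from first-pass decoding with the surrogate bigram model.
    nbest = [
        [BOS, "b", "a", "b", EOS],
        [BOS, "a", "b", "a", EOS],
        [BOS, "c", "c", "c", EOS],
    ]

    # First pass ranks with the surrogate; rescoring re-ranks with the
    # full long-span model, as the abstract describes.
    for hyp in nbest:
        surrogate = bigram_logprob(hyp, bigrams, history_counts, vocab_size)
        print(" ".join(hyp),
              f"surrogate={surrogate:.2f}",
              f"long-span={lm.logprob(hyp):.2f}")
```

The key design point the sketch illustrates is why the surrogate beats a standard n-gram for first-pass decoding: it is estimated from (arbitrarily much) text drawn from the long-span model itself rather than from a finite training corpus, so its bigram statistics inherit whatever long-range regularities the long-span model has learned.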


Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)


Type: Inproceedings