Search and Decoding Strategies for Complex Lexical Modeling in LVCSR

Anoop Deoras

Abstract

The language model (LM) in most state-of-the-art large vocabulary continuous speech recognition (LVCSR) systems is still the n-gram. A major reason for using such simple LMs, besides the ease of estimating them from text, is computational complexity. It is also true, however, that long-span LMs, whether they arise from a higher n-gram order or from taking syntactic, semantic, discourse and other long-distance dependencies into account, are much more accurate than low-order n-grams. The standard practice is to carry out a first pass of decoding using, say, a 3-gram LM to generate a lattice, and to rescore only the hypotheses in the lattice with a higher-order LM. But even the search space defined by a lattice is intractable for many long-span LMs. In such cases, only the N-best full-utterance hypotheses from the lattice are extracted for evaluation. However, the N-best lists so produced tend to be “biased” towards the model producing them, making the rescoring sub-optimal, especially if the rescoring model is complementary to the initial n-gram model. For this reason, we seek ways to incorporate information from long-span LMs by searching in a less biased search space.
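The N-best rescoring step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the names `rescore_nbest` and `toy_long_span_lm`, the score values, and the toy agreement penalty are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# A hypothesis is (word sequence, acoustic log-score from the first pass).
Hypothesis = Tuple[List[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 1.0,
) -> List[str]:
    """Return the word sequence maximizing acoustic score + weighted LM score.

    Only hypotheses already present in the N-best list can win, which is
    why the list's bias towards the first-pass model limits the gain.
    """
    best = max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
    return best[0]


def toy_long_span_lm(words: List[str]) -> float:
    """Hypothetical stand-in for a long-span LM (assumption for illustration).

    Charges -1.0 per word and an extra -5.0 when the plural subject "dogs"
    co-occurs with the singular verb "barks", a dependency a 3-gram over
    this sentence would not see.
    """
    score = -1.0 * len(words)
    if "dogs" in words and "barks" in words:
        score -= 5.0
    return score


nbest = [
    (["the", "dogs", "in", "the", "yard", "barks"], -10.0),
    (["the", "dogs", "in", "the", "yard", "bark"], -10.5),
]
best_words = rescore_nbest(nbest, toy_long_span_lm)
```

In this toy list the second hypothesis wins after rescoring even though its acoustic score is lower. If the correct word sequence had not survived into the list at all, no rescoring model could have recovered it.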

In this thesis, we first present strategies to combine many complex long- and short-span language models to form a much superior unified model of language. We then show how this unified model of language can be incorporated for rescoring dense word graphs, using a novel search technique, thus alleviating the necessity of sub-optimal N-best list rescoring. We also present an approach based on the idea of variational inference, by virtue of which long-span models are efficiently approximated by tractable but faithful models, allowing for the incorporation of long-distance information directly into the first-pass decoding.
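One way an approximation of this kind can be realized is to draw samples from the long-span model and fit an n-gram to them by maximum likelihood. In the limit of many samples, this picks the n-gram closest to the long-span model in KL divergence. The abstract does not state that this is the procedure used, so the sketch below is an assumption for illustration, and `sample_sentence` is a hypothetical toy model, not a real long-span LM.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def sample_sentence(rng: random.Random) -> List[str]:
    """Hypothetical long-span model (assumption for illustration).

    The verb agrees with a subject chosen four positions earlier.
    """
    subject, verb = rng.choice([("dog", "barks"), ("dogs", "bark")])
    return ["<s>", "the", subject, "in", "the", "yard", verb, "</s>"]


def fit_bigram(sentences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood bigram estimated from sampled sentences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            counts[prev][cur] += 1
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }


rng = random.Random(0)
samples = [sample_sentence(rng) for _ in range(1000)]
bigram = fit_bigram(samples)  # tractable stand-in usable in first-pass decoding
```

A bigram is too short to keep the subject-verb agreement of the toy model, since `bigram["yard"]` spreads its mass over both verb forms. A higher-order n-gram fitted to the same samples would retain more of the long-span behaviour, at the cost of a larger model.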

We have validated the methods proposed in this thesis on many standard and competitive speech recognition tasks, sometimes outperforming state-of-the-art results. We hope that these methods will be useful for research with long-span language models not only in speech recognition but also in other areas of natural language processing such as machine translation, where decoding is likewise limited to n-gram language models.

Details

Publication type: PhD thesis