Dong Yu, Alex Acero, and Li Deng
We have recently developed a long-contextual-span hidden trajectory model (HTM) that captures the underlying
dynamic structure of speech coarticulation and reduction. Because of the HTM's long-span nature and the complexity
of its likelihood score computation, our earlier work evaluated the HTM for phonetic recognition principally by N-best
list rescoring. In this paper, we describe an improved likelihood score computation for the HTM and a novel A*-based,
time-asynchronous, lattice-constrained decoding algorithm for evaluating the HTM. We focus on several special
considerations in the decoder design that arise because the HTM score at each frame depends on the model parameters
associated with a variable number of adjacent past and future phones. We detail how the nodes and links in the
lattices are expanded via a look-ahead mechanism, how the A* heuristics are estimated, and how pruning strategies are
applied to speed up the search. Experiments on the standard TIMIT phonetic recognition task show that the new search
algorithm, operating on recognition lattices, improves recognition accuracy over the traditional N-best rescoring
paradigm.
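The general idea of time-asynchronous A* search over a lattice with history-dependent scores can be illustrated with a minimal sketch. This is a generic textbook A* search, not the authors' HTM decoder, and several pieces are assumptions made for illustration: the arc format, the hypothetical `context_cost(prev_phone, phone)` function (a stand-in for the HTM's long-span dependence that looks only at the previous phone and ignores future phones, so no look-ahead is modeled), the heuristic (an acoustic-only backward pass, admissible only because the context cost is assumed non-negative), and the beam rule applied to each extension.

```python
import heapq
import math
from collections import defaultdict
from typing import Callable, Optional

# Hypothetical lattice arc: (start_node, end_node, phone, acoustic_cost).
# Costs are negative log scores (lower is better); node ids are assumed to be
# topologically ordered, i.e. start_node < end_node for every arc.
Arc = tuple[int, int, str, float]
# Hypothetical history-dependent cost; assumed to be non-negative.
ContextCost = Callable[[Optional[str], str], float]


def cost_to_go(arcs: list[Arc], final: int) -> dict[int, float]:
    """Backward pass: cheapest acoustic-only cost from each node to `final`."""
    h: dict[int, float] = defaultdict(lambda: math.inf)
    h[final] = 0.0
    # Visiting arcs in decreasing start-node order guarantees that h[end] is
    # already final when an arc ending at `end` is relaxed.
    for start, end, _, cost in sorted(arcs, key=lambda a: a[0], reverse=True):
        h[start] = min(h[start], cost + h[end])
    return h


def astar_decode(arcs: list[Arc], start: int, final: int,
                 context_cost: ContextCost, beam: float = math.inf):
    """Time-asynchronous A* over a lattice with history-dependent arc scores.

    Returns (phone_sequence, total_cost), or (None, inf) if pruning removed
    every complete path.
    """
    out = defaultdict(list)
    for arc in arcs:
        out[arc[0]].append(arc)
    # The heuristic ignores the non-negative context cost, so it never
    # overestimates the true remaining cost (it is admissible).
    h = cost_to_go(arcs, final)
    # Entries are (f = g + h, g, node, phones); partial hypotheses ending at
    # different lattice nodes (i.e. different times) share one queue.
    queue = [(h[start], 0.0, start, ())]
    while queue:
        f, g, node, phones = heapq.heappop(queue)
        if node == final:
            # Optimal when beam is infinite; otherwise best among survivors.
            return phones, g
        prev = phones[-1] if phones else None
        for _, nxt, phone, acoustic in out[node]:
            if math.isinf(h[nxt]):
                continue  # dead end: the final node is unreachable from nxt
            g2 = g + acoustic + context_cost(prev, phone)
            f2 = g2 + h[nxt]
            if f2 - f > beam:
                continue  # beam pruning: faster, but may drop the best path
            # Hypotheses are not merged, because the score depends on history.
            heapq.heappush(queue, (f2, g2, nxt, phones + (phone,)))
    return None, math.inf


if __name__ == "__main__":
    lattice = [(0, 1, "a", 1.0), (0, 1, "b", 1.5), (1, 2, "a", 1.0),
               (1, 2, "c", 1.2), (2, 3, "d", 0.5)]
    # Hypothetical context cost: penalize repeating the previous phone.
    repeat_penalty = lambda prev, phone: 1.0 if prev == phone else 0.0
    print(astar_decode(lattice, 0, 3, repeat_penalty, beam=0.3))
```

In this toy lattice the acoustic-only best path is `a a d` (cost 2.5), but the history-dependent penalty raises it to 3.5, so the search should return `('a', 'c', 'd')` with a cost of about 2.7. Shrinking `beam` to 0.1 prunes that path as well and the search returns no result, which shows the speed/accuracy trade-off that pruning introduces.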
Published in: Speech Communication
Copyright © 2007 Elsevier B.V. All rights reserved.