Dong Yu, Li Deng, and Alex Acero
We have recently developed a long-contextual-span hidden trajectory model (HTM) that captures the underlying dynamic structure of speech coarticulation and reduction. Because of the long-span nature of the HTM and the complexity of its likelihood score computation, N-best list rescoring was the principal paradigm for evaluating the HTM for phonetic recognition in our earlier work. In this paper, we describe an improved likelihood score computation for the HTM and a novel A*-based, time-asynchronous, lattice-constrained decoding algorithm for HTM evaluation. We focus on several special considerations in the decoder design, which are necessitated by the dependency of the HTM score at each given frame on the model parameters associated with a variable number of adjacent past and future phones. We present details on how the nodes and links in the lattices are expanded via a look-ahead mechanism, on how the A* heuristics are estimated, and on how pruning strategies are applied to speed up the search process. Experiments on the standard TIMIT phonetic recognition task show that the new search algorithm, applied to recognition lattices, improves recognition accuracy over the traditional N-best rescoring paradigm.
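To make the idea of A* search over a lattice concrete, the following is a minimal, generic sketch and not the decoder described in the paper. It assumes a toy lattice stored as a dictionary of links `(next_node, label, base_log_score)`, a hypothetical `context_penalty(prev_label, label)` function that returns a non-negative penalty (a stand-in for a context-dependent rescoring term, not the HTM likelihood), a heuristic taken from a backward best-path pass over the base scores, and a simple beam measured against the initial estimate. All names here are illustrative assumptions.

```python
import heapq


def backward_best(lattice, end):
    """Best remaining base log-score from every node to `end`.

    Because the (assumed) context penalty is never negative, this value
    is an upper bound on the true remaining score, so it is admissible.
    """
    best = {end: 0.0}

    def visit(node):
        if node not in best:
            best[node] = max(
                (score + visit(nxt) for nxt, _, score in lattice.get(node, [])),
                default=float("-inf"),
            )
        return best[node]

    for node in lattice:
        visit(node)
    return best


def astar_decode(lattice, start, end, context_penalty, beam=float("inf")):
    """Return (labels, score) of the best path, or (None, -inf) if none."""
    h = backward_best(lattice, end)
    best_estimate = h.get(start, float("-inf"))
    # Heap entries: (-(g + h), tie_breaker, g, node, labels_so_far).
    heap = [(-best_estimate, 0, 0.0, start, ())]
    counter = 1
    while heap:
        _, _, g, node, labels = heapq.heappop(heap)
        if node == end:
            return list(labels), g
        prev = labels[-1] if labels else None
        for nxt, label, base in lattice.get(node, []):
            remaining = h.get(nxt, float("-inf"))
            if remaining == float("-inf"):
                continue  # dead end: `end` is unreachable from nxt
            g2 = g + base - context_penalty(prev, label)
            f2 = g2 + remaining
            if f2 < best_estimate - beam:
                continue  # beam pruning against the initial estimate
            heapq.heappush(heap, (-f2, counter, g2, nxt, labels + (label,)))
            counter += 1
    return None, float("-inf")
```

In this sketch, hypotheses are not merged at lattice nodes, because the score of a link depends on the preceding label; the first complete hypothesis popped from the queue is optimal as long as the heuristic never underestimates the remaining score. A finite `beam` trades that guarantee for speed.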
Keywords: A* search over recognition lattices; Decoder; Phonetic recognition; Vocal tract resonances; Speech dynamics; Hidden trajectories; Contextual assimilation; Filtering of targets; TIMIT; Long-span context dependence; Lattice rescoring; Pruning; Speech recognition
Published in: Speech Communication
Copyright © 2007 Elsevier B.V. All rights reserved.