Li Deng, Xiaolong Li, Dong Yu, and Alex Acero
A long-contextual-span Hidden Trajectory Model (HTM) developed recently captures the underlying dynamic structure of speech coarticulation and reduction using a highly compact set of context-independent parameters. However, the long-span nature of the HTM makes it difficult to develop efficient search algorithms for its full evaluation. In this paper, we describe our initial effort in meeting this challenge. The basic search algorithm is time-asynchronous A*. Given the structural complexity of the long-span HTM, the search must account for the fact that the HTM score for each frame depends on the model parameters associated with a variable number of adjacent phones. Specifically, we present details on how the nodes and links in the lattices are expanded via look-ahead, how the A* heuristics are estimated, and what pruning strategies are applied to speed up the search. Experiments on TIMIT phonetic recognition show the capability of our newly developed lattice search algorithm in evaluating billions of hypotheses based on long-span HTM scores. The results significantly extend our earlier work from N-best rescoring to A* search over lattices.
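To make the search style concrete, the following is a minimal sketch of time-asynchronous A* over a phone lattice with an admissible look-ahead heuristic (the exact best completion score, computed by a backward pass). This is only an illustration of the generic A*-over-lattice idea; the lattice, phone labels, and scores below are invented, and the paper's actual HTM acoustic scoring and pruning strategies are not modeled.

```python
import heapq

# Toy phone lattice (a DAG): node -> list of (next_node, phone_label, log_score).
# All contents are invented for illustration only.
LATTICE = {
    "s": [("a", "f", -1.0), ("b", "v", -2.0)],
    "a": [("c", "ay", -1.6), ("b", "ih", -0.5)],
    "b": [("c", "ay", -1.0)],
    "c": [("e", "v", -0.5)],
    "e": [],
}

def backward_best(lattice, start, final):
    """Heuristic h(n): best achievable log-score from node n to the final node.
    Computed by dynamic programming in reverse topological order, so it is
    admissible (never underestimates the best completion)."""
    order, seen = [], set()
    def dfs(n):
        if n in seen:
            return
        seen.add(n)
        for nxt, _, _ in lattice[n]:
            dfs(nxt)
        order.append(n)  # children appended before parents
    dfs(start)
    h = {final: 0.0}
    for n in order:
        for nxt, _, sc in lattice[n]:
            cand = sc + h.get(nxt, float("-inf"))
            if cand > h.get(n, float("-inf")):
                h[n] = cand
    return h

def astar(lattice, start, final):
    """A* search maximizing total log-score over lattice paths."""
    h = backward_best(lattice, start, final)
    # Heap entries: (-(g + h), g, node, phone sequence); negated for max-score.
    heap = [(-(0.0 + h[start]), 0.0, start, [])]
    while heap:
        _, g, node, path = heapq.heappop(heap)
        if node == final:
            # With an admissible heuristic, the first goal popped is optimal.
            return g, path
        for nxt, lab, sc in lattice[node]:
            g2 = g + sc
            heapq.heappush(heap, (-(g2 + h[nxt]), g2, nxt, path + [lab]))
    return None

score, phones = astar(LATTICE, "s", "e")
```

In the full HTM setting each arc score would instead come from evaluating the long-span trajectory model over a context window of adjacent phones, which is why the paper's lattice expansion and look-ahead machinery is needed.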
Published in: Proc. of the Interspeech Conference
Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.