J. Ma and Li Deng
November 2003
In this paper, we present two efficient strategies for
likelihood computation and decoding in a continuous speech recognizer
using an underlying nonlinear state-space dynamic model
for the hidden speech dynamics. The state-space model has been
specially constructed so as to be suitable for the conversational or
casual style of speech where phonetic reduction abounds. Two specific
decoding algorithms, based on optimal state-sequence estimation
for the nonlinear state-space model, are derived, implemented,
and evaluated. They successfully overcome the exponential growth
in the original search paths by using the path-merging approaches
derived from Bayes’ rule. We have tested and compared the two
algorithms using the speech data from the Switchboard corpus,
confirming their effectiveness. Conversational speech recognition
experiments using the Switchboard corpus further demonstrated
that the use of the new decoding strategies is capable of reducing
the recognizer’s word error rate compared with two baseline recognizers,
including the HMM system and the nonlinear state-space
model using the HMM-produced phonetic boundaries, under identical
test conditions.
In IEEE Trans. on Speech and Audio Processing
| Type | Article |
| Pages | 590-602 |
| Volume | 11 |
| Number | 6 |