H. Jiang and Li Deng
In this paper, we propose a robust training strategy to deal with extraneous
acoustic variations in conversational speech recognition.
This strategy generalizes speaker adaptive training, where HMM
parameter transformations are used to normalize the extraneous
variations in the training data according to a set of pre-defined conditions.
Then a compact model and the associated prior p.d.f.’s of
transformation parameters are estimated using the maximum likelihood
criterion. In the testing phase, the compact model and the
prior p.d.f.’s are used to search for the unknown word sequence
based on Bayesian predictive classification. The proposed strategy
is evaluated on a Switchboard task to deal with pronunciation
variations in spontaneous speech recognition. Preliminary results
show a moderate word error rate reduction over a well-trained baseline
system under identical experimental conditions.
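The decision rule described above can be written compactly. This is a sketch only, with notation that is assumed here rather than taken from the paper: let $X$ be the test utterance, $W$ a candidate word sequence, $\Lambda_c$ the compact model, $\eta$ the transformation parameters, and $p(\eta \mid \varphi)$ their prior p.d.f. with hyperparameters $\varphi$ estimated in training. Bayesian predictive classification then scores each hypothesis with the likelihood averaged over the prior, rather than with a point estimate of $\eta$:

```latex
\hat{W} = \arg\max_{W} \; P(W) \int p(X \mid W, \Lambda_c, \eta)\, p(\eta \mid \varphi)\, d\eta
```

The integral generally has no closed form for HMMs and must be approximated; the abstract does not say which approximation is used.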
Published in: Proc. of the Int. Conf. on Spoken Language Processing