A study on hidden Markov model's generalization capability for speech recognition

Xiong Xiao, Jinyu Li, et al.


In statistical learning theory, the generalization capability of a model is its ability to generalize well to unseen test data that follow the same distribution as the training data. This paper investigates, in the context of speech recognition, how generalization capability can also improve robustness when the testing and training data come from different distributions. Two discriminative training (DT) methods, namely minimum classification error (MCE) and soft-margin estimation (SME), are used to train the hidden Markov model (HMM) for better generalization capability. Results on the Aurora-2 task show that both SME and MCE are effective in improving one measure of an acoustic model’s generalization capability, namely the margin of the model, with SME being moderately more effective.
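The abstract names MCE and SME but does not state their objectives. As a hedged illustration only, the sketch below shows textbook forms of the two criteria as they are commonly written in the discriminative training literature: the MCE misclassification measure passed through a sigmoid, and an SME-style objective λ/ρ plus the mean hinge loss (ρ − d_i)_+ over utterances, where ρ is the margin. The function names, the parameters `eta`, `gamma`, `theta`, `rho` and `lam`, and the assumption that per-utterance scores and separation measures are already computed are all hypothetical; this is not the paper's implementation.

```python
import math
from typing import Sequence


def mce_misclassification(g_correct: float, g_competitors: Sequence[float],
                          eta: float = 1.0) -> float:
    """Hypothetical MCE misclassification measure.

    d = -g_c + (1/eta) * log( (1/M) * sum_j exp(eta * g_j) ),
    where g_c is the discriminant (e.g. log-likelihood) of the correct
    transcription and g_j those of M competitors. d < 0 suggests a
    correct decision, d > 0 a misrecognition.
    """
    m = len(g_competitors)
    # Log-sum-exp with a max shift for numerical stability.
    top = max(eta * g for g in g_competitors)
    lse = top + math.log(sum(math.exp(eta * g - top) for g in g_competitors))
    return -g_correct + (lse - math.log(m)) / eta


def mce_loss(d: float, gamma: float = 1.0, theta: float = 0.0) -> float:
    """Smoothed 0-1 loss: sigmoid(gamma * d - theta), computed stably."""
    z = gamma * d - theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sme_objective(separations: Sequence[float], rho: float, lam: float) -> float:
    """Hypothetical SME-style objective: lam / rho + mean of (rho - d_i)_+.

    separations: per-utterance separation measures d_i (assumed here to be
    frame-averaged log-likelihood ratios of the correct string versus its
    most competitive alternative). Only utterances with d_i < rho, i.e.
    inside the soft margin, contribute to the hinge term.
    """
    hinge = sum(max(0.0, rho - d) for d in separations) / len(separations)
    return lam / rho + hinge
```

For example, `sme_objective([0.5, 2.0, 3.0], rho=1.0, lam=0.1)` charges only the first utterance, which lies inside the margin. The hinge form makes explicit why SME is tied to the margin measure mentioned above: enlarging ρ rewards a wider separation but penalizes every token that falls short of it.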

In addition, the better generalization capability translates into more robust speech recognition performance, even when there is significant mismatch between the training and testing data. We also applied mean and variance normalization (MVN) to preprocess the data and reduce the training-testing mismatch. After MVN, MCE and SME perform even better, as generalization capability is then more closely related to robustness. The best performance on Aurora-2 is obtained with SME, which achieves about a 28% relative error rate reduction over the MVN baseline system. Finally, we use SME on the Aurora-3 task to demonstrate the potential of better generalization capability for improving robustness on a more realistic noisy task, and significant improvements are obtained.
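The MVN preprocessing mentioned in the abstract can be sketched as follows, under the assumption (not stated in the abstract) that it is applied per utterance to each dimension of the acoustic feature vectors, such as cepstral coefficients. The function name `mvn` and the `eps` guard are hypothetical. Each feature dimension is shifted to zero mean and scaled to unit variance using statistics of that utterance alone, which removes stationary channel offsets and some of the scale mismatch between training and testing conditions.

```python
import math
from typing import List


def mvn(frames: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Hypothetical per-utterance mean and variance normalization.

    frames: T feature vectors of dimension D for one utterance.
    Returns new vectors whose every dimension has zero mean and
    (up to eps) unit variance over the utterance.
    """
    t = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[k] for f in frames) / t for k in range(dim)]
    # Per-dimension (population) standard deviation.
    stds = [math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / t)
            for k in range(dim)]
    # eps avoids division by zero for a constant dimension, which maps to 0.
    return [[(f[k] - means[k]) / (stds[k] + eps) for k in range(dim)]
            for f in frames]
```

In a real front end the statistics might instead be accumulated over a sliding window or a whole speaker; the per-utterance choice here is only the simplest option.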


Publication type: Inproceedings
Published in: Proc. ASRU