Xiong Xiao, Jinyu Li, et al.
In statistical learning theory, the generalization
capability of a model is its ability to perform well on unseen
test data that follow the same distribution as the training
data. This paper investigates, in the context of speech recognition,
how generalization capability can also improve robustness when the
training and testing data come from different distributions.
Two discriminative training (DT) methods are used to train
the hidden Markov model (HMM) for better generalization
capability, namely the minimum classification error (MCE) and
the soft-margin estimation (SME) methods. Results on the Aurora-2
task show that both SME and MCE are effective in improving one
measure of the acoustic model’s generalization capability, i.e.,
the margin of the model, with SME being moderately more effective.
In addition, the better generalization capability translates into
better robustness of speech recognition performance, even when
there is significant mismatch between the training and testing
data. We also apply mean and variance normalization
(MVN) to preprocess the data and reduce the training-testing
mismatch. After MVN, MCE and SME perform even better,
as generalization capability is then more closely related to
robustness. The best performance on Aurora-2 is obtained with
SME, which achieves about a 28% relative error rate reduction
over the MVN baseline system. Finally, we use SME to
demonstrate the potential of better generalization capability for
improving robustness on a more realistic noisy task,
Aurora-3, where significant improvements are also obtained.
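
The MVN preprocessing step mentioned above can be illustrated with a minimal sketch. The abstract does not say whether the statistics are computed per utterance, per speaker, or globally; the version below assumes per-utterance statistics, and the function name `mvn` and the `eps` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np

def mvn(features, eps=1e-8):
    """Mean and variance normalization of one utterance (assumed per-utterance).

    features: array of shape (num_frames, num_dims), e.g. cepstral features.
    Returns an array of the same shape in which each feature dimension
    has approximately zero mean and unit variance over the utterance.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)   # per-dimension mean over frames
    std = features.std(axis=0)     # per-dimension standard deviation over frames
    # eps is an illustrative guard against division by zero for constant dimensions
    return (features - mean) / (std + eps)
```

Applying the same normalization to both training and testing utterances maps their first- and second-order statistics onto a common scale, which is one way the training-testing mismatch can be reduced.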
Published in: Proc. ASRU