A study on the generalization capability of acoustic models for robust speech recognition

Xiong Xiao, Jinyu Li, et al.


In statistical learning theory, good generalization capability refers to a small performance degradation when a model is evaluated on unseen testing data drawn from the same distribution as the training data, i.e., in the matched training-testing case. Recently, the soft-margin estimation (SME) method was proposed to improve the generalization capability of acoustic models for clean speech recognition, and it has been shown to be effective there. In this paper, we study the generalization capability of acoustic models for robust speech recognition, where the training and testing data follow different distributions (i.e., the mismatched training-testing case).
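The abstract does not spell out the SME criterion. As a hedged illustration only, the sketch below assumes the commonly cited soft-margin form L = λ/ρ + (1/N) Σ_i max(0, ρ − d_i), where ρ is the margin, λ is a trade-off weight, and d_i is a separation measure for training token i. Here d_i is assumed (hypothetically) to be the frame-normalized log-likelihood difference between the correct transcription and its best competing hypothesis. The names `separation` and `sme_objective` are illustrative, not taken from the paper, and the code only evaluates the objective; it does not perform the HMM parameter updates that SME training would carry out.

```python
from typing import Sequence


def separation(correct_loglik: float, competitor_loglik: float, num_frames: int) -> float:
    """Hypothetical separation measure: frame-normalized log-likelihood
    difference between the correct transcription and its best competitor."""
    return (correct_loglik - competitor_loglik) / num_frames


def sme_objective(separations: Sequence[float], rho: float, lam: float) -> float:
    """Soft-margin loss of the assumed form lam/rho + mean(max(0, rho - d_i)).

    Tokens whose separation already exceeds the margin rho contribute nothing;
    tokens inside the margin (or misrecognized, d_i < 0) are penalized
    linearly, like a hinge loss.
    """
    if rho <= 0:
        raise ValueError("margin rho must be positive")
    if not separations:
        raise ValueError("need at least one training token")
    hinge = sum(max(0.0, rho - d) for d in separations) / len(separations)
    return lam / rho + hinge


# Usage: three tokens with separations 2.0, -0.2 and 0.5.
d = [
    separation(-100.0, -120.0, 10),
    separation(-50.0, -49.0, 5),
    separation(-80.0, -90.0, 20),
]
print(sme_objective(d, rho=1.0, lam=0.1))  # about 0.667, i.e. 0.1 + 1.7/3
```

With λ fixed, a larger margin ρ shrinks the λ/ρ term but pushes more tokens inside the margin, so minimizing the sum trades margin size against empirical loss on the training set.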

Our analysis of the effect of noise on the log-likelihood values of noisy speech features shows that, although a mismatch exists between training and testing data, it is still possible to achieve better robustness by improving the acoustic model’s generalization capability with SME. This is confirmed by our experimental study on the Aurora-2 and Aurora-3 tasks, where SME significantly improves recognition performance in both the matched and the low- to medium-mismatch testing cases. However, the improvement in severely mismatched cases is relatively small. To alleviate the violation of SME’s assumption that training and testing data follow the same distribution, we apply mean and variance normalization (MVN) to the speech features prior to model training. Experiments show that when the training-testing mismatch is reduced, SME delivers a larger performance improvement. We expect SME to further improve the robustness of speech recognition when it is combined with other robustness methods. Although this study focuses on noisy speech recognition tasks, the method and findings in this paper make no assumption about the type of distortion and can be extended to handle other types of distortion in other machine learning applications.
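The MVN step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: statistics are computed per utterance (the abstract does not say whether they are per utterance, per speaker, or global), the population variance is used, and the same normalization is assumed to be applied to both training and testing features. The function name `mvn` and the `eps` guard are hypothetical choices for this sketch.

```python
import math
from typing import List


def mvn(features: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Per-utterance mean and variance normalization (assumed variant).

    features: T frames x D dimensions (e.g. cepstral features) for one utterance.
    Each dimension is shifted to zero mean and scaled to unit variance over
    the frames of this utterance; eps avoids division by zero for constant
    dimensions.
    """
    num_frames = len(features)
    if num_frames == 0:
        return []
    dim = len(features[0])
    means = [sum(frame[k] for frame in features) / num_frames for k in range(dim)]
    stds = [
        math.sqrt(sum((frame[k] - means[k]) ** 2 for frame in features) / num_frames)
        for k in range(dim)
    ]
    return [
        [(frame[k] - means[k]) / (stds[k] + eps) for k in range(dim)]
        for frame in features
    ]


# Usage: each column of the result has approximately zero mean and unit variance.
utterance = [[1.0, 10.0], [3.0, 10.0], [5.0, 16.0]]
normalized = mvn(utterance)
```

Normalizing each utterance by its own statistics removes much of the constant offset and scaling that additive noise and channel effects introduce into the features, which is one way to reduce the training-testing mismatch that SME assumes away.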


Publication type: Article
Published in: IEEE Transactions on Audio, Speech and Language Processing