Feature normalization using structured full transforms for robust speech recognition

Xiong Xiao, Jinyu Li, et al.

Abstract

Classical mean and variance normalization (MVN) uses a diagonal transform and a bias vector to normalize the mean and variance of noisy features to reference values. Because MVN uses a diagonal transform, it ignores correlation between feature dimensions. Although a full transform can exploit feature correlation, its large number of parameters may not be estimated reliably from a short observation, e.g., one utterance. We propose a novel structured full transform that has the same number of free parameters as a diagonal transform while still capturing correlation between feature dimensions. The proposed structured transform can be estimated reliably from one utterance by maximizing the likelihood of the normalized features on a reference Gaussian mixture model. Experimental results on the Aurora-4 task show that the structured transform produces consistently better speech recognition results than the diagonal transform and also outperforms the advanced front-end (AFE) feature extractor.
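
To make the baseline concrete, the following is a minimal sketch of per-utterance MVN written as a diagonal scale plus a bias, together with a parameter count that shows why a full transform is hard to estimate from a single utterance. It is an illustration only, not the authors' implementation: the function names `mvn` and `param_counts`, the zero-mean and unit-variance reference values, and the `eps` guard are assumptions, and the sketch does not attempt the structured transform, whose form the abstract does not specify.

```python
import math


def mvn(features, ref_mean=0.0, ref_std=1.0, eps=1e-8):
    """Per-utterance MVN as y = a * x + b, with one scale a and one bias b per dimension.

    features: list of frames, each a list of D floats (hypothetical input layout).
    ref_mean, ref_std: assumed reference values shared by all dimensions.
    """
    n = len(features)
    dim = len(features[0])
    # Utterance-level mean and standard deviation of each feature dimension.
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    # Diagonal transform (one scale per dimension) and bias vector.
    scale = [ref_std / (s + eps) for s in stds]
    bias = [ref_mean - scale[d] * means[d] for d in range(dim)]
    return [[scale[d] * f[d] + bias[d] for d in range(dim)] for f in features]


def param_counts(dim):
    """Free parameters of a diagonal vs. an unconstrained full affine transform."""
    return {"diagonal": 2 * dim, "full": dim * dim + dim}
```

For a 39-dimensional feature vector (an assumed, typical size), `param_counts(39)` gives 78 parameters for the diagonal case and 1560 for an unconstrained full transform. Because each output dimension in `mvn` depends only on the same input dimension, cross-dimension correlation is left untouched.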

Details

Publication type: Inproceedings
Published in: Proc. Interspeech