Feature normalization using structured full transforms for robust speech recognition

Xiong Xiao, Jinyu Li, et al.

Abstract

Classical mean and variance normalization (MVN) uses a diagonal transform and a bias vector to normalize the mean and variance of noisy features to reference values. Because MVN uses a diagonal transform, it ignores correlation between feature dimensions. Although a full transform can make use of feature correlation, its large number of parameters may not be estimated reliably from a short observation, e.g., one utterance. We propose a novel structured full transform that has the same number of free parameters as a diagonal transform while being able to capture correlation between feature dimensions. The proposed structured transform can be estimated reliably from one utterance by maximizing the likelihood of the normalized features on a reference Gaussian mixture model. Experimental results on the Aurora-4 task show that the structured transform produces consistently better speech recognition results than the diagonal transform and also outperforms the advanced front-end (AFE) feature extractor.
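To make the baseline concrete, the following is a minimal sketch of classical per-utterance MVN as the abstract describes it: a diagonal transform (one scale per feature dimension) plus a bias vector. It does not implement the proposed structured full transform, whose form is not given here. The function name `mvn_normalize`, the reference values of zero mean and unit standard deviation, the `eps` guard, and the toy data are all assumptions made for illustration.

```python
import math

def mvn_normalize(frames, ref_mean=0.0, ref_std=1.0, eps=1e-8):
    """Per-utterance MVN: y[t][d] = scales[d] * x[t][d] + biases[d].

    frames: list of feature vectors (one per frame), all of equal length.
    ref_mean, ref_std: assumed reference values shared by every dimension.
    """
    num_frames = len(frames)
    dim = len(frames[0])

    # Statistics are computed independently per dimension, which is
    # why a diagonal transform ignores cross-dimension correlation.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]

    # Diagonal transform (scales) and bias vector.
    scales = [ref_std / (s + eps) for s in stds]
    biases = [ref_mean - a * m for a, m in zip(scales, means)]

    normalized = [
        [scales[d] * f[d] + biases[d] for d in range(dim)] for f in frames
    ]
    return normalized, scales, biases

# Toy "utterance": 4 frames of 2-dimensional features (illustrative only).
utterance = [[1.0, 10.0], [2.0, 14.0], [3.0, 18.0], [4.0, 22.0]]
normalized, scales, biases = mvn_normalize(utterance)
```

For D-dimensional features, this diagonal transform has D scale parameters, whereas an unconstrained full matrix would have D × D, which illustrates why a full transform is harder to estimate from a single utterance.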

Details

Publication type: Inproceedings
Published in: Proc. Interspeech