A Nonlinear Observation Model for Removing Noise from Corrupted Speech Log Mel-Spectral Energies

Jasha Droppo, Alex Acero, and Li Deng

Abstract

In this paper we present a new statistical model that describes the corruption of speech recognition Mel-frequency spectral features caused by additive noise. This model explicitly represents the effect of unknown phase, together with the unobserved clean speech and noise, as three hidden variables. We use this model to produce noise-robust features for automatic speech recognition. The model is constructed in the log Mel-frequency feature domain. Features in this domain are linearly related to MFCC recognition parameters, and we also gain the advantages of low dimensionality and independence of the corruption across feature dimensions. We illustrate the surprising result that, even when the true noise Mel-frequency spectral feature is known, the traditional spectral subtraction formula is flawed. We show that the new model can be used to derive a spectral subtraction formula that produces superior error rate results and is less sensitive to tuning parameters. Finally, we present results demonstrating that the new model is more general than spectral subtraction and can take advantage of a prior noise estimate to produce robust features, rather than relying on point estimates of noise.
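
The claim that spectral subtraction is flawed even with the true noise known can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a single frequency bin rather than a Mel filter bank, and it treats the relative phase `theta` between speech and noise as given. The function names `observe`, `observe_complex`, and `spectral_subtraction`, and the `floor` parameter, are hypothetical and chosen only for illustration. The sketch shows that the noisy power contains a phase-dependent cross term, so subtracting the exact noise power does not return the clean speech power.

```python
import cmath
import math


def observe(x_mag, n_mag, theta):
    """Noisy power |X + N|^2 for one bin, with relative phase theta.

    Assumption: a single bin, so the phase term is simply cos(theta).
    """
    return x_mag ** 2 + n_mag ** 2 + 2.0 * x_mag * n_mag * math.cos(theta)


def observe_complex(x_mag, phi_x, n_mag, phi_n):
    """Same quantity computed directly from complex spectra, as a cross-check."""
    y = cmath.rect(x_mag, phi_x) + cmath.rect(n_mag, phi_n)
    return abs(y) ** 2


def spectral_subtraction(y_pow, n_pow, floor=1e-3):
    """Power-domain spectral subtraction with a relative floor.

    The floor (a hypothetical tuning parameter) keeps the estimate positive.
    """
    return max(y_pow - n_pow, floor * y_pow)


if __name__ == "__main__":
    x_mag, n_mag = 2.0, 1.0  # clean speech power is 4.0, noise power is 1.0
    for theta in (0.0, math.pi / 2, math.pi):
        y_pow = observe(x_mag, n_mag, theta)
        est = spectral_subtraction(y_pow, n_mag ** 2)
        print(f"theta={theta:.2f}  noisy={y_pow:.3f}  estimate={est:.3f}")
```

In this toy setting the estimate matches the clean power only when the speech and noise are in quadrature; for any other relative phase the cross term remains, even though the noise power passed in is exact. This is the reason for carrying the phase as its own hidden variable.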

Details

Publication type: Inproceedings
Published in: Proc. International Conference on Spoken Language Processing
Address: Denver, Colorado