Li Deng, Jasha Droppo, and Alex Acero
In this paper we present an MMSE (minimum mean square error) speech feature enhancement algorithm that capitalizes on a new probabilistic, nonlinear environment model, which effectively incorporates the phase relationship between the clean speech and the corrupting noise in acoustic distortion. The MMSE estimator based on this phase-sensitive model is derived, and it achieves high efficiency by using a single-point Taylor series expansion to approximate the joint probability of clean and noisy speech as a multivariate Gaussian. As an integral component of the enhancement algorithm, we also present a new sequential MAP-based nonstationary noise estimator. Experimental results on the Aurora2 task demonstrate the importance of exploiting the phase relationship in the speech corruption process, as captured by the MMSE estimator. Under otherwise identical speech recognition conditions, the phase-sensitive MMSE estimator reported in this paper performs significantly better than phase-insensitive spectral subtraction (54% error rate reduction), and noticeably better than the phase-insensitive MMSE estimator that was our previous state-of-the-art technique (7% error rate reduction).
In Proc. International Conference on Spoken Language Processing
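
The role of the phase relationship mentioned in the abstract can be illustrated with a minimal sketch. It is not the estimator or the environment model of the paper. It only checks the elementary identity for the power of a sum of two complex spectral components, under the assumption (made here for illustration) that noise adds to clean speech in the complex spectral domain. The function names `noisy_power`, `phase_insensitive_power`, and `direct_power` are hypothetical.

```python
import cmath
import math


def noisy_power(clean_mag, noise_mag, phase_diff):
    """Power of clean + noise, keeping the phase-dependent cross term.

    Assumes additive mixing of two complex spectral components whose
    phase difference is `phase_diff` (radians).
    """
    cross_term = 2.0 * clean_mag * noise_mag * math.cos(phase_diff)
    return clean_mag ** 2 + noise_mag ** 2 + cross_term


def phase_insensitive_power(clean_mag, noise_mag):
    """Same quantity with the cross term dropped (phase ignored)."""
    return clean_mag ** 2 + noise_mag ** 2


def direct_power(clean_mag, noise_mag, phase_diff):
    """Reference value computed by actually adding the complex numbers."""
    clean = cmath.rect(clean_mag, 0.0)
    noise = cmath.rect(noise_mag, phase_diff)
    return abs(clean + noise) ** 2


if __name__ == "__main__":
    for theta in (0.0, math.pi / 2, math.pi):
        exact = noisy_power(1.0, 0.5, theta)
        approx = phase_insensitive_power(1.0, 0.5)
        print(f"theta={theta:.2f}  exact={exact:.3f}  phase-insensitive={approx:.3f}")
```

The two quantities agree only when the components are in quadrature (cosine of the phase difference equal to zero). At other phase differences the phase-insensitive value over- or underestimates the noisy power, which is the kind of mismatch a phase-sensitive model is meant to account for.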