Michael Seltzer, B. Raj, and R. Stern
Missing feature methods of noise compensation for speech recognition operate by first identifying components of a
spectrographic representation of speech that are considered to be corrupt. Recognition is then performed either using
only the remaining reliable components, or after the corrupt components have been reconstructed. These methods
require a spectrographic mask which accurately labels the reliable and corrupt regions of the spectrogram. Depending
on the missing feature method applied, these masks must contain either binary or probabilistic values.
Current mask estimation techniques rely on explicit estimation of the characteristics of the corrupting noise. The estimation
process usually assumes that the noise is pseudo-stationary or varies slowly with time. This is a significant drawback
since the missing feature methods themselves have no such restrictions. We present a new mask estimation
technique that uses a Bayesian classifier to determine the reliability of spectrographic elements. The classification features
make no assumptions about the corrupting noise signal, but rather exploit characteristics
of the speech signal itself. Experiments were performed on speech corrupted by a variety of noises, using missing feature
compensation methods which require binary masks and probabilistic masks. In all cases, the proposed Bayesian mask
estimation method resulted in significantly better recognition accuracy than conventional mask estimation approaches. © 2004 Elsevier B.V. All rights reserved.
Published in: Speech Communication
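
To make the distinction between probabilistic and binary masks concrete, the following is a minimal sketch of how a Bayesian classifier could turn per-element features into either kind of mask. It is not the authors' implementation: the abstract does not specify the features or the class-conditional model, so the diagonal-Gaussian likelihoods, the 0.5 threshold, and all function names here (`fit_gaussian_classes`, `soft_mask`, `binary_mask`) are assumptions made purely for illustration.

```python
import numpy as np

def fit_gaussian_classes(features, labels):
    """Estimate mean, variance, and prior for each class from labeled elements.

    features: (n, d) array of per-element feature vectors (hypothetical features).
    labels:   (n,) array with 0 = corrupt, 1 = reliable.
    """
    params = {}
    for c in (0, 1):
        x = features[labels == c]
        # A small variance floor avoids division by zero (an assumption of this sketch).
        params[c] = (x.mean(axis=0), x.var(axis=0) + 1e-6, len(x) / len(features))
    return params

def log_likelihood(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, summed over feature dimensions."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def soft_mask(features, params):
    """Probabilistic mask: posterior P(reliable | features) via Bayes' rule."""
    log_joint = np.stack(
        [log_likelihood(features, m, v) + np.log(p)
         for (m, v, p) in (params[0], params[1])],
        axis=-1,
    )
    # Subtract the maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=-1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=-1, keepdims=True)
    return post[..., 1]

def binary_mask(features, params, threshold=0.5):
    """Binary mask: True where the posterior of 'reliable' reaches the threshold."""
    return soft_mask(features, params) >= threshold
```

Because the likelihood is summed over the last axis only, `features` may also be shaped `(frames, bands, d)`, in which case both functions return a mask with the same time-frequency layout as the spectrogram. A missing feature method that expects probabilistic values would consume the output of `soft_mask` directly, while one that expects hard decisions would use `binary_mask`.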