|
Dereverberation Project in Microsoft Research

What is reverberation and why does it hurt?

When a sound source is placed in a closed room or near sound-reflecting surfaces, the listener receives not only the direct wave but also multiple reflected waves. This smears the speech features, making speech less intelligible for humans and reducing the recognition rate of speech recognition engines. Therefore, for the best speech recognition results, users are forced to use headsets with close-talk microphones.

Dereverberation as deconvolution

Nearly every approach assumes a convolutional model for the effects of reverberation. It is then logical to try to undo the effects of reverberation by deconvolution (inverse filtering). This can be done mathematically perfectly only if the room response is minimum phase, i.e. it is causal and invertible, and its inverse is also causal. As this is not true in most cases, the deconvolution function is usually an approximation. Estimating the room response is more difficult in the presence of noise.

Blind dereverberation

Blind dereverberation methods seek to estimate the input without explicitly computing a deconvolution or inverse filter. Some methods use probabilistic speech models and even Independent Component Analysis (ICA).

Dereverberation via suppression and enhancement

This approach tries to remove the reverberation effects with methods used for noise suppression and speech enhancement. These algorithms try to suppress the reverberation, to enhance the speech, or both. In contrast to blind algorithms, however, there is no estimation of the source signal either. Rather, the waveform is processed to reduce the negative effects of reverberation and to enhance the qualities of the captured waveform.

Our approach and initial results for ASR

The initial goal of our dereverberation project is to improve the speech recognition results from our microphone array for distances up to 3 feet and to make them as close as possible to those of a close-talk microphone.
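
To make the convolutional model and the minimum-phase condition from the deconvolution discussion concrete, here is a minimal sketch in plain Python. The two-tap "room responses", the test signal, and the helper names (`convolve`, `inverse_filter`, `recovery_error`) are illustrative assumptions, not part of the project; real room responses are thousands of taps long. The sketch applies a causal recursive inverse filter to both a minimum-phase and a non-minimum-phase response and compares how a tiny measurement error propagates.

```python
def convolve(x, h):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def inverse_filter(y, h, n_out):
    """Causal recursive inverse of an FIR response h (requires h[0] != 0).

    Solves y[n] = sum_k h[k] * x[n - k] for x[n], one sample at a time.
    """
    x = []
    for n in range(n_out):
        acc = y[n]
        for k in range(1, len(h)):
            if n - k >= 0:
                acc -= h[k] * x[n - k]
        x.append(acc / h[0])
    return x


# Hypothetical source samples (stand-in for a speech signal).
source = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0, 1.5]

# Toy minimum-phase response: direct path plus one weaker reflection.
h_min = [1.0, 0.5]
# Toy non-minimum-phase response: the reflection is stronger than the direct path.
h_non = [0.5, 1.0]


def recovery_error(h, noise=1e-6):
    """Largest absolute error after inverse filtering a slightly noisy capture."""
    y = convolve(source, h)
    y[0] += noise  # tiny measurement error on the first captured sample
    estimate = inverse_filter(y, h, len(source))
    return max(abs(a - b) for a, b in zip(source, estimate))


err_min = recovery_error(h_min)
err_non = recovery_error(h_non)
print("minimum phase:    ", err_min)
print("non-minimum phase:", err_non)
```

In the minimum-phase case the injected error decays from sample to sample, so the recovered signal stays close to the source. In the non-minimum-phase case the same causal inverse is unstable and the error roughly doubles with every sample, which is why practical deconvolution has to settle for an approximate inverse.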
Most modern ASR systems have Cepstral Mean Normalization in the front end. The purpose of this processing is to compensate for the frequency response of the capturing channel but, due to its relatively fast adaptation time, it also successfully compensates for the early reverberation, up to 50 ms. By that time the rate of arriving reflections already exceeds the sampling rate, which turns the reverberation into a stochastic process. Estimating the room response would not give us good results under these conditions, so we chose to do dereverberation via suppression. Initial results are shown in the next charts. A test set of ~3000 utterances was recorded in a conference room using a close-talk microphone, a regular analog PC microphone, and the four-element microphone array from distances of 1.0 and 2.5 meters. The sound was played through a B&K mouth simulator.
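
The channel compensation performed by Cepstral Mean Normalization, as described above, can be illustrated with a short sketch. It rests on assumptions that are not from the project: the cepstral vectors are made-up numbers, the channel is modeled as a constant additive offset in the cepstral domain (a common approximation when the channel response is short compared with the analysis frame), and the mean is taken over the whole utterance rather than with the fast-adapting running estimate a real front end would use.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean taken over all frames of an utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[c] for f in frames) / n_frames for c in range(n_coeffs)]
    return [[f[c] - means[c] for c in range(n_coeffs)] for f in frames]


# Hypothetical cepstral vectors for a four-frame "utterance".
clean = [
    [1.0, 0.5, -0.25],
    [2.0, -0.5, 0.75],
    [0.0, 1.5, 0.25],
    [3.0, 0.5, -0.75],
]

# Assumed channel effect: the same offset added to every frame.
channel_offset = [0.5, -1.0, 0.25]
captured = [[v + o for v, o in zip(f, channel_offset)] for f in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_captured = cepstral_mean_normalize(captured)

# After normalization the constant channel offset no longer matters.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(norm_clean, norm_captured)
    for a, b in zip(fa, fb)
)
print("largest difference after CMN:", max_diff)
```

The normalized clean and captured features coincide because subtracting the mean removes any component that is constant across frames. Late reverberation does not behave like such a constant offset, which is consistent with the choice of suppression for that part.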
![]()

Contact

For more information about the microphone array project, contact Ivan Tashev (ivantash--at--microsoft--dot--com).
|