

Dereverberation Project in Microsoft Research

    What is reverberation and why does it hurt?

When a sound source is placed in a closed room or near sound-reflecting surfaces, the listener receives not only the direct wave but also multiple reflected waves. This smears the speech features, making speech less intelligible for humans and reducing the recognition rate of speech recognition engines. As a result, users who want the best speech recognition results are forced to use headsets with close-talk microphones.

    Dereverberation as deconvolution

Nearly every approach assumes a convolutional model for the effects of reverberation: the captured signal is the source signal convolved with the room impulse response. It is then logical to try to undo the reverberation by deconvolution (inverse filtering). This can be done exactly only if the room response is minimum phase, i.e. it is causal and stable and its inverse is also causal and stable. Because this is not true in most cases, the deconvolution filter is usually an approximation. Estimating the room response is also more difficult in the presence of noise.
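
The role of the minimum-phase condition can be seen in a toy example. The sketch below is an illustration only, not the project's code: it assumes a hypothetical two-tap "room" consisting of a direct path plus a single echo, and the names convolve, causal_inverse and echo_gain are invented for this example. When the echo is weaker than the direct path the response is minimum phase and a causal recursion recovers the source; when the echo is stronger, the same recursion is unstable and amplifies even a tiny measurement error.

```python
def convolve(signal, response):
    """Direct-form convolution: each output sample sums delayed, scaled inputs."""
    out = [0.0] * (len(signal) + len(response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(response):
            out[n + k] += s * h
    return out


def causal_inverse(received, echo_gain, length):
    """Undo h = [1, echo_gain] with the recursion y[n] = x[n] - echo_gain * y[n-1]."""
    out = []
    prev = 0.0
    for n in range(length):
        prev = received[n] - echo_gain * prev
        out.append(prev)
    return out


# Hypothetical source samples, chosen only for illustration.
source = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.0, 0.0]

# Minimum-phase toy room: the echo is weaker than the direct path,
# so the zero of the response lies inside the unit circle.
received = convolve(source, [1.0, 0.6])
recovered = causal_inverse(received, 0.6, len(source))
max_error = max(abs(a - b) for a, b in zip(source, recovered))

# Non-minimum-phase toy room: the echo is stronger than the direct path.
# The causal inverse is unstable, so a small error grows geometrically.
received_nmp = convolve(source + [0.0] * 40, [1.0, 1.5])
received_nmp[0] += 1e-6  # tiny perturbation standing in for measurement noise
unstable = causal_inverse(received_nmp, 1.5, len(received_nmp) - 1)
tail_error = abs(unstable[-1])  # the true source is zero at this sample

print("minimum-phase recovery error:", max_error)
print("non-minimum-phase tail error:", tail_error)
```

Real room responses are thousands of taps long and generally not minimum phase, which is why practical inverse filters are approximations.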

    Blind dereverberation

Blind dereverberation methods seek to estimate the source signal without explicitly computing a deconvolution or inverse filter. Some methods use probabilistic speech models, and some even use Independent Component Analysis (ICA).

    Dereverberation via suppression and enhancement

This approach tries to remove the effects of reverberation with methods used for noise suppression and speech enhancement. These algorithms try to suppress the reverberation, to enhance the speech, or both. Unlike blind algorithms, however, these methods do not estimate the source signal either. Rather, the waveform is processed to reduce the negative effects of reverberation and to enhance the quality of the captured waveform.
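
As a rough illustration of the suppression idea, the sketch below applies a spectral-subtraction style gain to the power in a single frequency band. It is not the algorithm used in this project, which the text does not specify. It assumes, purely for illustration, that the reverberant tail decays exponentially, so the late-reverberation power in a frame can be predicted from the power observed a few frames earlier. The function name and the parameters delay_frames, decay_per_frame and gain_floor are hypothetical.

```python
def suppress_late_reverb(power, delay_frames, decay_per_frame, gain_floor=0.1):
    """Return one suppression gain per frame for a single frequency band.

    Assumption: the late-reverberation power in frame t equals the observed
    power delay_frames earlier, attenuated by decay_per_frame ** delay_frames.
    """
    attenuation = decay_per_frame ** delay_frames
    gains = []
    for t, observed in enumerate(power):
        predicted = attenuation * power[t - delay_frames] if t >= delay_frames else 0.0
        if observed <= 0.0:
            # Nothing to scale in an empty frame; fall back to the floor.
            gains.append(gain_floor)
            continue
        # Subtract the predicted reverberant share, never going below the floor.
        gains.append(max(1.0 - predicted / observed, gain_floor))
    return gains


# Hypothetical band power: a burst of speech energy, then a decaying tail.
band_power = [0.0, 4.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125]
gains = suppress_late_reverb(band_power, delay_frames=2, decay_per_frame=0.5)

# The gain is applied in the power domain here to keep the example short.
suppressed = [g * p for g, p in zip(gains, band_power)]
print(gains)
```

The speech burst passes unchanged, while the frames that contain only the decaying tail are pushed down to the gain floor. No room response and no source signal are estimated at any point.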

    Our approach and initial results for ASR

The initial goal of our dereverberation project is to improve the speech recognition results from our microphone array at distances up to 3 feet and to bring them as close as possible to those of a close-talk microphone. Most modern ASR systems have Cepstral Mean Normalization (CMN) in the front end. The purpose of this processing is to compensate for the frequency response of the capturing channel but, thanks to its relatively fast adaptation time, it also successfully compensates for the early reverberation, up to 50 ms. Beyond that time the rate at which reflections arrive already exceeds the sampling rate, which turns the reverberation into a stochastic process. Estimating the room response would not give good results under these conditions, so we chose to do dereverberation via suppression.
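
The way Cepstral Mean Normalization removes a short, fixed channel can be sketched as follows. A channel whose impulse response is short compared with the analysis frame multiplies the spectrum of every frame by the same function, which becomes the same additive offset in the cepstral domain; subtracting the mean of each coefficient cancels it. The feature values and the offset below are made up for illustration, and the sketch uses a single mean over the whole utterance, whereas a real front end typically uses a running mean with a finite adaptation time.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of an utterance."""
    count = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# Hypothetical cepstral vectors (3 coefficients per frame) for a clean utterance.
clean = [[1.0, 0.2, -0.3], [0.5, -0.1, 0.4], [1.5, 0.3, 0.1], [0.8, 0.0, -0.2]]

# A fixed short channel adds the same offset to every frame in the cepstral domain.
channel_offset = [0.7, -0.4, 0.25]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_distorted = cepstral_mean_normalize(distorted)

# After normalization the channel offset is gone and the two feature sets agree.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(norm_clean, norm_distorted)
    for a, b in zip(fa, fb)
)
print("largest difference after CMN:", max_diff)
```

Reflections that arrive later than the analysis frame no longer act as a fixed per-frame offset, so this cancellation does not extend to the late reverberant tail.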

Initial results are shown in the next charts. A test set of ~3000 utterances was recorded using a close-talk microphone, a regular analog PC microphone, and the four-element microphone array in a conference room from distances of 1.0 and 2.5 meters. The sound was played through a B&K mouth simulator.

 

   

    Contact

For more information about the microphone array project contact Ivan Tashev (ivantash--at--microsoft--dot--com).

 


©2008 Microsoft Corporation. All rights reserved.