Ivan Tashev, Andrew Lovitt, and Alex Acero
20 April 2010
Voice activity detectors (VADs) are an integral part of modern speech processing, speech enhancement, and speech encoding systems. One of the major problems in practical implementations is achieving robust VAD in the presence of background noise. Most statistical model-based approaches employ the Gaussian assumption in the discrete Fourier transform (DFT) domain, which deviates from real observations. In this paper, we propose a class of VAD algorithms based on several statistical models of the probability density functions of the magnitudes. In addition, we evaluate several approaches for combining the per-frequency-bin likelihoods into an estimate of the likelihood for the entire frame. A data corpus with in-car noise is then used to evaluate the VAD algorithms, and the results are discussed.
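As a rough illustration of the per-bin and per-frame likelihood idea, the sketch below uses the conventional Gaussian DFT-domain model that the abstract names as the common baseline. It is not the class of models proposed in the paper. All function names, the threshold, and the assumption that the noise variance and a priori SNR are already known for each bin are hypothetical choices made for this example. Two ways of combining the per-bin ratios are shown: the geometric mean and the arithmetic mean.

```python
import math

def bin_log_likelihood_ratios(power, noise_var, xi):
    """Per-bin log likelihood ratio under the complex Gaussian model.

    power     : |X_k|^2 for each frequency bin of one frame
    noise_var : noise variance per bin (assumed already estimated)
    xi        : a priori SNR per bin (assumed already estimated)
    """
    llrs = []
    for p, lam, x in zip(power, noise_var, xi):
        gamma = p / lam  # a posteriori SNR for this bin
        # log of (1 / (1 + xi)) * exp(gamma * xi / (1 + xi))
        llrs.append(gamma * x / (1.0 + x) - math.log1p(x))
    return llrs

def frame_log_lr_geometric(llrs):
    # Geometric mean of the per-bin ratios equals the mean of their logs.
    return sum(llrs) / len(llrs)

def frame_log_lr_arithmetic(llrs):
    # Log of the arithmetic mean of the per-bin ratios, computed stably.
    m = max(llrs)
    return m + math.log(sum(math.exp(v - m) for v in llrs) / len(llrs))

def is_speech(power, noise_var, xi, threshold=0.0,
              combine=frame_log_lr_geometric):
    # Hypothetical decision rule: compare the frame score with a threshold.
    llrs = bin_log_likelihood_ratios(power, noise_var, xi)
    return combine(llrs) > threshold
```

The arithmetic mean is never smaller than the geometric mean, so it reacts more strongly when only a few bins carry speech energy. Which combination works best in a given noise condition is the kind of question the paper's evaluation addresses.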
In NOISE-CON 2010 and 159th Meeting of the Acoustical Society of America
Publisher: Acoustical Society of America
All copyrights reserved by the Acoustical Society of America 2007.