Dual stage probabilistic voice activity detector

Ivan Tashev; Andrew Lovitt; Alex Acero

NOISE-CON 2010 and 159th Meeting of the Acoustical Society of America | April 2010

Published by Acoustical Society of America

Voice activity detectors (VADs) are an integral part of modern speech processing, speech enhancement, and speech encoding systems. One of the major problems in practical implementations is achieving robust voice activity detection in the presence of background noise. Most statistical model-based approaches assume a Gaussian distribution in the discrete Fourier transform (DFT) domain, an assumption that deviates from real observations. In this paper, we propose a class of VAD algorithms based on several statistical models of the probability density functions of the magnitudes. In addition, we evaluate several approaches for combining the per-frequency-bin likelihoods into an estimate of the likelihood for the entire frame. A data corpus with in-car noise is then used to evaluate the VAD, and the results are discussed.
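
To make the idea of combining per-bin likelihoods concrete, the following is a minimal sketch of the conventional Gaussian-model baseline that the abstract contrasts against, not the models proposed in the paper. It assumes the widely used complex Gaussian likelihood ratio per DFT bin and shows two common ways of collapsing the bins into one frame-level score (geometric and arithmetic mean). The function names, the `threshold` parameter, and the assumption that noise and speech variances are already known per bin are all illustrative choices, not details taken from the paper.

```python
import math


def bin_log_likelihood_ratios(power, noise_var, speech_var):
    """Per-bin log likelihood ratio (speech present vs. absent).

    Assumes the complex Gaussian model for DFT coefficients:
        log LR_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k)
    with gamma_k = |X_k|^2 / noise_var_k  (a posteriori SNR)
    and  xi_k    = speech_var_k / noise_var_k  (a priori SNR).
    """
    llrs = []
    for p, n, s in zip(power, noise_var, speech_var):
        xi = s / n
        gamma = p / n
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs


def frame_log_likelihood_ratio(llrs, method="geometric"):
    """Combine per-bin log likelihood ratios into one frame-level score."""
    if not llrs:
        raise ValueError("at least one frequency bin is required")
    if method == "geometric":
        # Log of the geometric mean of the ratios = mean of the logs.
        return sum(llrs) / len(llrs)
    if method == "arithmetic":
        # Log of the arithmetic mean of the ratios, computed stably
        # by factoring out the largest term before exponentiating.
        m = max(llrs)
        return m + math.log(sum(math.exp(v - m) for v in llrs) / len(llrs))
    raise ValueError("unknown method: " + method)


def is_speech(power, noise_var, speech_var, threshold=0.0, method="geometric"):
    """Hypothetical frame decision: speech if the combined score exceeds threshold."""
    llrs = bin_log_likelihood_ratios(power, noise_var, speech_var)
    return frame_log_likelihood_ratio(llrs, method) > threshold
```

The geometric mean treats all bins equally in the log domain, while the arithmetic mean lets a few bins with strong speech evidence dominate the frame score; which behaves better under in-car noise is exactly the kind of question the evaluation described above addresses.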