Arun C. Surendran, CCSP Group, Microsoft Research
John C. Platt, CCSP Group, Microsoft Research
IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 5, pp. 625-628, (2004).
We introduce an elegant and novel design for a speech detector which estimates the
probability of the presence of speech in each time-frequency bin, as well as in
each frame. The proposed system uses discriminative estimators based on
logistic regression, and incorporates spectral and temporal correlations in the
same framework. The detector is flexible enough to be configured in a single-level
or a “stacked” bi-level architecture, depending on the needs of the application.
An important part of the proposed design is the use of a new set of features:
the normalized logarithm of the estimated posterior signal-to-noise ratio.
These can be easily and automatically generated by tracking the noise spectrum
online. We present results on the
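To make the feature and estimator described above concrete, here is a minimal sketch in Python. It is not the authors' implementation. The recursive-averaging noise tracker, the `mean` and `scale` normalization constants, the three-bin neighborhood, and the weight and bias values are all assumptions chosen for illustration; the abstract states only that the features are the normalized logarithm of the estimated posterior SNR and that the per-bin estimators are based on logistic regression.

```python
import math

def update_noise(noise, power, alpha=0.95):
    # Recursive averaging of the noise power spectrum.
    # Assumption: a stand-in for whatever online noise tracker is actually used.
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, power)]

def snr_features(power, noise, mean=0.0, scale=1.0, floor=1e-10):
    # Normalized log of the estimated posterior SNR, one value per frequency bin.
    # Assumption: mean and scale are fixed constants (hypothetical values here).
    return [(math.log(max(p, floor) / max(n, floor)) - mean) / scale
            for p, n in zip(power, noise)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bin_speech_probability(features, k, weights, bias):
    # Logistic regression over bin k and its spectral neighbors.
    # Indices past either edge are clamped to the nearest valid bin.
    half = len(weights) // 2
    last = len(features) - 1
    z = bias
    for offset, w in enumerate(weights, start=-half):
        z += w * features[min(max(k + offset, 0), last)]
    return sigmoid(z)

# Toy frame: bins 1 and 2 carry far more power than the noise estimate.
noise = [1.0, 1.0, 1.0, 1.0]
frame = [1.0, 20.0, 30.0, 1.0]
feats = snr_features(frame, noise)
# Hypothetical weights and bias; in practice these would be trained.
probs = [bin_speech_probability(feats, k, [0.5, 1.0, 0.5], -2.0)
         for k in range(len(feats))]
```

In this toy frame the two high-power bins receive speech probabilities well above 0.5 and the edge bins fall below it. Temporal context could be added in the same way by extending the input to include features from neighboring frames.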
© 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.