A Comparison of Three Non-Linear Observation Models for Noisy Speech Features

Li Deng; Alex Acero; Jasha Droppo

Proc. Eurospeech Conference | September 2003

Published by International Speech Communication Association

This paper reports our recent efforts to develop a unified, non-linear, stochastic model for estimating and removing the effects of additive noise on speech cepstra. The complete system consists of prior models for speech and noise, an observation model, and an inference algorithm. The observation model quantifies the relationship between clean speech, noise, and the noisy observation; because it is expressed in terms of log mel-frequency filter-bank features, it is non-linear. The inference algorithm is the procedure by which the clean speech and noise are estimated from the noisy observation. The most critical component of the system is the observation model. This paper derives a new approximation strategy for it and compares it with two existing approximations. We show that the new approximation requires half the computation and produces equal or better word accuracy than the previous techniques. We present noise-robust recognition results on the standard Aurora 2 task.
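To make the non-linearity concrete, here is a minimal sketch. It assumes the commonly used phase-insensitive model for additive noise in the log mel filter-bank domain, y = log(exp(x) + exp(n)) = x + log(1 + exp(n - x)), where x, n and y are the clean-speech, noise and noisy log filter-bank energies. It also shows a first-order Taylor (vector-Taylor-series style) linearization of that model around an expansion point (x0, n0). This is an illustrative assumption, not necessarily any of the three approximations compared in the paper. The function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical helpers illustrating the standard phase-insensitive
# additive-noise model in the log mel filter-bank domain (an assumption,
# not the paper's specific approximation).

def noisy_log_mel(x, n):
    """Exact model: y = log(exp(x) + exp(n)) = x + log(1 + exp(n - x))."""
    # logaddexp evaluates log(exp(x) + exp(n)) without overflow.
    return np.logaddexp(x, n)

def jacobians(x0, n0):
    """Diagonal partial derivatives of y at the expansion point (x0, n0)."""
    dy_dx = 1.0 / (1.0 + np.exp(n0 - x0))  # = exp(x0) / (exp(x0) + exp(n0))
    dy_dn = 1.0 - dy_dx                    # = exp(n0) / (exp(x0) + exp(n0))
    return dy_dx, dy_dn

def linearized_log_mel(x, n, x0, n0):
    """First-order Taylor approximation of y around (x0, n0)."""
    dy_dx, dy_dn = jacobians(x0, n0)
    return noisy_log_mel(x0, n0) + dy_dx * (x - x0) + dy_dn * (n - n0)

if __name__ == "__main__":
    x = np.array([2.0, 0.0, -3.0])  # toy clean-speech log energies
    n = np.zeros(3)                 # toy noise log energies
    # Channels are speech-dominated, balanced, and noise-dominated.
    print(noisy_log_mel(x, n))
    # Error of the linearization for a small perturbation of x.
    print(linearized_log_mel(x + 0.1, n, x, n) - noisy_log_mel(x + 0.1, n))
```

When speech dominates a channel (x much larger than n), dy/dx approaches 1 and y approaches x. When noise dominates, y approaches n. The linearization is exact only at the expansion point, so the accuracy and cost of whatever approximation replaces the exact model directly affect the quality of the inferred clean speech.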

© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.