A Detection Based Approach to Robust Speech Understanding

Kuansan Wang

Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing | May 2004

Published by IEEE SPS

Field speech data pose great challenges to statistical modeling because the speech signal is often intermixed with extraneous sounds and other environmental noises that are either too difficult to compensate for dynamically or too expensive to cover with sufficient data for proper offline training. In this paper, we propose a detection-based method in which the speech recognizer can tune sharply to only the “meaningful” speech and gracefully ignore the “unwanted” audio segments. The method is designed to be integrated with the frame-synchronous search for single-pass processing. In contrast to conventional keyword-spotting techniques, this integration allows the language model to be used to better predict the detection targets during the search. To study its efficacy, we apply the framework to a spontaneous speech understanding application in which cohesive phrases congruent with the domain semantics and application context serve as the salient feature for selective hearing. Experimental results on the effectiveness of the system in dealing with out-of-domain phrases and other spontaneous speech effects are encouraging.