Kuansan Wang
May 2004
Field speech data pose great challenges to statistical
modeling because the speech signal is often intermixed
with extraneous sounds and other environmental noises
that are either too difficult to compensate for dynamically or
too expensive to cover with sufficient data for proper offline
training. In this paper, we propose a detection-based
method in which the speech recognizer can sharply tune to
only the “meaningful” speech and gracefully ignore the
“unwanted” audio segments. The method is designed to be
integrated with the frame-synchronous search for single-pass
processing. In contrast to conventional keyword
spotting techniques, this integration allows the
language model to be used to better predict the detection targets
during the search. To study its efficacy, we apply the
framework to a spontaneous speech understanding
application where cohesive phrases congruent with the
domain semantics and application context are used as the
salient features for selective hearing. Experimental results
on the effectiveness of the system in dealing with
out-of-domain phrases and other spontaneous speech effects are
encouraging.
In: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
Type: Inproceedings