Jinyu Li and Chin-Hui Lee
We study issues related to designing speech event detectors for automatic speech recognition. Event detection is a critical component of the recently proposed automatic speech attribute transcription (ASAT) paradigm for speech research. As in keyword spotting and non-keyword rejection, a good detector must effectively detect the speech attributes of interest while rejecting extraneous events. We compare frame-based and segment-based detectors, study their properties in detecting manners of articulation, and propose new performance measures. We test these detectors on the TIMIT database with several evaluation criteria. Our results indicate that segment-based detectors outperform frame-based detectors in several key aspects of speech detector design. We also show that performance can be significantly enhanced by incorporating discriminative training into the design of speech event detectors.
Published in: Proc. Interspeech