Detection-Based ASR in the Automatic Speech Attribute Transcription Project

We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. The work reported in this paper describes and evaluates the detection-based ASR paradigm and discusses phonetic attribute classes, methods of detecting framewise phonetic attributes and methods of combining attribute detectors for ASR.

We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. We use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices to combine framewise detection scores for continuous phone recognition on the TIMIT database. With CRFs, we achieve a phone accuracy of 70.63%, outperforming the baseline and enhanced HMM systems, by incorporating all of the attribute detectors discussed in the paper.

In Proc. Interspeech

Details

Type: Inproceedings