Detection-Based ASR in the Automatic Speech Attribute Transcription Project

Ilana Bromberg, Qiang Fu, Jun Hou, Jinyu Li, et al.

Abstract

We present methods of detector design in the Automatic Speech
Attribute Transcription project. This paper details the results of
a student-led, cross-site collaboration between Georgia Institute
of Technology, The Ohio State University and Rutgers University.
The work reported in this paper describes and evaluates the
detection-based ASR paradigm and discusses phonetic attribute
classes, methods of detecting framewise phonetic attributes and
methods of combining attribute detectors for ASR.

We use Multi-Layer Perceptrons, Hidden Markov Models
and Support Vector Machines to compute confidence scores for
several prescribed sets of phonetic attribute classes. We use Conditional
Random Fields (CRFs) and knowledge-based rescoring
of phone lattices to combine framewise detection scores for continuous
phone recognition on the TIMIT database. With CRFs that incorporate
all of the attribute detectors discussed in the paper, we achieve
a phone accuracy of 70.63%, outperforming both the baseline
and enhanced HMM systems.

Details

Publication type: Inproceedings
Published in: Proc. Interspeech