Detection-Based ASR in the Automatic Speech Attribute Transcription Project

Ilana Bromberg, Qiang Fu, Jun Hou, Jinyu Li, et al.

Abstract

We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. It describes and evaluates the detection-based ASR paradigm and discusses phonetic attribute classes, methods of detecting framewise phonetic attributes and methods of combining attribute detectors for ASR. We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. We use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices to combine framewise detection scores for continuous phone recognition on the TIMIT database. By incorporating all of the attribute detectors discussed in the paper, the CRF system achieves a phone accuracy of 70.63%, outperforming both the baseline and enhanced HMM systems.

Details

Publication type: Inproceedings
Published in: Proc. Interspeech