Ilana Bromberg, Qiang Fu, Jun Hou, Jinyu Li, et al.
2007
We present methods of detector design in the Automatic Speech
Attribute Transcription project. This paper details the results of
a student-led, cross-site collaboration between Georgia Institute
of Technology, The Ohio State University and Rutgers University.
The work reported in this paper describes and evaluates the
detection-based ASR paradigm and discusses phonetic attribute
classes, methods of detecting framewise phonetic attributes and
methods of combining attribute detectors for ASR.
We use Multi-Layer Perceptrons, Hidden Markov Models
and Support Vector Machines to compute confidence scores for
several prescribed sets of phonetic attribute classes. We use Conditional
Random Fields (CRFs) and knowledge-based rescoring
of phone lattices to combine framewise detection scores for continuous
phone recognition on the TIMIT database. By incorporating all of
the attribute detectors discussed in the paper, the CRF system
achieves a phone accuracy of 70.63%, outperforming both the
baseline and enhanced HMM systems.
In Proc. Interspeech
| Type | Inproceedings |