A Discriminative Training Framework using N-Best Speech Recognition Transcriptions and Scores for Spoken Utterance Classification

In this paper, we propose a novel discriminative training approach to spoken utterance classification (SUC). SUC was originally developed to map a spoken utterance to the most appropriate semantic class, and its ultimate objective is to minimize the classification error rate (CER). Conventionally, a two-phase approach is adopted, in which the first phase is ASR transcription and the second phase is semantic classification. In the proposed framework, the classification error rate is approximated as a differentiable function of the language and classifier model parameters. Furthermore, in order to exploit all the available information from the first phase, class-specific discriminant functions are defined based on score functions derived from the N-best lists. Our experimental results on the standard ATIS database show a notable reduction in CER relative to the earlier best result on the identical task, from 4.92% to 4.04%.
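The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, written in the style of generic minimum-classification-error training rather than the authors' exact method. It assumes each N-best entry carries an ASR log score and one classifier log score per semantic class, combines them into a class discriminant with a log-sum-exp over the N-best list, and smooths the 0/1 error with a sigmoid. The names `class_discriminants`, `smoothed_error`, and `gamma` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_discriminants(nbest, num_classes):
    """Return one discriminant value per class (illustrative assumption).

    nbest: list of (asr_log_score, classifier_log_scores) pairs, one per
    N-best hypothesis, where classifier_log_scores has one entry per class.
    Each class score pools evidence from every hypothesis, not just the 1-best.
    """
    return [
        logsumexp([asr + cls[c] for asr, cls in nbest])
        for c in range(num_classes)
    ]


def smoothed_error(nbest, true_class, num_classes, gamma=1.0):
    """Sigmoid-smoothed classification error for one utterance.

    The misclassification measure d is positive when competing classes
    outscore the true class; the sigmoid maps it to a differentiable
    value in (0, 1) that approximates the 0/1 error.
    """
    g = class_discriminants(nbest, num_classes)
    competitors = [g[c] for c in range(num_classes) if c != true_class]
    d = logsumexp(competitors) - g[true_class]
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Toy 2-best list with two semantic classes (made-up scores).
toy_nbest = [(-1.0, [-0.2, -1.8]), (-1.5, [-0.4, -1.2])]
loss_if_class0 = smoothed_error(toy_nbest, true_class=0, num_classes=2)
loss_if_class1 = smoothed_error(toy_nbest, true_class=1, num_classes=2)
```

Because the smoothed error is differentiable in the underlying scores, it could in principle be minimized by gradient descent over whatever parameters produce those scores. A larger `gamma` makes the sigmoid approach the hard 0/1 error.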


In  Proc. of the International Conference on Acoustics, Speech and Signal Processing

Publisher  Institute of Electrical and Electronics Engineers, Inc.
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Details

Type  Inproceedings
Pages  IV-5-IV-8
Volume  IV
Address  Honolulu, Hawaii, U.S.A.