Yonggang Deng, Milind Mahajan, and Alex Acero
September 2003
We address the problem of estimating the word error rate
(WER) of an automatic speech recognition (ASR) system
without using acoustic test data. This is an important problem
for designers of new applications that use ASR: a quick
estimate of WER early in the design cycle can guide
decisions about dialog strategy and grammar design. Our
approach estimates the probability distribution of the word
hypotheses produced by the underlying ASR system, given
the text test corpus. A critical component of this system is a
phonemic confusion model, which seeks to capture the errors
that ASR makes on acoustic data at the phonemic level. We
use a confusion model composed of probabilistic phoneme
sequence conversion rules, learned from phonemic
transcription pairs obtained by leave-one-out decoding of the
training set. When applied to test sets from different domains,
the system produces reasonably close estimates of WER.
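The idea of predicting WER from text alone can be illustrated with a toy Monte Carlo sketch. It is not the authors' system. Everything here is a hypothetical assumption chosen for illustration: the tiny lexicon `LEXICON`, the hand-set rule table `RULES` (in the paper the rules are learned from leave-one-out decoding), and the inverse-lexicon lookup in `recognize`, which stands in for real decoding with a grammar or language model. Instead of computing the hypothesis distribution directly, the sketch samples from it: each reference word's pronunciation is perturbed by probabilistic phoneme sequence conversion rules, the result is mapped back to a word, and word-level edit distance against the reference gives an error count.

```python
import random
from typing import Dict, List, Sequence, Tuple

Phones = Tuple[str, ...]

# Hypothetical toy pronunciation lexicon (illustration only).
LEXICON: Dict[str, Phones] = {
    "pin": ("p", "ih", "n"),
    "bin": ("b", "ih", "n"),
    "pen": ("p", "eh", "n"),
    "ben": ("b", "eh", "n"),
}
# Inverse lexicon: a crude stand-in for ASR decoding (assumption).
PRON_TO_WORD: Dict[Phones, str] = {pron: w for w, pron in LEXICON.items()}

# Hypothetical conversion rules: source phone sequence -> [(target sequence, prob)].
# Any probability mass not listed leaves the source sequence unchanged.
RULES: Dict[Phones, List[Tuple[Phones, float]]] = {
    ("p",): [(("b",), 0.2)],
    ("ih",): [(("eh",), 0.1)],
}
MAX_RULE_LEN = max(len(src) for src in RULES)


def confuse(phones: Phones, rng: random.Random) -> Phones:
    """Sample a 'recognized' phone sequence, applying rules left to right, longest match first."""
    out: List[str] = []
    i = 0
    while i < len(phones):
        for n in range(min(MAX_RULE_LEN, len(phones) - i), 0, -1):
            src = phones[i:i + n]
            if src in RULES:
                r, cum, target = rng.random(), 0.0, src  # default: no error
                for tgt, p in RULES[src]:
                    cum += p
                    if r < cum:
                        target = tgt
                        break
                out.extend(target)
                i += n
                break
        else:  # no rule matched at position i: copy the phone unchanged
            out.append(phones[i])
            i += 1
    return tuple(out)


def recognize(words: Sequence[str], rng: random.Random) -> List[str]:
    """Map each perturbed pronunciation back to a word; unknown ones become '<unk>'."""
    hyp: List[str] = []
    for w in words:
        pron = confuse(LEXICON[w], rng)
        if pron:  # an empty pronunciation is treated as a deleted word
            hyp.append(PRON_TO_WORD.get(pron, "<unk>"))
    return hyp


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h))
        prev = cur
    return prev[-1]


def estimate_wer(corpus: Sequence[Sequence[str]], samples: int = 1000, seed: int = 0) -> float:
    """Monte Carlo WER estimate: total word errors / total reference words."""
    rng = random.Random(seed)
    errors = ref_words = 0
    for _ in range(samples):
        for sent in corpus:
            errors += edit_distance(sent, recognize(sent, rng))
            ref_words += len(sent)
    return errors / ref_words
```

For example, `estimate_wer([["pin", "pen"], ["bin", "ben"]], samples=2000)` should land close to 0.145, the exact expected WER for these toy numbers (per-word error probabilities of 0.28, 0.2, 0.1 and 0, averaged). Sampling was chosen here only because it keeps the sketch short; an implementation closer to the abstract would propagate the hypothesis distribution exactly and decode against the application's grammar.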
In: Proc. of the European Conference on Speech Communication
Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.
Type: Inproceedings