C. White, G. Zweig, L. Burget, P. Schwarz, and H. Hermansky
2008
Automatic Speech Recognition (ASR) systems continue to make
errors during search when handling various phenomena including
noise, pronunciation variation, and out-of-vocabulary (OOV) words.
Predicting the probability that a word is incorrect can prevent the
error from propagating and perhaps allow the system to recover.
This paper addresses the problem of detecting errors and OOVs for
read Wall Street Journal speech when the word error rate (WER) is
very low. It augments a traditional confidence estimate by introducing
two novel methods: phone-level comparison using Multi-String
Alignment (MSA) and word-level comparison using phone-to-word
transduction. We show that features from phone and word string
comparisons can be added to a standard maximum entropy framework,
thereby substantially improving performance in detecting both
errors and OOVs. Additionally, we show an extension to detecting
English and accented English for the Language Identification (LID)
task.
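
As a rough illustration of the phone-level comparison idea, the sketch below scores the disagreement between two phone strings with a plain pairwise edit distance. This is a simplified stand-in, not the Multi-String Alignment described in the paper; the function names and the example phone sequences are hypothetical, and the resulting mismatch rate is only one plausible feature that could be passed to a classifier alongside a confidence estimate.

```python
def phone_edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of strings)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def mismatch_rate(ref, hyp):
    """Edit distance normalized by the longer sequence; 0.0 means identical."""
    if not ref and not hyp:
        return 0.0
    return phone_edit_distance(ref, hyp) / max(len(ref), len(hyp))


# Hypothetical example: phones implied by the word-level output versus
# phones from an unconstrained phone recognizer for the same time span.
word_phones = ["k", "ae", "t"]
free_phones = ["k", "ah", "t"]
feature = mismatch_rate(word_phones, free_phones)
```

A high mismatch rate suggests that the word hypothesis and the phone-level evidence disagree, which is the kind of signal the abstract describes adding to the confidence framework.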
In Proceedings of ICASSP
| Type | Inproceedings |