Predicting speech recognition confidence using deep learning with word identity and score features

Po-Sen Huang; Kshitiz Kumar; Chaojun Liu; Yifan Gong; Li Deng

Proc. ICASSP | May 2013

Confidence classifiers for automatic speech recognition (ASR) provide a quantitative measure of the reliability of ASR decoding. In this paper, we improve utterance-level ASR confidence measures using two distinct approaches: (1) defining and incorporating additional predictors in the confidence classifier, including predictors based on word identity and on aggregated words, and (2) building the confidence classifier on deep learning architectures, including the deep neural network (DNN) and the kernel deep convex network (K-DCN). Our experiments show that adding the new predictors to our multi-layer perceptron (MLP)-based baseline classifier provides a 38.6% relative reduction in the correct-reject rate, our measure of classifier performance. Further, replacing the MLP with the DNN and the K-DCN provides additional relative performance gains of 14.5% and 47.5%, respectively.
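
The results above are stated as relative changes rather than absolute ones. As a minimal sketch of that arithmetic, the snippet below computes a relative reduction between a baseline rate and an improved rate. The function name `relative_reduction` and the two example rates are hypothetical and chosen only for illustration; they are not values or code from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical rates for illustration only; not taken from the paper.
baseline_rate = 0.20
improved_rate = 0.15

# Report the change relative to the baseline, formatted as a percentage.
print(f"relative reduction: {relative_reduction(baseline_rate, improved_rate):.1%}")
```

A relative figure depends on the baseline it is measured against, so the three percentages in the abstract are best read against the baselines the paper itself defines.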