An Experimental Study on Confidence Measures for Robust Speech Recognition
A reliable confidence measure is one of the most critical components of a practical speech recognition system. In this paper, we report a series of experiments conducted to improve confidence measures for large-vocabulary, speaker-independent speech recognition. First, we studied the behavior of confidence measures for mispronounced words during the user enrollment phase. Acoustic features at the word, phoneme, and senone levels were examined. We developed a transformation-function-based system that uses sub-word features for high-performance confidence estimation, and used discriminative training to optimize the parameters of the transformation function. Compared with the baseline system, the proposed system reduced the equal error rate by 15% and the false acceptance error by 40% at a number of fixed false rejection rates. Second, we augmented our feature vectors for speech recognition error detection. With multi-dimensional features and a linear classifier, the false acceptance error was reduced by 80% compared with our single-feature baseline system. Finally, we investigated how confidence measures can be used to reject noise, the most challenging case for recognition error detection. With explicit noise modeling and a secondary classifier, we reduced the noise rejection error to 7%, a 68% error reduction over our baseline system.
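The results above are stated in terms of false rejection, false acceptance, and equal error rate. As a reading aid, the following minimal sketch shows how these quantities are commonly computed from confidence scores under a simple accept/reject threshold. It is not the evaluation code used in these experiments: the function names, the score lists, and the convention that a hypothesis is accepted when its score is at or above the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def error_rates(correct: List[float], incorrect: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_rejection_rate, false_acceptance_rate) at one threshold.

    Assumed convention: a hypothesis is accepted when score >= threshold.
    `correct` holds scores of correctly recognized items, `incorrect` holds
    scores of recognition errors (hypothetical data, not from the paper).
    """
    false_rejections = sum(1 for s in correct if s < threshold)
    false_acceptances = sum(1 for s in incorrect if s >= threshold)
    return false_rejections / len(correct), false_acceptances / len(incorrect)


def equal_error_rate(correct: List[float], incorrect: List[float]) -> float:
    """Approximate the equal error rate.

    Every observed score is tried as a threshold; the one where the two
    error rates are closest is selected and their mean is returned.
    """
    best_gap, best_rate = None, None
    for t in sorted(set(correct) | set(incorrect)):
        fr, fa = error_rates(correct, incorrect, t)
        gap = abs(fr - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fr + fa) / 2
    return best_rate


def false_acceptance_at(correct: List[float], incorrect: List[float],
                        max_false_rejection: float) -> float:
    """Lowest false acceptance rate among thresholds whose false rejection
    rate does not exceed `max_false_rejection`."""
    feasible = []
    for t in sorted(set(correct) | set(incorrect)):
        fr, fa = error_rates(correct, incorrect, t)
        if fr <= max_false_rejection:
            feasible.append(fa)
    return min(feasible)


# Toy scores, invented purely to exercise the functions.
correct_scores = [0.9, 0.8, 0.7, 0.4]
incorrect_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(correct_scores, incorrect_scores)
fa_at_25 = false_acceptance_at(correct_scores, incorrect_scores, 0.25)
```

Reporting false acceptance at a fixed false rejection rate, as the abstract does, corresponds to calling `false_acceptance_at` with the chosen rate. A relative reduction then compares that value between a baseline and a proposed scorer.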