Jinyu Li, Yu Tsao, and Chin-Hui Lee
We propose a rescoring framework for speech recognition that incorporates acoustic-phonetic knowledge sources. The scores corresponding to all knowledge sources are generated by a collection of neural-network-based classifiers. Rescoring is then performed by combining the different knowledge scores and using them to reorder candidate strings produced by state-of-the-art HMM-based speech recognizers. We report on continuous phone recognition experiments using the TIMIT database. Our results indicate that classifying manner and place of articulation provides additional information for rescoring, and achieves improved accuracies over our best baseline speech recognizers using both context-independent and context-dependent phone models. The same technique can also be extended to lattice rescoring and large-vocabulary continuous speech recognition.
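The rescoring step described above can be sketched as a log-linear combination of knowledge-source scores used to reorder an N-best list. This is an illustrative sketch only, not the authors' implementation: the function names, source names, and weights are all hypothetical.

```python
# Illustrative sketch (hypothetical names and weights): combine scores
# from several knowledge sources and reorder candidate hypotheses.

def rescore(nbest, weights):
    """Reorder candidate strings by a weighted sum of knowledge scores.

    nbest   : list of (hypothesis, {source_name: log_score}) pairs
    weights : {source_name: weight} for each knowledge source
    """
    def combined(scores):
        # Weighted sum of log-domain scores from all knowledge sources.
        return sum(weights.get(name, 0.0) * s for name, s in scores.items())
    # Higher combined score ranks first.
    return sorted(nbest, key=lambda h: combined(h[1]), reverse=True)

# Hypothetical example: a baseline HMM score combined with scores from
# manner- and place-of-articulation classifiers.
nbest = [
    ("sil ah sil", {"hmm": -120.0, "manner": -10.0, "place": -12.0}),
    ("sil aa sil", {"hmm": -121.0, "manner": -6.0,  "place": -7.0}),
]
weights = {"hmm": 1.0, "manner": 0.5, "place": 0.5}
best_hypothesis = rescore(nbest, weights)[0][0]
```

Here the attribute classifiers favor the second hypothesis strongly enough to overturn the small baseline advantage of the first, illustrating how articulatory knowledge can reorder the candidate list.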
Published in: Proc. ICASSP