A study on knowledge source integration for candidate rescoring in automatic speech recognition

Proc. ICASSP

We propose a rescoring framework for speech recognition that
incorporates acoustic-phonetic knowledge sources. The scores
corresponding to all knowledge sources are generated from a
collection of neural-network-based classifiers. Rescoring is then
performed by combining the different knowledge scores and using
them to reorder candidate strings provided by state-of-the-art
HMM-based speech recognizers. We report on continuous phone
recognition experiments using the TIMIT database. Our results
indicate that classifying manners and places of articulation
provides additional information for rescoring and yields
improved accuracy over our best baseline speech recognizers
using both context-independent and context-dependent phone
models. The same technique can also be extended to lattice
rescoring and large vocabulary continuous speech recognition.
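
The rescoring step described above can be illustrated with a minimal sketch. The abstract does not state how the knowledge scores are combined, so the weighted linear combination of log-domain scores used here is an assumption, as are all names (Candidate, combined_score, rescore), the weights, and the toy score values, which are invented purely for illustration and are not results from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    """One entry of an N-best list (hypothetical structure)."""
    phones: List[str]                      # candidate phone string
    hmm_score: float                       # log-score from the HMM recognizer
    knowledge_scores: Dict[str, float] = field(default_factory=dict)
    # e.g. {"manner": ..., "place": ...}: log-scores from attribute classifiers


def combined_score(cand: Candidate,
                   weights: Dict[str, float],
                   hmm_weight: float = 1.0) -> float:
    """Weighted sum of the HMM score and the knowledge scores.

    Assumption: a linear combination in the log domain; the source
    does not specify the combination rule.
    """
    total = hmm_weight * cand.hmm_score
    for name, weight in weights.items():
        # A knowledge source missing for this candidate contributes 0.
        total += weight * cand.knowledge_scores.get(name, 0.0)
    return total


def rescore(candidates: List[Candidate],
            weights: Dict[str, float],
            hmm_weight: float = 1.0) -> List[Candidate]:
    """Return the candidates reordered from best to worst combined score."""
    return sorted(candidates,
                  key=lambda c: combined_score(c, weights, hmm_weight),
                  reverse=True)


# Toy N-best list with invented scores.
nbest = [
    Candidate(["s", "ih", "t"], -10.0, {"manner": -3.0, "place": -4.0}),
    Candidate(["s", "eh", "t"], -10.5, {"manner": -1.0, "place": -1.5}),
]
weights = {"manner": 0.5, "place": 0.5}

reranked = rescore(nbest, weights)
for cand in reranked:
    print(" ".join(cand.phones), combined_score(cand, weights))
```

In this toy case the second candidate has the lower HMM score but moves to the top once the manner and place scores are added, which is the reordering effect that rescoring aims for. In practice the weights would be tuned on held-out data.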