SCARF: A Segmental CRF Speech Recognition System

We propose a theoretical framework for speech recognition with segmental conditional random fields, and describe the implementation of a toolkit for experimenting with these models. This framework allows users to easily incorporate multiple detector streams into a discriminatively trained direct model for large vocabulary continuous speech recognition. The detector streams can operate at multiple scales (frame, phone, multi-phone, syllable or word) and are combined at the word level in the CRF training and decoding processes. A key aspect of our approach is that features are defined at the word level, and can thus capture long-span phenomena such as the edit distance between an observed and an expected sequence of detection events. Further, a wide variety of features are automatically constructed from atomic detector streams, allowing the user to focus on the creation of informative detectors. Generalization to unseen words is possible through the use of decomposable consistency features [1, 2], and our framework allows for the joint or separate training of the acoustic and language models.
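
As an illustration of the kind of word-level, long-span feature described in the abstract, the following minimal sketch computes a plain Levenshtein edit distance between the unit sequence expected for a hypothesized word and the detection events observed within its segment. It is not taken from the SCARF toolkit: the function name, the unit labels, and the pronunciation "k ae t" for "cat" are assumptions made only for this example, and the toolkit's actual feature definitions may differ.

```python
def edit_distance(expected, observed):
    """Levenshtein distance between two sequences of detection-unit labels.

    `expected` is the unit sequence a hypothesized word should produce;
    `observed` is the sequence of detector events seen in the segment.
    Both names are hypothetical and used only for this sketch.
    """
    # prev[j] holds the distance between the first i-1 expected units
    # and the first j observed units.
    prev = list(range(len(observed) + 1))
    for i, exp_unit in enumerate(expected, start=1):
        curr = [i]  # deleting all i expected units to match an empty prefix
        for j, obs_unit in enumerate(observed, start=1):
            substitution_cost = 0 if exp_unit == obs_unit else 1
            curr.append(min(
                prev[j] + 1,                      # expected unit was not detected
                curr[j - 1] + 1,                  # spurious detection event
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


# Hypothetical pronunciation for the word "cat" (an assumption, not from the report).
expected_units = ["k", "ae", "t"]
observed_units = ["k", "ah", "t"]
feature_value = edit_distance(expected_units, observed_units)
```

In a segmental model, a value like `feature_value` would be one of several features computed for a hypothesized word over a hypothesized time span. Because it depends on the whole segment, it cannot be expressed as a sum of frame-level scores.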


Details

Type: TechReport
Number: MSR-TR-2009-54