Speech Recognition with Flat Direct Models

Patrick Nguyen; Georg Heigold; Geoffrey Zweig

IEEE Journal of Selected Topics in Signal Processing | January 2010

This article describes a novel direct modeling approach for speech recognition. We propose a log-linear modeling framework based on numerous features, each of which measures some form of consistency between the underlying speech and an entire sequence of hypothesized words. Since the model relates the entire audio signal to a complete hypothesis without necessarily positing any inherent structure, we term this a Flat Direct Model (FDM). In contrast to a conventional HMM approach, no Markov assumptions are used, and the model is not necessarily sequential. We demonstrate the use of features based on both template-matching distances and the acoustic detection of multi-phone units, which are selected so as to have maximal mutual information with respect to word labels. Further, we solve the key problem of how to define features that can generalize to unseen word sequences. In the proposed model, template-based features improve sentence error rate by 3% absolute over the baseline, while multi-phone-based features improve it by 2% absolute.
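As a rough illustration of the log-linear idea described in the abstract, the sketch below scores each complete hypothesis with a weighted sum of whole-utterance features and normalizes the scores into posteriors. It is not the authors' implementation: the function name `flat_direct_posterior`, the two features, the weights, and the three candidate hypotheses are all hypothetical, and normalizing over a short candidate list is an assumption made here for simplicity.

```python
import math


def flat_direct_posterior(feature_vectors, weights):
    """Return P(hypothesis | audio) for each candidate under a log-linear model.

    feature_vectors[i][k] is the value of (hypothetical) feature k computed
    between the whole utterance and the whole i-th hypothesis.
    """
    # Log-linear score: weighted sum of the features of each hypothesis.
    scores = [sum(w * f for w, f in zip(weights, feats))
              for feats in feature_vectors]
    # Subtract the maximum before exponentiating, for numerical stability.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    # Normalize over the candidate list (an assumption of this sketch).
    return [e / total for e in exp_scores]


# Made-up candidates and feature values, for illustration only.
hypotheses = ["call home", "call hone", "tall home"]
features = [
    [0.9, 1.0],  # e.g. a template-match similarity and a unit-detection flag
    [0.6, 0.0],
    [0.4, 1.0],
]
weights = [2.0, 1.0]  # hypothetical trained weights

posteriors = flat_direct_posterior(features, weights)
best_index = max(range(len(hypotheses)), key=posteriors.__getitem__)
best = hypotheses[best_index]
print(best, posteriors)
```

Note that nothing in the scoring step walks through the hypothesis word by word: each feature sees the full audio and the full word sequence at once, which is what distinguishes this flat formulation from a sequential HMM decoder.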

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/