Speech Recognition with Flat Direct Models

  • Patrick Nguyen
  • Georg Heigold
  • Geoffrey Zweig

IEEE Journal of Selected Topics in Signal Processing

This article describes a novel direct modeling approach for speech recognition. We propose a log-linear modeling framework based on numerous features, each of which measures some form of consistency between the underlying speech and an entire sequence of hypothesized words. Since the model relates the entire audio signal to a complete hypothesis without necessarily positing any inherent structure, we term this a Flat Direct Model (FDM). In contrast to a conventional HMM approach, no Markov assumptions are used, and the model is not necessarily sequential. We demonstrate the use of features based on both template-matching distances and the acoustic detection of multi-phone units, which are selected so as to have maximal mutual information with respect to word labels. Further, we solve the key problem of how to define features which can generalize to unseen word sequences. In the proposed model, template-based features improve sentence error rate by 3% absolute over the baseline, while multi-phone-based features improve it by 2% absolute.
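
As a rough illustration of the log-linear framework described above, the sketch below scores a small set of complete hypotheses by a weighted sum of whole-utterance features and normalizes the scores into posteriors. It is not the authors' implementation: the function name `flat_direct_posteriors`, the restriction to a fixed candidate list, and all feature values and weights are assumptions made purely for illustration.

```python
import math


def flat_direct_posteriors(feature_vectors, weights):
    """Return P(W | X) for each candidate hypothesis under a log-linear model.

    feature_vectors[k][i] is the value of feature i for hypothesis k, computed
    from the whole utterance and the whole word sequence (hypothetical values).
    """
    # Linear score per hypothesis: sum_i lambda_i * f_i(W, X)
    scores = [
        sum(w * f for w, f in zip(weights, feats)) for feats in feature_vectors
    ]
    # Subtract the maximum score before exponentiating for numerical stability
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    # Normalize over the candidate set so the posteriors sum to one
    return [e / total for e in exp_scores]


# Hypothetical example: three candidate word sequences, two features each
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, -0.2]  # assumed weights; in practice these would be trained
posteriors = flat_direct_posteriors(features, weights)
best = max(range(len(posteriors)), key=lambda k: posteriors[k])
```

Because each feature is a function of the full hypothesis and the full utterance, nothing in this scoring step depends on a left-to-right decomposition, which is the sense in which the model is "flat".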