Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, and Alex Acero
September 2010
Information retrieval methods are frequently used for indexing
and retrieving spoken documents, and more recently they
have been proposed for voice search over a predefined set
of business entries. In this paper, we show that these methods
can be used in an even more fundamental way: as the core component
of a continuous speech recognizer. Speech is first
converted into a sequence of discrete symbols,
specifically phonemes or multi-phone units, and recognition then
operates on this sequence. The recognizer is segment-based,
and the acoustic score for labeling a segment with a word is
the TF-IDF similarity between the subword units detected
in the segment and those typically seen in association
with the word. We present promising results on both a voice
search task and the Wall Street Journal task. This method
brings us one step closer to performing
speech recognition based on the detection of subword audio
attributes.
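The segment score described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify. Each word is represented by the bag of phone 1- to 3-grams observed in its training segments. Each word's bag is treated as one "document" when computing IDF. A segment is scored by the cosine similarity between its TF-IDF vector and each word's TF-IDF vector. The class name `TfIdfWordScorer`, the n-gram unit inventory, and the toy phone strings are hypothetical. The sketch covers only the per-segment acoustic score; it does not perform segmentation, language-model combination, or search, and it is not the authors' exact weighting.

```python
import math
from collections import Counter


def ngrams(phones, n):
    """Return all contiguous n-grams of a phone sequence as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def units(phones, max_n=3):
    """Bag of subword units: phone n-grams for n = 1..max_n (assumed unit inventory)."""
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(phones, n))
    return bag


class TfIdfWordScorer:
    """Hypothetical scorer: TF-IDF cosine similarity between a segment and word profiles."""

    def __init__(self, training, max_n=3):
        # training: dict mapping word -> list of phone sequences seen for that word.
        self.max_n = max_n
        profiles = {}
        for word, examples in training.items():
            bag = Counter()
            for phones in examples:
                bag.update(units(phones, max_n))
            profiles[word] = bag
        # Each word's profile counts as one "document" for document frequency.
        n_docs = len(profiles)
        df = Counter()
        for bag in profiles.values():
            df.update(bag.keys())
        self.idf = {u: math.log(n_docs / c) for u, c in df.items()}
        self.vectors = {w: self._weight(bag) for w, bag in profiles.items()}

    def _weight(self, bag):
        # TF-IDF weights; units never seen in training are dropped.
        return {u: tf * self.idf[u] for u, tf in bag.items() if u in self.idf}

    @staticmethod
    def _cosine(a, b):
        dot = sum(v * b.get(u, 0.0) for u, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def score(self, phones):
        """Return {word: cosine similarity} for one detected phone segment."""
        seg = self._weight(units(phones, self.max_n))
        return {w: self._cosine(seg, vec) for w, vec in self.vectors.items()}


if __name__ == "__main__":
    # Toy training data (hypothetical phone transcriptions).
    training = {
        "cat": [["k", "ae", "t"]],
        "cab": [["k", "ae", "b"]],
        "dog": [["d", "ao", "g"]],
    }
    scorer = TfIdfWordScorer(training)
    scores = scorer.score(["k", "ae", "t"])
    print(max(scores, key=scores.get))  # prints "cat"
```

Under IDF weighting, units shared by many words (here `k`, `ae`, and `k ae`) contribute little, so the distinguishing n-grams dominate the match. This is why "cab" scores well below "cat" for the segment `k ae t`, and "dog", which shares no units, scores zero.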
Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.
Type: Inproceedings