Continuous Speech Recognition with a TF-IDF Acoustic Model

Information retrieval methods are frequently used for indexing and retrieving spoken documents, and more recently have been proposed for voice-search amongst a pre-defined set of business entries. In this paper, we show that these methods can be used in an even more fundamental way, as the core component in a continuous speech recognizer. Speech is initially processed and represented as a sequence of discrete symbols, specifically phoneme or multi-phone units. Recognition then operates on this sequence. The recognizer is segment-based, and the acoustic score for labeling a segment with a word is based on the TF-IDF similarity between the subword units detected in the segment, and those typically seen in association with the word. We present promising results on both a voice search task and the Wall Street Journal task. The development of this method brings us one step closer to being able to do speech recognition based on the detection of sub-word audio attributes.
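
As a rough illustration of the scoring idea in the abstract, the sketch below treats each word as a "document" of phone n-grams and scores a segment by TF-IDF-weighted cosine similarity against each word's profile. It is not taken from the paper: the function names, the choice of cosine similarity, the smoothed IDF, the n-gram order, and the toy pronunciations are all assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(phones, n_max=2):
    """Return all phone n-grams of length 1..n_max as tuples."""
    return [tuple(phones[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(phones) - n + 1)]


def build_model(examples, n_max=2):
    """Build per-word n-gram counts and IDF weights.

    examples: dict mapping word -> list of phone sequences seen with that word.
    Each word is treated as one "document" (an assumption of this sketch).
    """
    tf = {word: Counter(g for seq in seqs for g in ngrams(seq, n_max))
          for word, seqs in examples.items()}
    df = Counter(g for counts in tf.values() for g in counts)
    n_words = len(tf)
    # Smoothed IDF (log ratio plus one) is an assumed variant, not the paper's.
    idf = {g: math.log(n_words / df[g]) + 1.0 for g in df}
    return tf, idf


def weight(counts, idf):
    """Turn raw n-gram counts into a sparse TF-IDF vector."""
    return {g: c * idf[g] for g, c in counts.items() if g in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_segment(segment_phones, tf, idf, n_max=2):
    """Score a detected phone sequence against every word profile."""
    seg = weight(Counter(ngrams(segment_phones, n_max)), idf)
    return {word: cosine(seg, weight(counts, idf))
            for word, counts in tf.items()}


# Toy, made-up pronunciations used only to exercise the functions.
examples = {
    "cat": [["k", "ae", "t"]],
    "cab": [["k", "ae", "b"]],
    "dog": [["d", "ao", "g"]],
}
tf, idf = build_model(examples)
scores = score_segment(["k", "ae", "t"], tf, idf)
best_word = max(scores, key=scores.get)
```

In this toy setup the segment shares all of its n-grams with "cat", some with "cab", and none with "dog", so the ranking follows that overlap. A full recognizer would combine such segment scores with a language model and a search over segmentations, which this sketch does not attempt.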


Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.

Details

Type: Inproceedings