Yu Shi and Eric Chang
Over the past several years, the primary focus of the speech-recognition research community has been speaker-independent speech recognition, with the emphasis on working with databases covering larger and larger numbers of speakers. For example, the recent DARPA-sponsored EARS program calls for recordings of thousands of speakers. In this paper, however, we are interested in making a speech interface work well for one particular individual. For this purpose, we propose using massive amounts of speaker-specific training data recorded in one's daily life. We call this Massively Speaker-Specific Recognition (MSSR). As a preliminary study, we leverage the large corpus available from our speech-synthesis work to study the benefit of MSSR from the acoustic-modeling aspect only. Initial results show that by changing the focus to MSSR, word error rates drop very significantly. MSSR also outperforms a speaker-adaptive speech-recognition system, since its model parameters can be tuned to suit one particular individual.
Publisher: Institute of Electrical and Electronics Engineers, Inc.
© 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.