Xuedong Huang, Alex Acero, J. Adcock, J. Goldsmith, and J. Liu
We introduce Whistler, a trainable Text-to-Speech (TTS)
system that automatically learns its model parameters from a
corpus. Both prosody parameters and concatenative speech units
are derived through the use of probabilistic learning methods
that have been successfully used for speech recognition. Whistler
can produce synthetic speech that sounds very natural and
resembles the acoustic and prosodic characteristics of the
original speaker. The underlying technologies used in Whistler
can significantly facilitate the process of creating generic TTS
systems for a new language, a new voice, or a new speech style.
Published in: Proc. of the Int. Conf. on Spoken Language Processing
Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.