Recent Improvements on Microsoft's Trainable Text-to-Speech System: Whistler

X. D. Huang, Alex Acero, Hsiao-Wuen Hon, Yun-Cheng Ju, J. Liu, S. Meredith, and M. Plumpe

Abstract

The Whistler Text-to-Speech engine was designed so that its model parameters can be constructed automatically from training data [7]. This paper focuses on recent improvements in prosody and acoustic modeling, all of which are derived using probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style. The Whistler TTS engine supports the Microsoft Speech API [10] and requires less than 3 MB of working memory.

Details

Publication type: Inproceedings
Published in: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers, Inc.