HMM-Based Smoothing for Concatenative Speech Synthesis

This paper describes our recent efforts to further improve the acoustic quality of the Whistler Text-to-Speech engine. We have developed a smoothing system that, according to a small pilot study, significantly improves quality. We represent speech as a sequence of frames, where each frame can be synthesized from a parameter vector. Each frame is represented by a state in an HMM, where the output distribution of each state is a Gaussian random vector consisting of x and Δx. The set of vectors that maximizes the HMM probability is the representation of the smoothed speech output. This technique follows our traditional goal of developing methods whose parameters are learned automatically from data with minimal human intervention. The general framework is shown to be robust: it maintains the improved quality even with a significant reduction in data.
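The maximization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single scalar parameter per frame (the paper works with parameter vectors), independent Gaussians on x and Δx, and the definition Δx_t = x_t - x_{t-1}; the function and argument names are hypothetical. Under those assumptions, maximizing the log-probability is a weighted least-squares problem whose normal equations are tridiagonal, so they can be solved in linear time.

```python
def smooth_trajectory(mu, var, dmu, dvar):
    """Return the sequence x that maximizes the product of Gaussians
    N(x[t]; mu[t], var[t]) and N(x[t] - x[t-1]; dmu[t], dvar[t]).

    Assumption: scalar frames and delta defined as x[t] - x[t-1].
    dmu[0] and dvar[0] are ignored because frame 0 has no predecessor.
    """
    n = len(mu)
    w = [1.0 / s for s in var]                           # static precisions
    v = [0.0] + [1.0 / dvar[t] for t in range(1, n)]     # delta precisions

    # Build the tridiagonal normal equations A x = rhs.
    # The off-diagonal entry linking frames t-1 and t is -v[t].
    diag = [0.0] * n
    rhs = [0.0] * n
    for t in range(n):
        diag[t] = w[t] + v[t]
        rhs[t] = w[t] * mu[t] + v[t] * dmu[t]
        if t + 1 < n:
            diag[t] += v[t + 1]
            rhs[t] -= v[t + 1] * dmu[t + 1]

    # Forward elimination (Thomas algorithm).
    for t in range(1, n):
        m = -v[t] / diag[t - 1]
        diag[t] -= m * (-v[t])
        rhs[t] -= m * rhs[t - 1]

    # Back substitution.
    x = [0.0] * n
    x[n - 1] = rhs[n - 1] / diag[n - 1]
    for t in range(n - 2, -1, -1):
        x[t] = (rhs[t] + v[t + 1] * x[t + 1]) / diag[t]
    return x


# A step between two concatenated units, with delta means of zero
# and a small delta variance that penalizes abrupt changes.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
smoothed = smooth_trajectory(step, [1.0] * 6, [0.0] * 6, [0.1] * 6)
```

In this sketch a small delta variance pulls neighbouring frames together and spreads the discontinuity at the join over several frames, while a large delta variance leaves the per-frame means essentially unchanged.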

1998-plumpe-icslp.pdf
PDF file

In Proc. of the Int. Conf. on Spoken Language Processing

Details

Type: Inproceedings