Automatic Generation of Synthesis Units for Trainable Text-to-Speech Systems

The Whistler Text-to-Speech engine was designed so that its model
parameters can be constructed automatically from training data.
This paper describes in detail the design issues involved in
constructing the synthesis unit inventory automatically from speech
databases. The automatic process includes (1) determining scalable
synthesis units that can reflect the spectral variations of different
allophones; (2) segmenting the recorded sentences into phonetic
segments; and (3) selecting good instances of each synthesis unit to
generate the best synthesized sentence at run time. These processes
are all derived using probabilistic learning methods aimed at the
same optimization criteria. Through this automatic unit generation,
Whistler can automatically produce synthetic speech that sounds very
natural and resembles the acoustic characteristics of the original
speaker.
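
As a rough illustration of step (3), the sketch below ranks the candidate instances of each synthesis unit by their log-likelihood under a per-unit model and keeps the best ones. The abstract does not specify the model or the scoring rule, so everything here is an assumption made for illustration: the diagonal Gaussian, the single feature vector per instance, and the names `log_likelihood`, `select_instances`, and `top_n` are hypothetical and are not taken from Whistler.

```python
import math


def log_likelihood(features, mean, var):
    """Log density of one feature vector under a diagonal Gaussian.

    Assumption: each instance is summarized by a single feature vector;
    a real system would more likely score a sequence of frames.
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(features, mean, var)
    )


def select_instances(instances, models, top_n=1):
    """Keep the top_n most likely instances for each synthesis unit.

    instances: unit name -> list of (instance_id, feature_vector)
    models:    unit name -> (mean vector, variance vector)
    Returns:   unit name -> list of instance ids, best first.
    """
    selected = {}
    for unit, candidates in instances.items():
        mean, var = models[unit]
        ranked = sorted(
            candidates,
            key=lambda c: log_likelihood(c[1], mean, var),
            reverse=True,
        )
        selected[unit] = [instance_id for instance_id, _ in ranked[:top_n]]
    return selected


# Toy data (hypothetical): one unit with three candidate instances.
models = {"ae": ([1.0, 2.0], [1.0, 1.0])}
instances = {
    "ae": [
        ("utt1", [3.0, 0.0]),
        ("utt2", [1.1, 2.1]),
        ("utt3", [0.0, 2.0]),
    ]
}
best = select_instances(instances, models, top_n=2)
```

With this toy data, the instance closest to the unit mean ranks first. Scoring instances with the same probabilistic model that could also drive segmentation is one way to read the abstract's remark that all steps share the same optimization criteria, but that reading is an interpretation and not a description of the paper's method.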

1998-hon-icassp.pdf

In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing

Details

Type: Inproceedings