Rich Context Modeling for High Quality HMM-Based TTS

Zhi-Jie Yan, Yao Qian, and Frank K. Soong

Abstract

This paper presents a rich context modeling approach to high-quality HMM-based speech synthesis. We first analyze the over-smoothing problem in conventional decision-tree-tied HMMs, and then propose to model the training speech tokens with rich context models. A special training procedure is adopted for reliable estimation of the rich context model parameters. In synthesis, a search algorithm following context-based pre-selection determines the optimal rich context model sequence, which generates natural and crisp output speech. Experimental results show that spectral envelopes synthesized by the rich context models have crisper formant structures and evolve with richer details than those obtained with the conventional models. The improvement in speech quality is also perceived by listeners in a subjective preference test, in which 76% of the sentences synthesized using rich context modeling are preferred.

Details

Publication type: Inproceedings
Published in: 10th Annual Conference of the International Speech Communication Association, InterSpeech 2009
Series: InterSpeech 2009
Publisher: International Speech Communication Association