HMM-based Speech Synthesis: Fundamentals and Its Recent Advances

The task of speech synthesis is to convert natural language text into speech. In recent years, the hidden Markov model (HMM) has been successfully applied to acoustic modeling for speech synthesis, and HMM-based parametric speech synthesis has become a mainstream synthesis method. This method can synthesize highly intelligible and smooth speech. Another significant advantage of this model-based parametric approach is that it makes speech synthesis far more flexible than the conventional unit-selection and waveform-concatenation approach.

This talk will first introduce the overall architecture of the HMM-based synthesis system developed at USTC. Then, some key techniques will be described, including the vocoder, acoustic modeling, the parameter generation algorithm, multi-space probability distribution HMMs (MSD-HMMs) for F0 modeling, and context-dependent model training. The method will be compared with the unit selection approach, and its flexibility in controlling voice characteristics will also be presented.
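As background for the parameter generation step, the following is a minimal NumPy sketch of the standard maximum-likelihood parameter generation (MLPG) idea from the HTS literature: per-frame Gaussian means and variances over static-plus-delta features are converted into a smooth static trajectory by solving weighted normal equations. The function name and the dense window-matrix construction are illustrative, not from the talk; practical systems exploit the banded structure of the equations for efficiency.

import numpy as np

def mlpg(means, variances):
    """Maximum-likelihood parameter generation for one feature dimension.

    means, variances: (T, 2) arrays of per-frame Gaussian statistics for
    [static, delta] features, as produced by the HMM state sequence.
    Returns the static trajectory c (length T) maximizing the likelihood
    of the stacked observation o = W @ c.
    """
    T = means.shape[0]
    # Window matrix W maps statics to stacked [static, delta] rows,
    # with delta(t) = (c[t+1] - c[t-1]) / 2 (dense here for clarity).
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                      # static row
        W[2 * t + 1, max(t - 1, 0)] -= 0.5     # delta row, left neighbor
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5 # delta row, right neighbor
    mu = means.reshape(-1)
    prec = 1.0 / variances.reshape(-1)         # diagonal precision
    # Normal equations: (W^T P W) c = W^T P mu
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)

Because the delta rows couple neighboring frames, the solution trades off fitting each state's mean against keeping the trajectory smooth, which is what gives HMM-based synthesis its characteristic smoothness.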

The second part of this talk will describe some recent advances in HMM-based speech synthesis at the USTC speech group. The methods to be described include: 1) articulatory control of HMM-based speech synthesis, which further improves the flexibility of HMM-based synthesis by integrating phonetic knowledge; 2) LPS-GV and minimum Kullback-Leibler divergence (KLD) based parameter generation, which alleviates the over-smoothing of generated spectral features and improves the naturalness of synthetic speech (the basic global-variance idea is sketched after this paragraph); and 3) a hybrid HMM/unit-selection approach, which has achieved excellent performance in the Blizzard Challenge speech synthesis evaluations of recent years.
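For orientation, the over-smoothing problem and the global variance (GV) idea can be illustrated with simple post-hoc variance scaling. This is only a generic sketch under assumed inputs, not USTC's LPS-GV or minimum-KLD method; the function name and the target_gv parameter are hypothetical.

import numpy as np

def gv_scale(generated, target_gv):
    """Global-variance compensation, sketched as post-hoc scaling.

    generated: (T, D) generated spectral features for one utterance.
    target_gv: (D,) target per-dimension variance, e.g. estimated from
    natural speech. Maximum-likelihood generation tends to produce
    over-smoothed (low-variance) trajectories; rescaling each dimension
    so its utterance-level variance matches the natural GV sharpens
    the resulting spectra.
    """
    mean = generated.mean(axis=0)
    gv = generated.var(axis=0)
    scale = np.sqrt(target_gv / np.maximum(gv, 1e-10))
    return mean + (generated - mean) * scale

In practice, GV is usually built into the generation criterion itself rather than applied as a separate scaling step, but the effect (restoring the utterance-level variance that maximum-likelihood generation suppresses) is the same.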

Speaker Details

Zhen-Hua Ling received the B.E. degree in electronic information engineering and the M.S. and Ph.D. degrees in signal and information processing from the University of Science and Technology of China (USTC), Hefei, China, in 2002, 2005, and 2008, respectively. From October 2007 to March 2008, he was a Marie Curie Fellow at the Centre for Speech Technology Research (CSTR), University of Edinburgh, U.K. From July 2008 to February 2011, he was a joint postdoctoral researcher at USTC and iFLYTEK Co., Ltd., China. He is currently an associate professor at USTC. His research interests include speech synthesis, voice conversion, speech analysis, and speech coding. He received the IEEE Signal Processing Society Young Author Best Paper Award in 2010.

Date:
Speakers:
Zhen-Hua Ling
Affiliation:
University of Science and Technology of China; University of Washington

Series: Microsoft Research Talks