Mulan — Microsoft Bilingual TTS

With applications such as spoken dialog systems, call center services, and voice-enabled web and email services being introduced, increasing emphasis is placed on generating natural-sounding speech. However, single-language TTS (text-to-speech) is often not enough, because many applications need to handle multiple languages. In our usability study of Mandarin TTS, the inability to handle English words and phrases embedded in Chinese text hindered the adoption of TTS technology, since much Chinese content, especially IT-related articles and emails, contains English words, phrases, or names. Some applications solve this problem by switching between two TTS engines. The main drawback of this approach is that the voices produced by the two engines sound different, and users find such two-voice utterances annoying. Furthermore, switching between two engines disrupts the overall sentence intonation. Mulan is the first true bilingual TTS system that can switch freely and smoothly between Mandarin and English without changing engines, while preserving sentence-level intonation for mixed-language text.

Built on the No Prediction No Scaling prosodic strategy and the Prosodic Constraint Oriented unit selection strategy, Mulan avoids the voice distortions found in many other systems, such as the monotonous sound caused by the limited capability of prosody prediction models, and the mechanical or buzzing sound caused by pitch- and time-scaling algorithms. As a result, Mulan generates highly natural speech that sounds like the original voice talent and inherits the talent's prosodic behavior.