Turning a Monolingual Speaker Into Multi-Lingual Speaker

Established: February 21, 2012

A voice user interface needs to deliver its responses as Text-To-Speech (TTS) synthesized speech. Sometimes it is even more desirable to deliver the response in mixed languages. For example, in a foreign country, a user of a car-navigation system who is not fluent in the local language would find it convenient to hear mixed-code instructions, with entities such as street names synthesized in the local language and routing directions in the user’s native language. Mixed-code TTS can be built easily from recordings of a truly bilingual speaker, but such voice talent is usually hard to find. We demo a new approach to turning a monolingual TTS into a multi-lingual TTS. From a speaker’s monolingual recordings, our algorithm can render speech sentences in different languages for building mixed-code, bilingual TTS systems. We have recordings in 26 languages, which are used to build TTS systems for the corresponding languages. With the new approach, we can synthesize any mixed-language pair among the 26 languages.
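
To make the idea of mixed-code input concrete, the sketch below splits a navigation sentence into language-tagged runs, so that each run could be handed to the TTS voice of the matching language. It is only an illustration of the text front end, not the algorithm demonstrated on this page: the function names, the `en`/`zh` tags, and the simple rule of detecting Chinese by Unicode range are all assumptions made for this example.

```python
from itertools import groupby


def is_cjk(ch: str) -> bool:
    """Return True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def segment_by_script(text: str, native: str = "en", local: str = "zh"):
    """Split text into (language_tag, chunk) runs.

    Hypothetical helper: runs of CJK ideographs are tagged with the local
    language; everything else (Latin letters, spaces, punctuation) is
    tagged with the user's native language.
    """
    segments = []
    for cjk, run in groupby(text, key=is_cjk):
        segments.append((local if cjk else native, "".join(run)))
    return segments


if __name__ == "__main__":
    sentence = "Head south on 中关村南大街, then turn left at 白石新桥."
    for lang, chunk in segment_by_script(sentence):
        print(lang, repr(chunk))
```

A real system would need a more careful language identifier, since a script check cannot tell apart languages that share an alphabet, such as English and Spanish.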

Demo 1: Synthesizing Web Search Result

“Driving directions to Beijing Railway Station. Head south on 中关村(Zhong Guan Cun)南大街(Nan Da Jie), then toward 大慧寺路(Da Hui Si Lu), turn left at 白石新桥(Bai Shi Xin Qiao), continue onto 西直门(Xi Zhi Men)外大街(Wai Da Jie).”

Demo 2: Translating Rick Rashid’s English Speech to Many Languages

Rick is a native English speaker. Here is a sample from recordings of his public speeches: “You know I I never when I first came to Microsoft I would have never imagined that we would be doing that and research here.”

Translating to Chinese: “你知道我从来没有当我第一次来到微软，我从来没有想象，我们将在这里做和研究。”

Translating to Spanish: “Sabes que nunca, cuando llegué por primera vez a Microsoft nunca habría podido imaginar que íbamos a estar haciendo eso y la investigación aquí.”

Translating to Italian: “Tu sai che io mai quando sono venuto a Microsoft non avrei mai immaginare che avremmo fatto questo e la ricerca qui.”

Rick’s Spanish Speech: “Bienvenido a TechFest 2012, donde hoy se podrá ver de primera mano cómo Microsoft Research está estudiando las tendencias tecnológicas clave que definirán el siglo 21.”

Rick’s Chinese Speech: “新年好，欢迎大家来微软亚洲研究院，我为大家读一首诗，我的中文说得不大好，请各位多多批评和指教。李白的下江陵朝辞白帝彩云间，千里江陵一日还。两岸猿声啼不住，轻舟已过万重山。谢谢大家！”

Rick’s Italian Speech: “A partire dal prossimo mese, cominceremo un book club in italiano che prenderà in considerazione libri di scrittori italiani contemporanei. Il primo incontro si terrà martedì 6 marzo alle 5 pomeridiane nell’aula 306 della Casa Italiana Zerilli-Marimò, per discutere sia questioni di logistica che la scelta dei libri che saranno considerati. Il libro che sara discusso il 5 marzo sara ‘Che La festa cominci’ di Niccolò Ammaniti.”

Demo 3: Craig’s 3D Avatar with Lip-synced TTS Voices

Craig Mundie is our Chief Research and Strategy Officer. He is a native English speaker. He doesn’t speak Chinese, but here he is speaking Chinese! A Short Introduction:

With English TTS

With Chinese TTS

Demo 4

Kit’s recorded Chinese speech.

The same sentence synthesized by a TTS voice built only from his English data.

Synthesized semantically unpredictable sentences (SUS)

隔代的后卫要违宪名誉的查询.
对外的自治要组装负的後母.
古典的签证要涉及绿的糖尿病.
长期的作战要报告平均的宝座.
正面的气象要遭遇毒的号召.

Algorithm
