A New Fast TTS
For multi-channel TTS applications, e.g., in a cloud service, it is highly desirable to synthesize high-quality speech with low computational complexity. To that end, we propose a fast, table-lookup-based, statistical-model-driven approach to non-uniform unit selection TTS. In training, the voice font of all waveform segments is organized into two tables: a Gaussian kernel coded hash table, and a table storing the quantized costs of all possible concatenation segment pairs. In synthesis, waveform segments of non-uniform length are first selected by looking up the Gaussian kernel coded hash table to construct a candidate lattice. The best path through the lattice is then found by minimizing the accumulated concatenation cost, with the cost of each possible concatenation retrieved from the quantization table. Experimental results show that the new approach significantly reduces search complexity while maintaining high TTS voice quality.
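The synthesis step described above (table lookup to build a candidate lattice, then a minimum-cost path search) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain dictionary keyed by a string stands in for the Gaussian kernel coded hash table, whose coding scheme is not specified here, and the names `build_lattice`, `best_path`, `MAX_COST`, and all segment IDs and cost values are hypothetical. The search itself is an ordinary Viterbi-style dynamic program over the lattice, assuming a non-empty lattice and concatenation cost only.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Gaussian kernel coded hash table:
# maps a target-unit key to the IDs of candidate waveform segments.
CandidateTable = Dict[str, List[int]]
# Hypothetical quantized concatenation costs, keyed by (left_id, right_id).
CostTable = Dict[Tuple[int, int], int]

MAX_COST = 255  # assumed cost for segment pairs missing from the table


def build_lattice(targets: List[str], table: CandidateTable) -> List[List[int]]:
    """Look up the candidate segments for each target position."""
    return [table[t] for t in targets]


def best_path(lattice: List[List[int]], costs: CostTable) -> Tuple[int, List[int]]:
    """Dynamic-programming search minimizing accumulated concatenation cost."""
    # best[seg] = (accumulated cost, path ending at seg) for the current column
    best = {seg: (0, [seg]) for seg in lattice[0]}
    for column in lattice[1:]:
        new_best = {}
        for seg in column:
            # Extend every surviving path by seg and keep the cheapest one.
            new_best[seg] = min(
                (cost + costs.get((prev, seg), MAX_COST), path + [seg])
                for prev, (cost, path) in best.items()
            )
        best = new_best
    return min(best.values())


# Toy data, invented for illustration only.
table = {"a": [1, 2], "b": [3, 4], "c": [5]}
costs = {(1, 3): 4, (1, 4): 1, (2, 3): 2, (2, 4): 6, (3, 5): 1, (4, 5): 5}

lattice = build_lattice(["a", "b", "c"], table)
total, path = best_path(lattice, costs)
print(total, path)  # 3 [2, 3, 5]
```

Because every concatenation cost is a precomputed table entry, the inner loop performs only lookups and integer additions, which is the kind of saving in search complexity the approach aims for.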


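Demos (click to play):

1. 保证有个人创造性和个人爱好的广阔天地. (Ensure a vast space for individual creativity and personal interests.)    Baseline, System A, System B

2. 我们党对扶贫开发的高瞻远瞩. (Our Party's far-sighted vision for poverty alleviation and development.)    Baseline, System A, System B

3. 布什政府对朝态度强硬是影响韩朝关系发展的重要因素. (The Bush administration's hard-line attitude toward North Korea is an important factor affecting the development of relations between South and North Korea.)    Baseline, System A, System B

4. 应验了不经历风雨怎么见彩虹的歌词. (It bore out the lyric "How can you see a rainbow without weathering the storm?")    Baseline, System A, System B

5. 泰国选手帕拉敦诗里查攀获得男子单打冠军. (Thai player Paradorn Srichaphan won the men's singles title.)    Baseline, System A, System B

6. 最常用的催眠方法是长时间的反复的单调的刺激. (The most commonly used hypnosis method is prolonged, repeated, monotonous stimulation.)    Baseline, System A, System B

7. 百分之百的努力不一定有百分之百的成果. (One hundred percent effort does not necessarily yield one hundred percent results.)    Baseline, System A, System B

8. 鸽子大快朵颐后出现消化不良. (The pigeons developed indigestion after gorging themselves.)    Baseline, System A, System B

9. 关系党和国家的生死存亡. (It bears on the life and death of the Party and the state.)    Baseline, System A, System B

10. 允许本人及其配偶未成年子女在城市登记常住户口. (The person, their spouse, and their minor children are allowed to register for permanent residence in the city.)    Baseline, System A, System B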



Yao Qian (yaoqian@microsoft.com); Frank Soong (frankkps@microsoft.com)