A New Fast TTS
For multi-channel TTS applications, e.g. in a cloud service, it is highly desirable to synthesize high-quality speech with low complexity. We propose a fast, table-lookup-based, statistical-model-driven approach to non-uniform unit selection TTS for that purpose. In TTS training, the voice font of all waveform segments is organized as a Gaussian kernel coded hash table, together with a table that stores the quantized costs of all possible concatenation segment pairs. In synthesis, waveform segments of non-uniform lengths are first selected to construct a candidate lattice by looking up the Gaussian kernel coded hash table; the best path through the lattice is then found by minimizing the accumulated concatenation cost, which is retrieved from the quantization table for each possible concatenation. Experimental results show that the new approach can significantly reduce the search complexity while keeping a high TTS voice quality.
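The lattice search described above can be illustrated with a minimal dynamic-programming sketch. It is not the authors' implementation, and it rests on several assumptions: the candidate lattice has already been built by the hash-table lookup, each lattice position is treated as a simple column of candidate segment IDs (the non-uniform segment lengths are ignored), and the quantized concatenation costs are small integers stored in a dictionary keyed by segment pairs. The names `best_path`, `lattice`, `concat_cost`, and `default_cost` are hypothetical.

```python
from typing import Dict, List, Tuple


def best_path(
    lattice: List[List[int]],
    concat_cost: Dict[Tuple[int, int], int],
    default_cost: int = 255,
) -> Tuple[int, List[int]]:
    """Return (total cost, segment IDs) of the cheapest path through the lattice.

    lattice[t] lists the candidate segment IDs at position t.
    concat_cost[(u, v)] is the assumed quantized cost of joining u to v;
    pairs missing from the table fall back to default_cost (an assumption).
    """
    if not lattice or any(not column for column in lattice):
        raise ValueError("every lattice position needs at least one candidate")

    # scores[u]: lowest accumulated cost of any path ending in candidate u.
    scores = {u: 0 for u in lattice[0]}
    back_pointers = []  # one dict per transition: candidate -> best predecessor

    for prev_column, column in zip(lattice, lattice[1:]):
        new_scores = {}
        pointers = {}
        for v in column:
            # Each candidate pair costs one table lookup and one addition.
            best_u = min(
                prev_column,
                key=lambda u: scores[u] + concat_cost.get((u, v), default_cost),
            )
            new_scores[v] = scores[best_u] + concat_cost.get((best_u, v), default_cost)
            pointers[v] = best_u
        back_pointers.append(pointers)
        scores = new_scores

    # Trace the cheapest final candidate back to the first position.
    last = min(scores, key=scores.get)
    total = scores[last]
    path = [last]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return total, path


# Toy lattice with three positions and made-up quantized costs.
lattice = [[1, 2], [3, 4], [5]]
concat_cost = {(1, 3): 4, (1, 4): 1, (2, 3): 0, (2, 4): 7, (3, 5): 2, (4, 5): 6}
print(best_path(lattice, concat_cost))  # (2, [2, 3, 5])
```

Because every join cost is read from a precomputed table, the inner loop does no acoustic distance computation at synthesis time, which is the property the abstract relies on to reduce search complexity.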