Customizing Base Unit Set with Speech Database in TTS Systems

  • Yining Chen
  • Yong Zhao
  • Min Chu

ACL/SIGPARSE

In a unit-selection-based speech synthesizer, defining a good
unit set is crucial to speech quality. In this paper, a method
of customizing the TTS base unit set with a specific speech
corpus is proposed. Multi-phoneme units are built up from the
initial phoneme-sized units. A new multi-phoneme unit is
added to the inventory based upon its own frequency count
and the affected frequency count of other units. As a result, a
large base unit set, which contains many multi-phoneme units,
is formed when the speech corpus is large. For a small
speech corpus, by contrast, only a few bi-phoneme or tri-phoneme
units are found. Such a scalable base unit set makes it possible to
achieve better smoothness in concatenation while maintaining the
naturalness of prosody. Evaluation results show that, after
replacing the phone-sized base unit set with the customized set,
the search speed improves fivefold and a 59% preference
score is obtained.
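
The abstract does not give the exact acceptance criterion, so the following is only a minimal sketch of one way the idea could work, not the authors' algorithm. It assumes a greedy, pair-merging procedure: an adjacent pair of units is promoted to a new multi-phoneme unit when the merged unit occurs at least `min_count` times (its own frequency count) and each constituent unit still has at least `min_count` occurrences left afterwards (the affected frequency count). The names `merge_pair`, `customize_unit_set`, and `min_count`, and the toy corpus, are hypothetical.

```python
from collections import Counter


def merge_pair(utterance, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged unit."""
    merged, i = [], 0
    while i < len(utterance):
        if i + 1 < len(utterance) and (utterance[i], utterance[i + 1]) == pair:
            merged.append(utterance[i] + "+" + utterance[i + 1])
            i += 2
        else:
            merged.append(utterance[i])
            i += 1
    return merged


def customize_unit_set(corpus, min_count):
    """Grow a base unit set from phoneme-sized units (assumed criterion).

    corpus: list of utterances, each a list of phoneme labels.
    Returns the set of base units: the initial phonemes plus every
    multi-phoneme unit that was accepted.
    """
    corpus = [list(utt) for utt in corpus]
    units = {u for utt in corpus for u in utt}
    while True:
        pair_counts = Counter(p for utt in corpus for p in zip(utt, utt[1:]))
        for pair, count in pair_counts.most_common():
            if count < min_count:
                return units  # no remaining pair is frequent enough
            candidate = [merge_pair(utt, pair) for utt in corpus]
            new_counts = Counter(u for utt in candidate for u in utt)
            new_unit = pair[0] + "+" + pair[1]
            own_ok = new_counts[new_unit] >= min_count
            # Assumption: constituents must keep enough instances of their own.
            affected_ok = all(new_counts[u] >= min_count for u in set(pair))
            if own_ok and affected_ok:
                corpus = candidate
                units.add(new_unit)
                break  # recount pairs on the re-segmented corpus
        else:
            return units  # every candidate pair was rejected


toy_corpus = [
    ["s", "t", "a"], ["s", "t", "o"], ["s", "t", "a"],
    ["t", "a"], ["s", "o"], ["a", "s"], ["t", "o"],
]
small = customize_unit_set(toy_corpus, min_count=2)
large = customize_unit_set(toy_corpus * 3, min_count=2)
print(sorted(small))  # ['a', 'o', 's', 's+t', 't']
print(len(small), len(large))
```

Under these assumptions the toy corpus yields a single bi-phoneme unit, while the same data repeated three times (standing in for a larger corpus) also admits a tri-phoneme unit, which mirrors the scalability described above.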