L. Lee, P. Fieguth, and L. Deng
This paper introduces a new speech production model that aims
to synthesize natural speech in real time by modeling
the key dynamic properties of the articulators in a nonlinear
state-space framework. The goal-oriented movements of the
tongue tip, tongue dorsum, upper lip, lower lip, and jaw are
described by a linear state equation. The resulting articulatory
trajectories, combined with the effects of the velum and
larynx, are mapped to acoustic features by the nonlinear
observation equation. The input and output of the model are
a time-aligned phone sequence and a speech waveform, respectively.
This speech production model can also be applied directly
to speech recognition to better account for coarticulation
and phonetic reduction phenomena with considerably
fewer parameters than traditional HMM-based approaches.
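
The structure described above (a linear, goal-oriented state equation feeding a nonlinear observation equation) can be illustrated with a minimal sketch. This is not the authors' implementation: the first-order update rule, the `rate` constant, the `tanh` observation function, and all target values and weights are assumptions chosen only for illustration, and the sketch stops at acoustic feature vectors rather than producing a waveform.

```python
import math

# Hypothetical articulator ordering, following the abstract's list.
ARTICULATORS = ["tongue_tip", "tongue_dorsum", "upper_lip", "lower_lip", "jaw"]


def simulate_articulators(targets, durations, rate=0.2, x0=None):
    """Linear state equation (assumed form): x[k+1] = x[k] + rate * (target - x[k]).

    targets   -- one target vector per phone (hypothetical values)
    durations -- number of frames per phone (the time alignment)
    """
    dim = len(targets[0])
    x = list(x0) if x0 is not None else [0.0] * dim
    trajectory = []
    for target, n_frames in zip(targets, durations):
        for _ in range(n_frames):
            # Each articulator moves a fixed fraction of the way to its target.
            x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
            trajectory.append(x)
    return trajectory


def observe(x, weights):
    """Nonlinear observation equation (assumed form): tanh of weighted sums."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


# Two hypothetical phones with made-up articulatory targets and durations.
targets = [[1.0, 0.2, 0.0, 0.0, 0.5], [0.0, 0.8, 0.6, 0.6, 0.1]]
durations = [30, 30]
# Made-up weights mapping 5 articulators to 2 acoustic features.
weights = [[0.5, -0.3, 0.2, 0.1, 0.4], [-0.2, 0.6, 0.3, -0.1, 0.2]]

trajectory = simulate_articulators(targets, durations)
features = [observe(x, weights) for x in trajectory]
```

Because each state moves smoothly from wherever the previous phone left it, shortening a phone's duration means its target is never fully reached, which is one simple way a model of this kind could express coarticulation and reduction.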
Published in: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing