A Mixed-Excitation Frequency Domain Model for Time-Scale Pitch-Scale Modification of Speech

Alex Acero

Abstract

This paper presents a time-scale and pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency-domain source-filter model in which the source is modeled as a mixed excitation. The model is tightly coupled with a compression scheme that results in compact acoustic inventories. Compared with the approach used in the Whistler system, which has no mixed excitation, the new method improves voiced fricatives and over-stretched voiced sounds. It also allows spectral manipulations such as smoothing discontinuities at unit boundaries, voice transformation, and loudness equalization.
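The abstract describes the model only at a high level. As a rough illustration of what a frequency-domain mixed excitation means in general, the sketch below synthesizes one frame using only the Python standard library: bins below an assumed voicing cutoff receive a harmonic (periodic) source at multiples of F0, bins above it receive random-phase noise, and every bin is scaled by a spectral envelope before an inverse DFT. The function name `mixed_excitation_frame`, the hard voicing cutoff, and the nearest-bin harmonic placement are hypothetical simplifications chosen for illustration. This is not the paper's model, its compression scheme, or the Whistler implementation.

```python
import cmath
import math
import random


def mixed_excitation_frame(f0, fs, n_fft, voicing_cutoff, envelope, seed=0):
    """Synthesize one frame from a generic frequency-domain mixed-excitation model.

    Hypothetical sketch (not the paper's method): bins below `voicing_cutoff` (Hz)
    take a harmonic source, bins at or above it take random-phase noise, and every
    bin is scaled by `envelope(freq_hz)`. Assumes n_fft is even and f0 is larger
    than the bin spacing fs / n_fft.
    """
    rng = random.Random(seed)
    half = n_fft // 2
    spectrum = [0j] * n_fft  # DC and Nyquist bins are left at zero
    for k in range(1, half):
        freq = k * fs / n_fft
        if freq < voicing_cutoff:
            # Periodic part: unit energy only in the bin nearest each harmonic of f0.
            harmonic = round(freq / f0)
            nearest_bin = round(harmonic * f0 * n_fft / fs)
            source = 1.0 + 0j if harmonic >= 1 and nearest_bin == k else 0j
        else:
            # Aperiodic part: unit magnitude with a uniformly random phase.
            source = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        # Source-filter product: excitation shaped by the spectral envelope.
        spectrum[k] = envelope(freq) * source
        # Hermitian symmetry so the time-domain frame is real.
        spectrum[n_fft - k] = spectrum[k].conjugate()
    # Direct inverse DFT (O(N^2)); adequate for a short illustrative frame.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]


# Example: 125 Hz voicing below 2 kHz, noise above, flat envelope.
frame = mixed_excitation_frame(125.0, 8000.0, 256, 2000.0, lambda f: 1.0)
```

Calling the function twice with the same `envelope` but different `f0` values changes the harmonic spacing (the pitch) while the spectral envelope stays fixed, which is the basic idea behind source-filter pitch-scale modification. Moving `voicing_cutoff` shifts the balance between periodic and noise-like energy, the property a mixed excitation adds over a purely periodic source.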

Details

Publication type: Inproceedings
Published in: Proc. of the Int. Conf. on Spoken Language Processing