A Mixed-Excitation Frequency Domain Model for Time-Scale Pitch-Scale Modification of Speech

Alex Acero

Abstract

This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is tightly coupled with a compression scheme that results in compact acoustic inventories. When compared to the approach in the Whistler system, which uses no mixed excitation, the new method shows improvement in voiced fricatives and over-stretched voiced sounds. In addition, it allows for spectral manipulation such as smoothing of discontinuities at unit boundaries, voice transformations, or loudness equalization.

Details

Publication type: Inproceedings
Published in: Proc. of the Int. Conf. on Spoken Language Processing