M. R. P. Thomas, J. Gudnason, and P. A. Naylor
This paper presents a method for speech time scale modification. Voiced speech is pseudo-periodic, which allows its time scale to be modified by repeating or removing pitch cycles as necessary. However, unvoiced speech and the boundaries of voiced segments exhibit no such periodicity, so these regions should not be modified. To address this issue, the proposed approach is novel in its use of the DYPSA algorithm to derive speech periodicity from glottal closure instants (GCIs), followed by a Gaussian mixture model (GMM)-based voiced/unvoiced/silence (VUS) classifier. A listening test based on ITU-T P.800 has shown that, averaged over a range of modification factors, the mean opinion score (MOS) for the perceptual quality of speech processed with VUS detection exceeds that of a method without VUS detection by 0.61. Results are presented as a function of modification factor for both normal and fast original talking rates. Reliable, high-quality time scale modification enables many applications, such as time scale compression for fast scanning of recorded voicemail messages, slowing of talking rate for improved intelligibility in forensics, and lip synchronization in motion video.
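The cycle repetition/removal idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes GCI sample indices (for example from DYPSA) and a per-cycle voiced flag (for example from a VUS classifier) are already available. The function name `tsm_by_cycles` and its parameters are hypothetical. Voiced cycles are repeated or dropped so that the output length tracks `alpha` times the input length, while unvoiced/silent cycles and the segments outside the first and last GCI are copied unchanged. Cycles are spliced at GCIs with no smoothing.

```python
def tsm_by_cycles(x, gcis, voiced, alpha):
    """Hypothetical sketch of cycle-based time scale modification.

    x      -- sequence of speech samples
    gcis   -- strictly increasing GCI sample indices (assumed given)
    voiced -- one bool per cycle [gcis[k], gcis[k+1]) (assumed given)
    alpha  -- modification factor: output/input duration of voiced speech
    """
    if len(voiced) != len(gcis) - 1:
        raise ValueError("need exactly one voicing flag per cycle")
    if any(b <= a for a, b in zip(gcis, gcis[1:])):
        raise ValueError("GCIs must be strictly increasing")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")

    out = list(x[:gcis[0]])  # speech before the first GCI is left untouched
    debt = 0.0  # desired minus produced output samples in voiced regions
    for k in range(len(gcis) - 1):
        cycle = list(x[gcis[k]:gcis[k + 1]])
        if not voiced[k]:
            out.extend(cycle)  # unvoiced/silence: no periodicity, copy as-is
            continue
        debt += alpha * len(cycle)
        # Emit this cycle zero or more times until the shortfall is below
        # half a cycle: alpha > 1 repeats cycles, alpha < 1 drops some.
        while debt >= len(cycle) / 2:
            out.extend(cycle)
            debt -= len(cycle)
    out.extend(x[gcis[-1]:])  # speech after the last GCI is left untouched
    return out
```

For example, with `x = list(range(10))`, GCIs at `[2, 4, 6, 8]` and all three cycles voiced, `alpha=2.0` plays each cycle twice, whereas marking the middle cycle unvoiced leaves that cycle at its original length. Accumulating the length error across cycles, rather than deciding each cycle independently, keeps the overall rate close to `alpha` even when it is not an integer or reciprocal of one.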
Published in: Proc. European Signal Processing Conf. (EUSIPCO)