M. R. P. Thomas and P. A. Naylor
Accurate estimation of glottal closure instants (GCIs) in voiced speech is important for speech analysis applications which benefit from glottal-synchronous processing. Electroglottograph (EGG) recordings give a measure of the electrical conductance of the glottis, providing a signal which is proportional to its contact area. EGG signals contain little noise or distortion, providing a good reference from which GCIs can be extracted to evaluate GCI estimation from speech recordings. Many approaches impose a threshold on the differentiated EGG signal, which provides accurate results during voiced speech but is prone to errors at the onset and end of voicing; modern algorithms use a similar approach across multiple dyadic scales using the stationary wavelet transform. This paper describes a new method for EGG-based GCI estimation named SIGMA, which is based upon the stationary wavelet transform, peak detection with a group delay function and Gaussian Mixture Modelling for discrimination between true and false GCI candidates. In most real-world environments, it is necessary to estimate GCIs from a speech signal recorded with a microphone placed at some distance from the talker. The presence of reverberation, noise and filtering by the vocal tract renders GCI detection from real speech signals relatively difficult compared with detection from the EGG, so EGG-based references have often been used to evaluate GCI detection from speech signals. Evaluation against 500 hand-labelled sentences has shown an accuracy of 99.35%, a 4.7% improvement over a popular existing method.
Published in: Proc. European Signal Processing Conf. (EUSIPCO)
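
The abstract mentions a common baseline in which a threshold is imposed on the differentiated EGG signal. The following is a minimal sketch of that baseline only, not of SIGMA itself. It assumes that glottal closure appears as a sharp positive peak in the differentiated signal (EGG polarity varies between recording setups). The function name `degg_threshold_gcis` and the parameters `rel_threshold` and `min_period_s` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def degg_threshold_gcis(egg, fs, rel_threshold=0.3, min_period_s=0.002):
    """Return sample indices of GCI candidates from an EGG signal.

    Illustrative baseline: threshold the first difference of the EGG.
    Assumes closure produces a positive peak in the differentiated signal.
    rel_threshold and min_period_s are hypothetical tuning parameters.
    """
    egg = np.asarray(egg, dtype=float)
    # First difference, padded so it has the same length as the input.
    degg = np.diff(egg, prepend=egg[0])

    peak = degg.max()
    if peak <= 0:
        # No rising edges at all, so there are no candidates.
        return np.array([], dtype=int)
    thresh = rel_threshold * peak

    # Local maxima of the differentiated signal that exceed the threshold.
    is_peak = np.zeros(degg.shape, dtype=bool)
    is_peak[1:-1] = (
        (degg[1:-1] > degg[:-2])
        & (degg[1:-1] >= degg[2:])
        & (degg[1:-1] > thresh)
    )
    candidates = np.flatnonzero(is_peak)

    # Enforce a minimum spacing; within a cluster keep the largest peak.
    min_gap = int(round(min_period_s * fs))
    kept = []
    for idx in candidates:
        if kept and idx - kept[-1] < min_gap:
            if degg[idx] > degg[kept[-1]]:
                kept[-1] = idx
        else:
            kept.append(idx)
    return np.array(kept, dtype=int)
```

A single global threshold of this kind is what the abstract describes as accurate during steady voicing but prone to errors at the onset and end of voicing, where the peak amplitude of the differentiated signal changes quickly.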