Li Deng, X. Cui, R. Pruvenok, J. Huang, S. Momen, Y. Chen, and A. Alwan
May 2006
While vocal tract resonances (VTRs, or formants that are defined as
such resonances) are known to play a critical role in human speech
perception and in computer speech processing, there has been a lack
of standard databases needed for the quantitative evaluation of automatic
VTR extraction techniques. We report in this paper on our
recent effort to create a publicly available database of the first three
VTR frequency trajectories. The database contains a representative
subset of the TIMIT corpus with respect to speaker, gender, dialect
and phonetic context, with a total of 538 sentences. A Matlab-based
labeling tool is developed, with high-resolution wideband spectrograms
displayed to assist in visual identification of VTR frequency
values, which are then recorded via mouse clicks and local spline interpolation.
Special attention is paid to VTR values during consonant-to-vowel
(CV) and vowel-to-consonant (VC) transitions, and to speech
segments with vocal tract anti-resonances. Using this database, we
quantitatively assess two common automatic VTR tracking techniques
in terms of their average tracking errors analyzed within each
of the six major broad phonetic classes as well as during CV and VC
transitions. The potential use of the VTR database for research in
several areas of speech processing is discussed.
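
The per-class evaluation described above can be illustrated with a minimal sketch. It assumes the tracking error is the mean absolute difference in Hz between hand-labeled and automatically tracked values for each of the first three VTRs, averaged over the frames of each broad phonetic class. The abstract does not give the exact metric, so that choice, the function name `mean_abs_error_by_class`, and the class labels are all hypothetical.

```python
from collections import defaultdict


def mean_abs_error_by_class(reference, estimate, frame_classes):
    """Average absolute VTR tracking error (Hz) per broad phonetic class.

    reference, estimate: per-frame (F1, F2, F3) tuples in Hz.
    frame_classes: one class label per frame (labels are illustrative).
    Returns {label: [mean |error| for F1, F2, F3]}.
    """
    if not (len(reference) == len(estimate) == len(frame_classes)):
        raise ValueError("inputs must have one entry per frame")

    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)

    # Accumulate absolute errors per class and per resonance.
    for ref, est, label in zip(reference, estimate, frame_classes):
        for k in range(3):
            sums[label][k] += abs(est[k] - ref[k])
        counts[label] += 1

    # Divide by the number of frames in each class.
    return {
        label: [total / counts[label] for total in sums[label]]
        for label in sums
    }


# Toy data (made up for illustration, not taken from the database).
reference = [(500, 1500, 2500), (520, 1480, 2500), (300, 2200, 3000)]
estimate = [(510, 1450, 2600), (500, 1500, 2400), (350, 2200, 2900)]
frame_classes = ["vowel", "vowel", "nasal"]

errors = mean_abs_error_by_class(reference, estimate, frame_classes)
```

Restricting the input to frames that fall within CV or VC transitions would give the transition-specific errors in the same way, assuming those frames have been marked beforehand.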
In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
Type: Inproceedings