A Database of Vocal Tract Resonance Trajectories for Research in Speech Processing

Li Deng, X. Cui, R. Pruvenok, J. Huang, S. Momen, Y. Chen, and A. Alwan


While vocal tract resonances (VTRs, or formants that are defined as such resonances) are known to play a critical role in human speech perception and in computer speech processing, there has been a lack of standard databases needed for the quantitative evaluation of automatic VTR extraction techniques. We report in this paper on our recent effort to create a publicly available database of the first three VTR frequency trajectories. The database contains a representative subset of the TIMIT corpus with respect to speaker, gender, dialect and phonetic context, with a total of 538 sentences. A Matlab-based labeling tool is developed, with high-resolution wideband spectrograms displayed to assist in visual identification of VTR frequency values which are then recorded via mouse clicks and local spline interpolation. Special attention is paid to VTR values during consonant-to-vowel (CV) and vowel-to-consonant (VC) transitions, and to speech segments with vocal tract anti-resonances. Using this database, we quantitatively assess two common automatic VTR tracking techniques in terms of their average tracking errors analyzed within each of the six major broad phonetic classes as well as during CV and VC transitions. The potential use of the VTR database for research in several areas of speech processing is discussed.
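
The per-class evaluation described above can be illustrated with a minimal sketch. It assumes the hand-labeled and automatically tracked VTR values are available as one (F1, F2, F3) tuple in Hz per frame, and that each frame carries a broad phonetic class label. It also assumes the error measure is the mean absolute difference, which the abstract does not specify. The function name, the class labels, and the numbers are hypothetical and are not taken from the database.

```python
from collections import defaultdict


def mean_abs_error_by_class(reference, estimate, labels):
    """Average absolute tracking error per class for F1, F2 and F3.

    reference, estimate: sequences of (f1, f2, f3) tuples in Hz, one per frame.
    labels: sequence of class labels, one per frame (placeholder names here).
    Returns a dict mapping each label to a tuple of three mean errors in Hz.
    """
    if not (len(reference) == len(estimate) == len(labels)):
        raise ValueError("all inputs must have one entry per frame")

    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # running error sums per class
    counts = defaultdict(int)                    # number of frames per class

    for ref, est, lab in zip(reference, estimate, labels):
        for k in range(3):
            sums[lab][k] += abs(ref[k] - est[k])
        counts[lab] += 1

    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}


# Illustrative values only; they do not come from the VTR database.
reference = [(500, 1500, 2500), (520, 1480, 2520), (300, 2200, 3000)]
estimate = [(510, 1450, 2500), (500, 1500, 2600), (330, 2100, 3000)]
labels = ["vowel", "vowel", "nasal"]

errors = mean_abs_error_by_class(reference, estimate, labels)
for lab, (e1, e2, e3) in errors.items():
    print(f"{lab}: F1={e1:.1f} Hz, F2={e2:.1f} Hz, F3={e3:.1f} Hz")
```

Restricting the input frames to those inside CV or VC transitions would give the corresponding transition-only averages under the same assumptions.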


Publication type: Inproceedings
Published in: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
