Switching Dynamic System Models for Speech Articulation and Acoustics

Li Deng

Abstract

A statistical generative model for the speech process is described that embeds a substantially richer structure than the HMM currently in predominant use for automatic speech recognition. This switching dynamic-system model generalizes and integrates the HMM and the piece-wise stationary nonlinear dynamic system (state-space) model. Depending on the level and the nature of the switching in the model design, various key properties of the speech dynamics can be naturally represented in the model. Such properties include the temporal structure of the speech acoustics, its causal articulatory movements, and the control of such movements by the multidimensional targets correlated with the phonological (symbolic) units of speech in terms of overlapping articulatory features. One main challenge of using this multi-level switching dynamic-system model for successful speech recognition is the computationally intractable inference (decoding) on the posterior probabilities of the hidden states. This leads to computationally intractable optimal parameter learning (training). Several versions of Bayesian networks have been devised with detailed dependency implementation specified to represent the switching dynamic-system model of speech. We discuss the variational technique developed for general Bayesian networks as a suboptimal solution to the decoding and learning problems. Some common operations of estimating phonological states' switching times have been shared between the variational technique and the human auditory function that uses neural transient responses to detect temporal landmarks associated with phonological features. This suggests that the variation-style learning may actually take place in human speech perception under an encoding-decoding theory of speech communication which highlights the critical roles of modeling articulatory dynamics for speech recognition and which forms a main motivation for the switching dynamic system model described in this chapter.
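
To make the generative structure summarized in the abstract more concrete, the following is a minimal sketch of sampling from a toy switching dynamic system. It is not the model from the chapter: the scalar hidden state, the target-directed update, the tanh observation mapping, the function name sample_switching_system, and every numeric value are assumptions chosen only for illustration. It shows the three layers the abstract names: a discrete state that switches as in an HMM, a continuous hidden state whose dynamics move toward a target selected by the discrete state, and a nonlinear mapping from the hidden state to the observation.

```python
import math
import random


def sample_switching_system(n_frames, seed=0):
    """Draw one sequence from a toy switching dynamic system.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    targets = [-1.0, 0.5, 1.5]   # hypothetical target for each discrete unit
    stay_prob = 0.9              # hypothetical probability of keeping the unit
    rate = 0.8                   # hypothetical smoothing constant of the dynamics
    state_noise, obs_noise = 0.05, 0.1

    s = rng.randrange(len(targets))  # discrete (symbolic) state
    z = 0.0                          # continuous hidden state
    states, hidden, observed = [], [], []
    for _ in range(n_frames):
        # Discrete switching: first-order Markov chain, as in an HMM.
        if rng.random() > stay_prob:
            s = rng.choice([k for k in range(len(targets)) if k != s])
        # State equation: move toward the target selected by s, plus noise.
        z = rate * z + (1.0 - rate) * targets[s] + rng.gauss(0.0, state_noise)
        # Observation equation: nonlinear mapping (tanh is a stand-in) plus noise.
        o = math.tanh(z) + rng.gauss(0.0, obs_noise)
        states.append(s)
        hidden.append(z)
        observed.append(o)
    return states, hidden, observed
```

Calling sample_switching_system(50, seed=1) returns three lists of length 50: the discrete state path, the hidden trajectory, and the observations. In a recognition setting only the observations would be available, and recovering the other two is the inference problem the abstract describes as intractable to solve exactly.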

Details

Publication type: Inproceedings
Published in: Proc. of the IMA Workshop