Switching Dynamic System Models for Speech Articulation and Acoustics

Li Deng

Abstract

A statistical generative model of the speech process is described that embeds a substantially richer structure than the hidden Markov model (HMM) currently predominant in automatic speech recognition. This switching dynamic-system model generalizes and integrates the HMM and the piecewise-stationary nonlinear dynamic-system (state-space) model. Depending on the level and nature of the switching in the model design, various key properties of speech dynamics can be represented naturally. These properties include the temporal structure of the speech acoustics, the articulatory movements that cause them, and the control of those movements by multidimensional targets. The targets are correlated with the phonological (symbolic) units of speech, which are expressed as overlapping articulatory features.
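To make the layered structure concrete, the following is a minimal generative sketch, not the chapter's model. It assumes a two-state Markov chain for the discrete (phonological) layer, scalar target-directed first-order dynamics for the hidden articulatory layer, and a `tanh` mapping plus Gaussian noise for the articulatory-to-acoustic layer. All names and parameter values (`TRANSITION`, `TARGET`, `RATE`, `observe`, `sample`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical parameters (illustrative assumptions, not from the paper).
# Row s gives P(s_t = j | s_{t-1} = s) for the discrete switching layer.
TRANSITION = [[0.9, 0.1],
              [0.1, 0.9]]
TARGET = [1.0, -1.0]  # assumed articulatory target T_s for each state
RATE = [0.8, 0.7]     # assumed per-state time constant r_s in (0, 1)
PROC_STD = 0.05       # std of the articulatory (process) noise
OBS_STD = 0.1         # std of the acoustic (observation) noise


def observe(x):
    """Assumed nonlinear articulatory-to-acoustic mapping h(x)."""
    return math.tanh(2.0 * x)


def sample(n_frames, seed=0):
    """Draw (states, articulation, acoustics) from a switching dynamic system."""
    rng = random.Random(seed)
    s, x = 0, 0.0
    states, artic, acoustic = [], [], []
    for _ in range(n_frames):
        # HMM layer: the discrete state switches according to a Markov chain.
        s = 0 if rng.random() < TRANSITION[s][0] else 1
        # Dynamic-system layer: x moves toward the current state's target.
        x = RATE[s] * x + (1.0 - RATE[s]) * TARGET[s] + rng.gauss(0.0, PROC_STD)
        # Observation layer: nonlinear mapping of articulation plus noise.
        o = observe(x) + rng.gauss(0.0, OBS_STD)
        states.append(s)
        artic.append(x)
        acoustic.append(o)
    return states, artic, acoustic


if __name__ == "__main__":
    states, artic, acoustic = sample(20)
    print(states)
```

Because the articulatory state carries over across state switches, the acoustics change smoothly rather than jumping at segment boundaries, which is the property a plain HMM lacks. The same coupling is what makes exact posterior inference over the discrete states expensive.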

One main challenge in using this multi-level switching dynamic-system model for speech recognition is that inference (decoding) of the posterior probabilities of the hidden states is computationally intractable, which in turn makes optimal parameter learning (training) intractable. Several versions of Bayesian networks, each with a detailed dependency structure, have been devised to represent the switching dynamic-system model of speech. We discuss the variational technique developed for general Bayesian networks as a suboptimal solution to the decoding and learning problems. The variational technique shares a common operation with human auditory function: estimating the switching times of phonological states. The auditory system performs this operation by using neural transient responses to detect temporal landmarks associated with phonological features. This suggests that variational-style learning may actually take place in human speech perception, under an encoding-decoding theory of speech communication. That theory highlights the critical role of modeling articulatory dynamics for speech recognition and forms a main motivation for the switching dynamic-system model described in this chapter.

Details

Publication type: Inproceedings
Published in: Proc. of the IMA Workshop