H. Attias, L. Lee, and Li Deng
This paper describes novel and powerful variational EM algorithms for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production. Hidden dynamic models (HDMs) have recently become a promising class of acoustic models that incorporate crucial speech-specific knowledge and overcome many inherent weaknesses of traditional HMMs. However, the lack of powerful and efficient statistical learning algorithms is one of the main obstacles preventing them from being well studied and widely used. Since exact inference and learning are intractable, a variational approach is taken to develop effective approximate algorithms. We have implemented the segmental constraint crucial for modeling speech dynamics, and we present algorithms for recovering hidden speech dynamics and discrete speech units from acoustic data only. The effectiveness of the algorithms is verified by experiments on simulated data and on Switchboard speech data.
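To make the model class concrete, the sketch below samples data from a simplified, scalar segmental switching state space model. It is an illustration under stated assumptions, not the authors' model or their variational EM algorithm: the target-directed hidden dynamics, the linear observation map, the `Regime` parameters, and the `simulate` function are all hypothetical. The discrete speech unit is held fixed within each segment (the form of segmental constraint assumed here), while the continuous hidden state carries over across segment boundaries.

```python
import random
from dataclasses import dataclass


@dataclass
class Regime:
    # Hypothetical per-unit parameters (assumed, not taken from the paper).
    a: float       # hidden-state persistence, 0 < a < 1
    target: float  # value the hidden dynamics move toward
    c: float       # observation gain
    d: float       # observation offset
    q: float       # hidden-state noise standard deviation
    r: float       # observation noise standard deviation


def simulate(regimes, segments, x0=0.0, seed=0):
    """Sample (labels, hidden, observed) from an assumed segmental switching SSM.

    segments: list of (regime_index, duration) pairs. The discrete state is
    constant within a segment; the hidden state x is continuous across
    segment boundaries.
    """
    rng = random.Random(seed)
    labels, hidden, observed = [], [], []
    x = x0
    for s, duration in segments:
        p = regimes[s]
        for _ in range(duration):
            # Move a fraction (1 - a) of the way toward the target, plus noise.
            x = p.a * x + (1.0 - p.a) * p.target + rng.gauss(0.0, p.q)
            # Observe a noisy linear function of the hidden state.
            y = p.c * x + p.d + rng.gauss(0.0, p.r)
            labels.append(s)
            hidden.append(x)
            observed.append(y)
    return labels, hidden, observed
```

For example, `simulate([Regime(0.8, 1.0, 2.0, 0.0, 0.05, 0.1), Regime(0.9, -1.0, 1.5, 0.5, 0.05, 0.1)], [(0, 30), (1, 40), (0, 20)])` yields 90 frames in which the hidden trajectory drifts smoothly between the two targets. Inference in this setting means recovering both the segment labels and the hidden trajectory from `observed` alone, which is the intractable joint problem the variational approach addresses.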
Published in: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing