Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. Hidden Markov Model (HMM) is one most common type of acoustuc models. Other acosutic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields, etc.
Acoustic modeling also encompasses "pronunciation modeling", which describes how a sequence or multi-sequences of fundamental speech units (such as phones or phonetic feature) are used to represent larger speech units such as words or phrases which are the object of speech recognition. Acoustic modeling may also include the use of feeback information from the recognizer to reshape the feature vectors of speech in achieving noise robustness in speech recognition.
Speech recognition engines usually require two basic components in order to recognize speech. One component is an acoustic model, created by taking audio recordings of speech and their transcriptions and then compiling them into statistical representations of the sounds for words. The other component is called a language model, which gives the probabilities of sequences of words. Language models are often used for dictation applications. A special type of langauge models is regular grammars, which are used typically in desktop command and control or telephony IVR-type applications.
Our group have been working on acoustic modeling since its inception due to its critical importance in speech technology, speech recognition in particular. We have world-class expertise and researchers in this area of research. Recently, we have been focusing on two aspects of acoustic modeling: 1) how to establish the statistical models and their structures; and 2) how to learn the model parameters automatically from the data. The following are some of our recent projects in the area of acoustic modeling:
- Discriminative Learning Algorithms and Procedures for Acoustic Models of Speech
- Large-Margin Learning of HMM Parameters
- Discriminative pronunciation modeling
- Joint discriminative learning of SLU and SR model parameters using N-best/lattice results from speech recognizer
- Discriminative acoustic models for Speech Recognition via the use of continuous features in CRF and HCRF
- Acoustic feature enhancement by statistical mothods with feedbacks from speech recognition
- Compressing HMM parameters for adaptive noise-robust speech recognition
- Noise-adaptive and speaker-adaptive training of HMM parameters
- Parametric modeling of acoustic environment with mixing phases between speech and noise for speech recogntion
- Multilingual and cross-lingual speech recognition
- Cross-Lingual Speech Recognition under Runtime Resource Constraints
- Modeling speech production mechanisms for speech recognition: hidden dynamic modeling; minimum-effort principle for model learning and decoding
- Acoustic modeling for casual speech for enhanced voicemail
- Active learning for speech recognition
- Unsupervised learning for speech recognition
- Variable-Parameter HMMs
- Acoustic modeling for voice search
- Dong Yu, Li Deng, Yifan Gong, and Alex Acero, A Novel Framework and Training Algorithm for Variable-Parameter Hidden Markov Models, in IEEE Transactions on Audio, Speech and Language Processing, vol. 17, no. 7, pp. 1348-1360, IEEE, September 2009
- Dong Yu, Li Deng, and Alex Acero, Hidden Conditional Random Field with Distribution Constraints for Phone Classification, in Interspeech 2009, International Speech Communication Association, September 2009
- Dong Yu, Li Deng, and Alex Acero, Using continuous features in the maximum entropy model, in Pattern Recognition Letters, vol. 30, no. 8, pp. 1295-1300, June 2009
- Ozlem Kalinli, Michael L. Seltzer, and Alex Acero, Noise Adaptive Training Using a Vector Taylor Series Approach for Robust Automatic Speech Recognition, in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Institute of Electrical and Electronics Engineers, Inc., Taipei, Taiwan, April 2009
- Balakrishnan Varadarajan, Dong Yu, Li Deng, and Alex Acero, Maximizing global entry reduction for active learning in speech recognition, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Hui Lin, Li Deng, Dong Yu, Yifan Gong, Alex Acero, and Chi-Hui Lee, A Study on Multilingual Acoustic Modeling For Large Vocabulary ASR, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Balakrishnan Varadarajan, Dong Yu, Li Deng, and Alex Acero, Using collective information in semi-supervised learning for speech recognition, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Oriol Vinyals, Li Deng, Dong Yu, and Alex Acero, Discriminative pronunciation learning using phonetic decoder and minimum classification error criterion, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Dong Yu, Li Deng, Peng Liu, Jian Wu, Yifan Gong, and Alex Acero, Cross-lingual speech recognition under run-time resource constraints, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Jasha Droppo and Alex Acero, Experimenting with a Global Decision Tree for State Clustering in Automatic Speech Recognition Systems, IEEE, April 2009
- Dong Yu, Balakrishnan Varadarajan, Li Deng, and Alex Acero, Active Learning and Semi-supervised Learning for Speech Recognition: A Unified Framework using the Global Entropy Reduction Maximization Criterion, in Computer Speech and Language - Special Issue on Emergent Artificial Intelligence Approaches for Pattern Recognition in Speech and Language Processing , Elsevier , 2009
- Jinyu Li, Li Deng, Yifan Gong, and Alex Acero, A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions, in Computer Speech and Language, vol. 23, pp. 389-405, Elsevier , 2009
- Dong Yu, Li Deng, and Alex Acero, Using Continuous Features in the Maximum Entropy Model, in Pattern Recognition Letters , Elsevier , 2009
- Dong Yu, Li Deng, and Alex Acero, The Maximum Entropy Model with Continuous Features , in NIPS Workshop, Whistler, BC, Canada, Microsoft, December 2008
- Hui Lin, Li Deng, Jasha Droppo, Dong Yu, and Alex Acero, Learning Methods in Multilingual Speech Recognition, in NIPS Workshop, Whistler, BC, Canada, Microsoft, December 2008
- Dong Yu, Li Deng, Xiaodong He, and Alex Acero, Large-Margin Minimum Classification Error Training: A Theoretical Risk Minimization Perspective, in Computer Speech and Language, vol. 22, no. 4, pp. 415-429, Elsevier , October 2008
- Dong Yu, Li Deng, Yifan Gong, and Alex Acero, Discriminative Training of Variable-Parameter HMMs for Noise Robust Speech Recognition, in Proceedings of the Interspeech, International Speech Communication Association, September 2008
- Dong Yu, Li Deng, Yifan Gong, and Alex Acero, Parameter Clustering and Sharing in Variable-Parameter HMMs for Noise Robust Speech Recognition, in Proc. of the Interspeech, International Speech Communication Association, September 2008
- Xiaolong Li, Li Deng, Yun-Cheng Ju, and Alex Acero, Automatic Children's Reading Tutor on Hand-Held Devices, in Proceedings of Interspeech, International Speech Communication Association, Brisbane, Australia, September 2008
- Xiaodong He, Li Deng, and Chou Wu, Discriminative Learning in Sequential Pattern Recognition --- A Unifying Review for Optimization-Oriented Speech Recognition, in IEEE Signal Processing Magazine, vol. 25, no. 5, pp. 14-36, Institute of Electrical and Electronics Engineers, Inc., September 2008
- Dong Yu, Li Deng, Jasha Droppo, Jian Wu, Yifan Gong, and Alex Acero, Robust speech recognition using cepstral minimum-mean-square-error noise suppressor, in IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 5, Institute of Electrical and Electronics Engineers, Inc., July 2008
- Jinyu Li, Li Deng, Dong Yu, Yifan Gong, and Alex Acero, HMM Adaptation Using a Phase-Sensitive Acoustic Distortion Model for Environment-Robust Speech Recognition, Institute of Electrical and Electronics Engineers, Inc., April 2008
- Tsung-Hui Chang, Zhi-Quan Luo, Li Deng, and Chong-Yung Chi, A Convex Optimization Method for Joint Mean and Variance Parameter Estimation of Large-Margin CDHMM, in Proceedings of the ICASSP, April 2008
- Dong Yu, Li Deng, Jasha Droppo, Jian Wu, Yifan Gong, and Alex Acero, A Minimum Mean-Square-Error Noise Reduction Algorithm on Mel-Frequency Cepstra for Robust Speech Recognition, in Proc. ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2008
- Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, and Alex Acero, Adaptation of compressed HMM parameters for resource-constrained speech recognition, Institute of Electrical and Electronics Engineers, Inc., April 2008



