Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The hidden Markov model (HMM) is one of the most common types of acoustic model. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields.
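To make the idea of a statistical representation of a feature sequence concrete, the following is a minimal sketch of how an HMM can assign a likelihood to a sequence of features using the forward algorithm. It assumes scalar features and one Gaussian per state, which is far simpler than a real recognizer; the function names and all parameter values are illustrative assumptions, not taken from any of the systems described here.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_log_likelihood(features, init, trans, means, variances):
    """Forward algorithm in the log domain: log p(features | HMM)."""
    n = len(init)
    # Initialization: start in state s and emit the first frame.
    alpha = [safe_log(init[s]) + log_gaussian(features[0], means[s], variances[s])
             for s in range(n)]
    # Recursion: sum over predecessor states, then emit the next frame.
    for x in features[1:]:
        alpha = [logsumexp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
                 + log_gaussian(x, means[s], variances[s])
                 for s in range(n)]
    # Termination: sum over all final states.
    return logsumexp(alpha)

# Hypothetical two-state, left-to-right model and a toy feature sequence.
features = [0.1, -0.2, 4.9, 5.2]
init = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
matched = hmm_log_likelihood(features, init, trans, [0.0, 5.0], [1.0, 1.0])
mismatched = hmm_log_likelihood(features, init, trans, [5.0, 0.0], [1.0, 1.0])
print(matched > mismatched)
```

In a recognizer, a score of this kind would be computed for competing models, and the model whose states best explain the observed features receives the higher likelihood.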
Acoustic modeling also encompasses "pronunciation modeling", which describes how one or more sequences of fundamental speech units (such as phones or phonetic features) are used to represent larger speech units, such as the words or phrases that are the object of speech recognition. Acoustic modeling may also include the use of feedback information from the recognizer to reshape the feature vectors of speech in order to achieve noise robustness in speech recognition.
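As a rough illustration of pronunciation modeling, the sketch below maps each word to one or more phone sequences and expands a word sequence into every phone sequence it can be represented by. The lexicon entries, the phone symbols, and the function name are hypothetical examples chosen for illustration only.

```python
from itertools import product

# Hypothetical pronunciation lexicon: each word has one or more
# alternative pronunciations, each written as a list of phone symbols.
LEXICON = {
    "the": [["dh", "ah"], ["dh", "iy"]],
    "data": [["d", "ey", "t", "ah"], ["d", "ae", "t", "ah"]],
}

def phone_sequences(words):
    """Return every phone sequence that can represent the word sequence."""
    variants = [LEXICON[w] for w in words]
    # Pick one pronunciation per word, then concatenate the phones.
    return [[phone for pron in combo for phone in pron]
            for combo in product(*variants)]

for seq in phone_sequences(["the", "data"]):
    print(" ".join(seq))
```

Two words with two pronunciations each yield four candidate phone sequences, which is why the text speaks of multiple sequences of units per word or phrase.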
Speech recognition engines usually require two basic components in order to recognize speech. One component is an acoustic model, created by taking audio recordings of speech and their transcriptions and compiling them into statistical representations of the sounds that make up words. The other component is a language model, which gives the probabilities of sequences of words. Language models are often used for dictation applications. A special type of language model is the regular grammar, which is typically used in desktop command-and-control or telephony IVR-type applications.
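The way the two components work together can be sketched as follows: each candidate word sequence receives an acoustic log-likelihood and a language model log-probability, and the recognizer prefers the candidate with the best combined score. This is a toy sketch under stated assumptions: the bigram probabilities, the acoustic scores, the floor for unseen bigrams, and the `lm_weight` parameter are all made up for illustration.

```python
import math

# Toy bigram probabilities P(word | previous word); all values are made up.
BIGRAM = {
    ("<s>", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("speech", "</s>"): 0.40,
    ("<s>", "wreck"): 0.01,
    ("wreck", "a"): 0.20,
    ("a", "nice"): 0.05,
    ("nice", "beach"): 0.02,
    ("beach", "</s>"): 0.30,
}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen bigrams (assumption)

def lm_log_prob(words):
    """Log-probability of a word sequence under the toy bigram model."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(BIGRAM.get(pair, FLOOR))
               for pair in zip(padded, padded[1:]))

def best_hypothesis(acoustic_scores, lm_weight=1.0):
    """Pick the word sequence maximizing acoustic score + weighted LM score.

    acoustic_scores maps a tuple of words to log p(audio | words).
    """
    return max(acoustic_scores,
               key=lambda w: acoustic_scores[w] + lm_weight * lm_log_prob(w))

# Hypothetical acoustic log-likelihoods for two similar-sounding candidates.
candidates = {
    ("wreck", "a", "nice", "beach"): -100.0,
    ("recognize", "speech"): -102.0,
}
print(best_hypothesis(candidates))
```

With these made-up numbers the acoustic model alone slightly prefers the wrong candidate, and the language model tips the decision toward the more plausible word sequence.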
Our group has been working on acoustic modeling since its inception because of its critical importance to speech technology, and to speech recognition in particular. We have world-class researchers and expertise in this area. Recently, we have been focusing on two aspects of acoustic modeling: 1) how to establish the statistical models and their structures; and 2) how to learn the model parameters automatically from the data. The following are some of our recent projects in the area of acoustic modeling:
- Discriminative Learning Algorithms and Procedures for Acoustic Models of Speech
- Large-Margin Learning of HMM Parameters
- Discriminative pronunciation modeling
- Joint discriminative learning of spoken language understanding (SLU) and speech recognition (SR) model parameters using N-best/lattice results from the speech recognizer
- Discriminative acoustic models for Speech Recognition via the use of continuous features in CRF and HCRF
- Acoustic feature enhancement by statistical methods with feedback from speech recognition
- Compressing HMM parameters for adaptive noise-robust speech recognition
- Noise-adaptive and speaker-adaptive training of HMM parameters
- Parametric modeling of the acoustic environment with mixing phases between speech and noise for speech recognition
- Multilingual and cross-lingual speech recognition
- Cross-Lingual Speech Recognition under Runtime Resource Constraints
- Modeling speech production mechanisms for speech recognition: hidden dynamic modeling; minimum-effort principle for model learning and decoding
- Acoustic modeling for casual speech for enhanced voicemail
- Active learning for speech recognition
- Unsupervised learning for speech recognition
- Variable-Parameter HMMs
- Acoustic modeling for voice search
Selected publications:
- George Dahl, Dong Yu, Li Deng, and Alex Acero, Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition, in IEEE Transactions on Audio, Speech, and Language Processing, Special Issue on Deep Learning for Speech and Language Processing, vol. 20, no. 1, pp. 30-42, January 2012
- Frank Seide, Gang Li, and Dong Yu, Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, in Interspeech 2011, International Speech Communication Association, August 2011
- George Dahl, Dong Yu, Li Deng, and Alex Acero, Large Vocabulary Continuous Speech Recognition With Context-Dependent DBN-HMMs, in Proc. ICASSP, Prague, IEEE, May 2011
- Dong Yu and Li Deng, Deep Learning and Its Applications to Signal and Information Processing, in IEEE Signal Processing Magazine, IEEE, January 2011
- Dong Yu, Li Deng, and George E. Dahl, Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition, in NIPS 2010 workshop on Deep Learning and Unsupervised Feature Learning, December 2010
- Dong Yu, Li Deng, and Shizhen Wang, Learning in the Deep-Structured Conditional Random Fields, in NIPS 2009 Workshop on Deep Learning for Speech Recognition and Related Applications, December 2009
- Dong Yu, Li Deng, and Alex Acero, Using continuous features in the maximum entropy model, in Pattern Recognition Letters, vol. 30, no. 8, pp. 1295-1300, Elsevier, October 2009
- Dong Yu, Li Deng, Yifan Gong, and Alex Acero, A Novel Framework and Training Algorithm for Variable-Parameter Hidden Markov Models, in IEEE Transactions on Audio, Speech and Language Processing, vol. 17, no. 7, pp. 1348-1360, IEEE, September 2009
- Dong Yu, Li Deng, and Alex Acero, Hidden Conditional Random Field with Distribution Constraints for Phone Classification, in Interspeech 2009, International Speech Communication Association, September 2009
- Dong Yu and Li Deng, Solving nonlinear estimation problems using Splines, in IEEE Signal Processing Magazine, vol. 26, no. 4, pp. 86-90, IEEE, July 2009
- Balakrishnan Varadarajan, Dong Yu, Li Deng, and Alex Acero, Using collective information in semi-supervised learning for speech recognition, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Dong Yu, Li Deng, Peng Liu, Jian Wu, Yifan Gong, and Alex Acero, Cross-lingual speech recognition under run-time resource constraints, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Ozlem Kalinli, Michael L. Seltzer, and Alex Acero, Noise Adaptive Training Using a Vector Taylor Series Approach for Robust Automatic Speech Recognition, in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Institute of Electrical and Electronics Engineers, Inc., Taipei, Taiwan, April 2009
- Hui Lin, Li Deng, Dong Yu, Yifan Gong, Alex Acero, and Chin-Hui Lee, A Study on Multilingual Acoustic Modeling For Large Vocabulary ASR, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Jasha Droppo and Alex Acero, Experimenting with a Global Decision Tree for State Clustering in Automatic Speech Recognition Systems, in ICASSP 2009, IEEE, April 2009
- Oriol Vinyals, Li Deng, Dong Yu, and Alex Acero, Discriminative pronunciation learning using phonetic decoder and minimum classification error criterion, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Balakrishnan Varadarajan, Dong Yu, Li Deng, and Alex Acero, Maximizing global entropy reduction for active learning in speech recognition, in Proceedings of the ICASSP, Institute of Electrical and Electronics Engineers, Inc., April 2009
- Jinyu Li, Dong Yu, Li Deng, Yifan Gong, and Alex Acero, A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions, in Computer Speech and Language, vol. 23, pp. 389-405, Elsevier, 2009
- Dong Yu, Balakrishnan Varadarajan, Li Deng, and Alex Acero, Active Learning and Semi-supervised Learning for Speech Recognition: A Unified Framework using the Global Entropy Reduction Maximization Criterion, in Computer Speech and Language, Special Issue on Emergent Artificial Intelligence Approaches for Pattern Recognition in Speech and Language Processing, Elsevier, 2009
- Hui Lin, Li Deng, Jasha Droppo, Dong Yu, and Alex Acero, Learning Methods in Multilingual Speech Recognition, in NIPS Workshop, Whistler, BC, Canada, Microsoft, December 2008
- Dong Yu, Li Deng, and Alex Acero, The Maximum Entropy Model with Continuous Features, in NIPS Workshop, Whistler, BC, Canada, Microsoft, December 2008
- Xiaodong He and Li Deng, Discriminative Learning for Speech Recognition: Theory and Practice, Morgan & Claypool, October 2008
- Dong Yu, Li Deng, Xiaodong He, and Alex Acero, Large-Margin Minimum Classification Error Training: A Theoretical Risk Minimization Perspective, in Computer Speech and Language, vol. 22, no. 4, pp. 415-429, Elsevier, October 2008
- Dong Yu, Li Deng, Yifan Gong, and Alex Acero, Parameter Clustering and Sharing in Variable-Parameter HMMs for Noise Robust Speech Recognition, in Proc. of the Interspeech, International Speech Communication Association, September 2008
- Xiaodong He, Li Deng, and Wu Chou, Discriminative Learning in Sequential Pattern Recognition: A Unifying Review for Optimization-Oriented Speech Recognition, in IEEE Signal Processing Magazine, vol. 25, no. 5, pp. 14-36, Institute of Electrical and Electronics Engineers, Inc., September 2008
