Acoustic Modeling

Established: January 29, 2004

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The hidden Markov model (HMM) is one of the most common types of acoustic models. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields.
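
As a deliberately simplified illustration of what such a statistical representation looks like, the sketch below computes the log-likelihood of a feature-vector sequence under a small HMM with diagonal-covariance Gaussian emissions. The shapes, parameter values, and function names are illustrative assumptions, not any particular toolkit's API.

    import numpy as np

    def log_gaussian(x, mean, var):
        # Log-density of feature vector x under a diagonal-covariance Gaussian.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def hmm_log_likelihood(features, log_init, log_trans, means, variances):
        # Forward algorithm in the log domain.
        #   features:  (T, D) feature-vector sequence (e.g., MFCC frames)
        #   log_init:  (S,)   log initial-state probabilities
        #   log_trans: (S, S) log transition probabilities
        #   means, variances: (S, D) per-state Gaussian emission parameters
        S = log_init.shape[0]
        log_alpha = log_init + np.array(
            [log_gaussian(features[0], means[s], variances[s]) for s in range(S)])
        for frame in features[1:]:
            emit = np.array(
                [log_gaussian(frame, means[s], variances[s]) for s in range(S)])
            log_alpha = emit + np.logaddexp.reduce(
                log_alpha[:, None] + log_trans, axis=0)
        return np.logaddexp.reduce(log_alpha)

    # Toy usage: 3 emitting states, 13-dimensional features, 50 frames (random data).
    T, S, D = 50, 3, 13
    feats = np.random.randn(T, D)
    score = hmm_log_likelihood(
        feats,
        np.log(np.full(S, 1.0 / S)),
        np.log(np.full((S, S), 1.0 / S)),
        np.zeros((S, D)),
        np.ones((S, D)))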

Acoustic modeling also encompasses “pronunciation modeling”, which describes how one or more sequences of fundamental speech units (such as phones or phonetic features) are used to represent larger speech units, such as words or phrases, that are the object of speech recognition. Acoustic modeling may also include the use of feedback information from the recognizer to reshape the speech feature vectors in order to achieve noise robustness in speech recognition.
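
As a toy illustration of the pronunciation-modeling idea, a word sequence can be expanded into the phone sequences that may realize it. The lexicon, phone symbols, and alternative pronunciations below are invented for the example, not drawn from a real dictionary.

    # Hypothetical pronunciation lexicon: each word maps to one or more phone sequences.
    LEXICON = {
        "speech": [("s", "p", "iy", "ch")],
        "data":   [("d", "ey", "t", "ah"), ("d", "ae", "t", "ah")],  # two variants
    }

    def expand_pronunciations(words):
        # Enumerate every phone sequence that can realize the given word sequence.
        sequences = [()]
        for word in words:
            sequences = [seq + pron
                         for seq in sequences
                         for pron in LEXICON[word]]
        return sequences

    # expand_pronunciations(["speech", "data"]) returns two phone sequences,
    # one for each pronunciation variant of "data".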

Speech recognition engines usually require two basic components in order to recognize speech. One component is an acoustic model, created by taking audio recordings of speech and their text transcriptions and compiling them into statistical representations of the sounds that make up words. The other component is a language model, which gives the probabilities of word sequences. Statistical language models are often used for dictation applications. A special type of language model is the regular grammar, typically used in desktop command-and-control or telephony IVR-type applications.
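
For illustration only, the sketch below shows one common way a recognizer can combine the two components: rescoring an N-best list with a weighted sum of acoustic and language-model log-probabilities. The hypotheses, scores, and weight are made-up numbers, and the weighting scheme is just one widely used convention.

    def rescore(hypotheses, lm_weight=1.0):
        # Rank hypotheses by combined score: acoustic log P(X|W) + lm_weight * log P(W).
        # hypotheses: list of (words, acoustic_logprob, lm_logprob), e.g. an N-best list.
        scored = sorted(hypotheses,
                        key=lambda h: h[1] + lm_weight * h[2],
                        reverse=True)
        return [words for words, _, _ in scored]

    # Two competing hypotheses with similar acoustic scores;
    # the language model prefers the more plausible word sequence.
    nbest = [
        (["recognize", "speech"], -120.0, -4.2),
        (["wreck", "a", "nice", "beach"], -119.5, -9.8),
    ]
    best = rescore(nbest, lm_weight=2.0)[0]   # -> ["recognize", "speech"]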

Our group has been working on acoustic modeling since the group's inception because of its critical importance in speech technology, and in speech recognition in particular. We have world-class expertise and researchers in this area. Recently, we have been focusing on two aspects of acoustic modeling: 1) how to establish the statistical models and their structures; and 2) how to learn the model parameters automatically from data. The following are some of our recent projects in the area of acoustic modeling:

  • Discriminative Learning Algorithms and Procedures for Acoustic Models of Speech
  • Large-Margin Learning of HMM Parameters
  • Discriminative pronunciation modeling
  • Joint discriminative learning of SLU and SR model parameters using N-best/lattice results from speech recognizer
  • Discriminative acoustic models for speech recognition via the use of continuous features in CRF and HCRF
  • Acoustic feature enhancement by statistical methods with feedback from speech recognition
  • Compressing HMM parameters for adaptive noise-robust speech recognition
  • Noise-adaptive and speaker-adaptive training of HMM parameters
  • Parametric modeling of acoustic environment with mixing phases between speech and noise for speech recognition
  • Multilingual and cross-lingual speech recognition
  • Cross-Lingual Speech Recognition under Runtime Resource Constraints
  • Modeling speech production mechanisms for speech recognition: hidden dynamic modeling; minimum-effort principle for model learning and decoding
  • Acoustic modeling for casual speech for enhanced voicemail
  • Active learning for speech recognition
  • Unsupervised learning for speech recognition
  • Variable-Parameter HMMs
  • Acoustic modeling for voice search