Deep Neural Networks for Acoustic Modeling in Speech Recognition

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
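
The feed-forward alternative described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the architecture or training procedure from the article: the feature dimension, context width, layer sizes, sigmoid hidden units, and random untrained weights are all hypothetical choices, and the helper names (stack_frames, state_posteriors) are invented for this example. It shows the data flow only: several frames of coefficients go in, and a probability distribution over HMM states comes out for each frame.

```python
import numpy as np


def stack_frames(features, context):
    """Join each frame with `context` neighbours on each side.

    Edge frames are padded by repeating the first/last frame
    (an assumption made here for simplicity).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def state_posteriors(inputs, weights, biases):
    """Forward pass: sigmoid hidden layers, softmax output over HMM states."""
    h = inputs
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return softmax(h @ weights[-1] + biases[-1])


# Hypothetical sizes: 20 frames of 13 coefficients, 5 frames of context
# on each side, two hidden layers of 32 units, 8 HMM states.
rng = np.random.default_rng(0)
num_frames, num_coeffs, context, num_states = 20, 13, 5, 8

features = rng.standard_normal((num_frames, num_coeffs))  # stand-in for real acoustic features
inputs = stack_frames(features, context)                  # shape (20, 11 * 13)

layer_sizes = [inputs.shape[1], 32, 32, num_states]
weights = [0.1 * rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

# One row per frame; each row is a distribution over HMM states.
posteriors = state_posteriors(inputs, weights, biases)
```

With untrained weights the posteriors carry no information about speech; in a real system the weights would be learned from labelled frames before the outputs are passed to the HMM decoder.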

In IEEE Signal Processing Magazine

Details

Type: Article
URL: http://psych.stanford.edu/~jlm/pdfs/Hinton12IEEE_SignalProcessingMagazine.pdf
Pages: 82-97
Volume: 29
Number: 6