Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury
November 2012
Most current speech recognition systems
use hidden Markov models (HMMs) to deal
with the temporal variability of speech and
Gaussian mixture models (GMMs) to
determine how well each state of each
HMM fits a frame or a short window of frames of coefficients
that represents the acoustic input. An alternative way to evaluate
the fit is to use a feed-forward neural network that takes
several frames of coefficients as input and produces posterior
probabilities over HMM states as output. Deep neural networks
(DNNs) that have many hidden layers and are trained
using new methods have been shown to outperform GMMs on
a variety of speech recognition benchmarks, sometimes by a
large margin. This article provides an overview of this progress
and represents the shared views of four research groups that
have had recent successes in using DNNs for acoustic modeling
in speech recognition.
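
As a rough illustration of the alternative described above, the following sketch stacks each frame with its neighbours and passes the window through a small feed-forward network whose softmax output can be read as posterior probabilities over HMM states. It is a minimal sketch and not the authors' implementation: the layer sizes, the context width, the logistic hidden units, the random untrained weights, and every function name (`stack_frames`, `init_network`, `state_posteriors`) are assumptions made for illustration only.

```python
import numpy as np


def stack_frames(features, context):
    """Concatenate each frame with `context` neighbours on either side.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def init_network(layer_sizes, rng):
    """Random (untrained) weights and zero biases, for shape illustration only."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def state_posteriors(features, params, context):
    """Map a (frames x coefficients) matrix to per-frame posteriors over states."""
    hidden = stack_frames(features, context)
    for weights, bias in params[:-1]:
        hidden = sigmoid(hidden @ weights + bias)
    weights, bias = params[-1]
    return softmax(hidden @ weights + bias)


# Hypothetical sizes: 50 frames of 13 coefficients, 5 frames of context
# on each side, three hidden layers, and 40 HMM states.
rng = np.random.default_rng(0)
num_frames, num_coeffs, context, num_states = 50, 13, 5, 40
features = rng.normal(size=(num_frames, num_coeffs))
input_dim = num_coeffs * (2 * context + 1)
params = init_network([input_dim, 64, 64, 64, num_states], rng)
posteriors = state_posteriors(features, params, context)
```

Each row of `posteriors` is a distribution over the assumed HMM states for one frame. In a real system the weights would be trained rather than drawn at random, and the outputs would be combined with an HMM decoder.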
In IEEE Signal Processing Magazine
| Field | Value |
| --- | --- |
| Type | Article |
| URL | http://psych.stanford.edu/~jlm/pdfs/Hinton12IEEE_SignalProcessingMagazine.pdf |
| Pages | 82-97 |
| Volume | 29 |
| Number | 6 |