Investigations on Hessian-Free Optimization for Cross-Entropy Training of Deep Neural Networks

Context-dependent deep neural network HMMs have been shown to achieve recognition accuracy superior to Gaussian mixture models in a number of recent works. Typically, neural networks are optimized with stochastic gradient descent. On large datasets, stochastic gradient descent improves quickly during the beginning of the optimization, but since it does not make use of second-order information, its asymptotic convergence behavior is slow. In regions with pathological curvature, stochastic gradient descent may almost stagnate and thereby falsely indicate convergence. Another drawback of stochastic gradient descent is that it can only be parallelized within minibatches.
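
For orientation, here is a minimal sketch of the minibatch SGD update discussed above (plain NumPy; grad_fn, the minibatch, and the learning rate are hypothetical placeholders, not taken from the paper). It illustrates the parallelization limit: the per-example gradients of one minibatch can be computed in parallel, but successive update steps are inherently sequential.

    import numpy as np

    def sgd_step(w, minibatch, grad_fn, lr=0.01):
        """One SGD update from the averaged per-example gradients.

        grad_fn(w, x) returning dL/dw for a single example is a
        hypothetical placeholder. Only the gradient average below
        parallelizes across the minibatch; repeated calls to
        sgd_step depend on each other and must run sequentially.
        """
        g = np.mean([grad_fn(w, x) for x in minibatch], axis=0)
        return w - lr * g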

The Hessian-free algorithm is a second-order batch optimization algorithm that does not suffer from these problems. In a recent work, Hessian-free optimization has been applied to the training of deep neural networks according to a sequence criterion, and improvements in accuracy and training time have been reported. In this paper, we analyze the properties of the Hessian-free optimization algorithm and investigate whether it is suited for cross-entropy training of deep neural networks as well.
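
For contrast, here is a minimal sketch of the step at the core of Hessian-free methods in general (a generic illustration under common assumptions, not the exact procedure of the paper): a search direction d solving (B + damping*I) d = -g is approximated with conjugate gradient, which needs only curvature-vector products B@v (in practice Gauss-Newton products accumulated over a large batch, which is what makes the method a batch algorithm) and never forms B explicitly.

    import numpy as np

    def hessian_free_step(grad, Bv, dim, damping=1.0, cg_iters=50, tol=1e-8):
        """Approximately solve (B + damping*I) d = -grad by conjugate gradient.

        Bv(v) returns the curvature-matrix/vector product B @ v
        (e.g., a Gauss-Newton product over a batch); B itself is
        never materialized.
        """
        d = np.zeros(dim)
        r = -grad - (Bv(d) + damping * d)  # residual; d = 0, so r = -grad
        p = r.copy()
        rs = r @ r
        for _ in range(cg_iters):
            Ap = Bv(p) + damping * p
            alpha = rs / (p @ Ap)
            d += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if rs_new < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return d  # parameter update: w_new = w + d

    # Hypothetical usage on a toy quadratic 0.5*w'Aw + b'w:
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])
    w = np.zeros(2)
    g = A @ w + b
    w = w + hessian_free_step(g, lambda v: A @ v, dim=2, damping=0.0)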


In: Interspeech

Type: Inproceedings