Conjugate Directions for Stochastic Gradient Descent

The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent. The experiments are restricted to the linear, realisable case.
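To make the central idea concrete, the following is a minimal sketch of conjugate-gradient steps taken inside individual mini-batches for the linear, realisable case. It is not the algorithm from the paper: it applies textbook linear conjugate gradient to each mini-batch's least-squares loss and carries nothing over between batches. The function names (`minibatch_cg_step`, `run_demo`), the batch size, and the number of inner iterations `k` are all assumptions chosen for illustration. The Hessian-gradient products are computed as `X.T @ (X @ v)`, so the Hessian itself is never formed.

```python
import numpy as np


def minibatch_cg_step(w, X, y, k=3):
    """Run up to k linear CG iterations on one mini-batch's least-squares loss.

    Loss: 0.5 * mean((X @ w - y) ** 2), so the Hessian is H = X.T @ X / n.
    The k search directions span a k-dimensional Krylov subspace of H.
    """
    n = X.shape[0]

    def hvp(v):
        # Hessian-vector product without building the d x d matrix H.
        return X.T @ (X @ v) / n

    r = -(X.T @ (X @ w - y)) / n  # negative gradient at the current w
    d = r.copy()                  # first search direction
    for _ in range(k):
        rr = r @ r
        if rr < 1e-20:            # gradient has vanished on this batch
            break
        Hd = hvp(d)
        dHd = d @ Hd
        if dHd <= 0.0:            # guard against a degenerate direction
            break
        alpha = rr / dHd          # exact line minimisation along d
        w = w + alpha * d
        r = r - alpha * Hd        # updated negative gradient
        beta = (r @ r) / rr
        d = r + beta * d          # next direction, H-conjugate to the previous ones
    return w


def run_demo(seed=0, dim=10, batch=20, steps=200, k=3):
    """Stream noise-free mini-batches from a fixed linear target (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    w = np.zeros(dim)
    for _ in range(steps):
        X = rng.normal(size=(batch, dim))
        y = X @ w_true            # realisable: targets are exactly linear in X
        w = minibatch_cg_step(w, X, y, k=k)
    return float(np.linalg.norm(w - w_true))


if __name__ == "__main__":
    print("distance to target after training:", run_demo())
```

Because the targets are noise-free, every mini-batch shares the same minimiser, which is what makes restarting conjugate gradient on each batch safe in this sketch. With noisy targets the per-batch minimisers differ, and this naive restart scheme would need additional care.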

File: schgra02.ps.gz

In: Proceedings of the International Conference on Artificial Neural Networks, ICANN 2002

Publisher: Springer

Details

Type: Inproceedings
Pages: 1351–1356
Number: 2415
Series: Lecture Notes in Computer Science