Stochastic Gradient Descent Algorithm in the Computational Network Toolkit

  • Brian Guenter
  • Dong Yu
  • Adam Eversole
  • Oleksii Kuchaiev
  • Mike Seltzer

OPT2013: NIPS Workshop on Optimization for Machine Learning

We introduce the stochastic gradient descent algorithm used in the Computational Network Toolkit (CNTK), a general-purpose machine learning toolkit written in C++ for training and using models that can be expressed as a computational network. We describe the algorithm used to compute the gradients automatically for a given network. We also propose a low-cost automatic learning rate selection algorithm and demonstrate that it works well in practice.
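
To make the idea of computing gradients automatically over a computational network concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not CNTK's implementation: the names Node, backward, and sgd_fit are hypothetical, nodes hold scalars rather than matrices, and the learning rate is a fixed constant rather than the automatically selected rate the abstract refers to.

```python
# Minimal sketch: reverse-mode gradients on a tiny computational network,
# followed by plain SGD. All names are hypothetical; this is not CNTK's API.
import random


class Node:
    """A scalar node in a computational network."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this node was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Node(self.value - other.value, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (other.value, self.value))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every ancestor."""
    order, seen = [], set()

    def visit(node):
        # Depth-first search yields a topological order (parents first).
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    for node in order:
        node.grad = 0.0  # clear gradients left over from earlier passes
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local


def sgd_fit(samples, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w * x + b with one SGD update per sample (fixed rate)."""
    rng = random.Random(seed)
    w, b = Node(0.0), Node(0.0)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = w * Node(x) + b - Node(y)
            loss = err * err  # squared error for this sample
            backward(loss)
            w.value -= learning_rate * w.grad
            b.value -= learning_rate * b.grad
    return w.value, b.value
```

Calling sgd_fit([(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]) on these noise-free samples of y = 2x + 1 should return values close to 2 and 1. Each node stores only its local partial derivatives, so supporting a new operation means adding one method rather than changing backward.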