Dhruv Mahajan, S. Sathiya Keerthi, Sundararajan Sellamanickam, and Leon Bottou
This paper proposes a novel parallel stochastic gradient descent (SGD) method built on a batch descent framework: in each batch iteration, every node runs its own set of SGD iterations on the data residing on that node, and these parallel runs are used to find the descent direction. The method has strong convergence properties. Experiments on datasets with high-dimensional feature spaces demonstrate its value.
Publisher: NIPS 2013 Workshop on Optimization for Machine Learning
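
To make the idea in the abstract concrete, here is a minimal single-process sketch of one way such a scheme could look. It is not the authors' algorithm. The least-squares loss, the simple averaging of per-node directions, the fallback to the negative gradient, the Armijo backtracking line search, and all function and parameter names (`local_sgd`, `parallel_sgd_descent`, `lr`, `epochs`) are assumptions made for illustration. The nodes are simulated sequentially as a list of data shards.

```python
import numpy as np


def loss(w, X, y):
    """Mean squared-error loss (assumed objective, for illustration only)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def grad(w, X, y):
    """Full (batch) gradient of the loss above."""
    return X.T @ (X @ w - y) / len(y)


def local_sgd(w, X, y, lr, epochs, rng):
    """Plain SGD on one node's shard, started from the current global point."""
    w = w.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w


def parallel_sgd_descent(shards, outer_iters=20, lr=0.01, epochs=1, seed=0):
    """Batch descent whose direction comes from per-node SGD runs.

    `shards` is a list of (X, y) pairs, one per simulated node.
    """
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X for X, _ in shards])
    y_all = np.concatenate([y for _, y in shards])
    w = np.zeros(X_all.shape[1])
    history = [loss(w, X_all, y_all)]

    for _ in range(outer_iters):
        g = grad(w, X_all, y_all)

        # Each node runs SGD on its own data; its direction is (w_node - w).
        # Assumption: the node directions are combined by a plain average.
        d = np.mean(
            [local_sgd(w, X, y, lr, epochs, rng) - w for X, y in shards],
            axis=0,
        )

        # Safeguard (assumption): fall back to steepest descent if the
        # combined direction is not a descent direction.
        if g @ d >= 0:
            d = -g

        # Armijo backtracking line search on the global objective.
        t, f0, slope = 1.0, history[-1], g @ d
        while loss(w + t * d, X_all, y_all) > f0 + 1e-4 * t * slope and t > 1e-10:
            t *= 0.5

        w = w + t * d
        history.append(loss(w, X_all, y_all))

    return w, history
```

In this sketch the line search on the full objective is what makes each outer step a descent step. The SGD runs only propose the direction, which mirrors the division of labor described in the abstract. In a real distributed setting the per-node loops would run concurrently, and only the directions and gradients would be communicated.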