A Reliable Effective Terascale Linear Learning System

Speaker  John Langford

Host  Heather Warncke

Affiliation  MSR-NYC

Duration  00:46:40

Date recorded  15 May 2012

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets: trillions of features, billions of training examples, and millions of parameters, trained in an hour on a cluster of 1000 machines. (The number of features here refers to the number of non-zero entries in the data matrix.) Individually, none of the component techniques is new, but the careful synthesis required to obtain an efficient implementation is a novel contribution. The result is, to the best of our knowledge, the most scalable and efficient linear learning system reported in the literature. We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.
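To make the abstract's terminology concrete, here is a minimal single-machine sketch in Python. It is not the system described in the talk: the toy dataset, the choice of logistic loss as the convex loss, the learning rate, and all names (`examples`, `predict`, `sgd_epoch`, and so on) are assumptions made for illustration. It shows how "features" (non-zero entries in the data matrix) differ from "parameters" (one weight per column), and what fitting a linear predictor with a convex loss looks like at small scale.

```python
import math

# Hypothetical toy dataset: each example is (label, {column_index: value}).
# Labels are in {-1, +1}; only non-zero entries of the data matrix are stored.
examples = [
    (+1, {0: 1.0, 3: 2.0}),
    (-1, {1: 1.0}),
    (+1, {0: 0.5, 2: 1.0, 3: 1.0}),
]

num_examples = len(examples)
# "Features" in the abstract's sense: total non-zero entries in the data matrix.
num_features = sum(len(x) for _, x in examples)
# "Parameters": one weight per distinct column of the data matrix.
num_parameters = 1 + max(i for _, x in examples for i in x)


def predict(w, x):
    """Linear prediction: dot product over the non-zero entries only."""
    return sum(w[i] * v for i, v in x.items())


def logistic_loss(w, data):
    """Average logistic loss, one example of a convex loss."""
    return sum(math.log1p(math.exp(-y * predict(w, x))) for y, x in data) / len(data)


def sgd_epoch(w, data, lr=0.5):
    """One pass of plain stochastic gradient descent (assumed optimizer)."""
    for y, x in data:
        margin = y * predict(w, x)
        # Gradient of log(1 + exp(-margin)) with respect to the prediction.
        g = -y / (1.0 + math.exp(margin))
        for i, v in x.items():
            w[i] -= lr * g * v
    return w


w = [0.0] * num_parameters
loss_before = logistic_loss(w, examples)
for _ in range(20):
    sgd_epoch(w, examples)
loss_after = logistic_loss(w, examples)

print(num_examples, num_features, num_parameters)
print(loss_before, loss_after)
```

In this toy case there are 3 examples, 6 features, and 4 parameters. Each update touches only the non-zero entries of one example, so the cost of a pass scales with the feature count rather than with examples times parameters.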

©2012 Microsoft Corporation. All rights reserved.