Optimizing Trade-offs for Scalable Machine Learning

Speaker: Joseph Bradley

Host: Lin Xiao

Affiliation: Carnegie Mellon University

Duration: 01:15:54

Date recorded: 27 February 2013

Modern machine learning applications require large models, large amounts of data, and complex optimization. I will discuss scaling machine learning by decomposing learning problems into simpler sub-problems. These decompositions allow us to trade off accuracy, computational complexity, and potential for parallelization, where a small sacrifice in one can mean a big gain in another. Moreover, we can tailor the decomposition to our model and data in order to optimize these trade-offs.

I will present two examples. First, I will discuss parallel optimization for sparse regression. Our Shotgun algorithm parallelizes coordinate descent, a seemingly sequential method; Shotgun achieves near-linear speedups in theory and is, empirically, one of the fastest methods for multicore sparse regression. Second, I will discuss parameter learning for probabilistic graphical models. Using structured composite likelihood, we prove finite-sample complexity bounds for general Markov random fields and conditional random fields, and our bounds indicate how to choose the estimator structure in practice. In both examples, our analysis provides strong theoretical guarantees that guide our practical implementations.
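To make the first example concrete, here is a minimal NumPy sketch of Shotgun-style parallel coordinate descent for the Lasso. It is an illustration only, not the authors' multicore implementation: the function name `shotgun_lasso`, the parameter names, and the fixed round count are assumptions, and "parallel" updates are simulated by letting P randomly chosen coordinates update from the same stale residual before the residual is refreshed, which mimics what P cores would do concurrently.

```python
import numpy as np

def soft_threshold(z, t):
    """Closed-form solution of the one-dimensional Lasso subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def shotgun_lasso(A, y, lam, P=8, n_rounds=200, seed=0):
    """Sketch of Shotgun-style parallel coordinate descent for
    0.5 * ||A x - y||^2 + lam * ||x||_1.  Each round picks P coordinates
    at random and updates them from the same stale residual, mimicking
    P cores working concurrently.  Assumes no all-zero columns in A."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    col_sq = (A * A).sum(axis=0)          # a_j^T a_j for each column j
    residual = y - A @ x
    for _ in range(n_rounds):
        coords = rng.choice(d, size=min(P, d), replace=False)
        delta = np.zeros(d)
        for j in coords:                   # conceptually a parallel-for
            rho = A[:, j] @ residual + col_sq[j] * x[j]
            delta[j] = soft_threshold(rho, lam) / col_sq[j] - x[j]
        x += delta                         # commit all P updates at once
        residual = y - A @ x               # refresh the shared residual
    return x
```

The trade-off the talk analyzes is visible here: increasing P does more work per round, but updates computed from the same stale residual can interfere when columns of A are correlated, so the useful degree of parallelism depends on the correlation structure of the data.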
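For the second example, the simplest member of the composite-likelihood family is node-wise pseudolikelihood, sketched below for a pairwise Ising model over ±1 variables. The function name and the convention that edge weights live in a symmetric matrix with node biases on the diagonal are illustrative assumptions; the structured estimators in the talk condition on larger blocks of variables, paying more computation per component in exchange for better statistical behaviour.

```python
import numpy as np

def ising_neg_log_pseudolikelihood(theta, samples):
    """Node-wise composite (pseudo)likelihood for a pairwise Ising model
    over +/-1 variables.  `theta` is a symmetric (d, d) matrix of edge
    weights with node biases on the diagonal; `samples` is (n, d).
    Each node's conditional given the rest is a logistic factor, so the
    objective is convex and decomposes over nodes."""
    n, d = samples.shape
    nll = 0.0
    for i in range(d):
        # local field on node i: bias + weighted sum over the other nodes
        field = samples @ theta[i] - theta[i, i] * samples[:, i] + theta[i, i]
        # -log P(x_i | x_rest) = log(1 + exp(-2 * x_i * field))
        nll += np.sum(np.logaddexp(0.0, -2.0 * samples[:, i] * field))
    return nll / n
```

This objective can be handed to any off-the-shelf convex optimizer; the choice of how much of the model each component conditions on is exactly the estimator-structure decision that the sample-complexity bounds in the talk are meant to guide.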

©2013 Microsoft Corporation. All rights reserved.