Speaker Ali Rahimi
Affiliation Intel Labs Berkeley
Host John Platt
Date recorded 24 July 2009
A popular trend in computer vision, graphics, and machine learning is to replace sophisticated statistical models with simpler generic ones, and to compensate for the missing domain knowledge with huge datasets. These huge datasets in turn require us to solve huge numerical optimization problems that tax popular off-the-shelf implementations of popular algorithms. I describe a randomized way to solve these large scale optimization problems quickly, in a few lines of code, and with provably good performance guarantees. For example, a randomized version of Adaboost and of kernelized Support Vector Machine can fit millions of data points within a few minutes with almost no loss in classification accuracy. Similarly, very large Semi-Definite Programs can be solved quickly by approximating them with suitably randomized linear programs. All of these tricks randomize over most of the variables of optimization and carry out a much cheaper optimization over the remaining variables. A theoretical analysis of these tricks relies on the concentration of measure phenomenon in Banach spaces, and guarantees that in the cases I describe, these tricks work almost as well as carrying out the full optimization. I'll also describe a curious and related way of constructing numerical algorithms that execute reliably on unreliable CPUs that run at subthreshold voltages.
©2009 Microsoft Corporation. All rights reserved.
People also watched