Machine learning is a vibrant field with many rich algorithmic techniques. However, most approaches in the field are heuristic: we cannot prove good bounds on either their performance or their running time, except in quite limited settings. This talk will focus on the project of designing algorithms and approaches whose performance can be analyzed rigorously.
As an example of this project, I will focus on my work on the nonnegative matrix factorization problem, which has important applications throughout machine learning (and theory). As is often the case, this problem is NP-hard when considered in full generality. However, we introduce a sub-case called 'separable' nonnegative matrix factorization that we believe is the right notion in various contexts. We give a polynomial time algorithm for this problem, and use this algorithm for more general problems involving learning topic models. This is an auspicious example where theory can lead to inherently new algorithms that have highly-practical performance on real data sets.
I will also briefly describe some of my other work on learning, including mixtures of Gaussians and robust linear regression, as well as promising directions for future work.