Machine Learning Day 2013 - Deep Learning (but not the kind you were thinking of); A Bayesian Information Criterion for Singular Models

Speakers  Ran Gilad-Bachrach and Mathias Drton

Host  Ofer Dekel

Affiliation  MSR, University of Washington

Duration  01:07:51

Date recorded  18 October 2013

Typically, one approaches a supervised machine learning problem by writing down an objective function and finding a hypothesis that minimizes it. This is equivalent to finding the Maximum A Posteriori (MAP) hypothesis for a Boltzmann distribution. However, MAP is not a robust statistic. As an alternative, we define the depth of hypotheses and show that generalization and robustness can be bounded as a function of this depth. Therefore, we suggest using the median hypothesis, which is a deep hypothesis, and present algorithms for approximating it.
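The equivalence between minimizing an objective and finding the MAP hypothesis can be written out in one line. The notation here is an assumption made for illustration and is not taken from the talk: $L(h)$ is the objective, $T > 0$ a temperature, and $Z$ the normalizing constant.

```latex
p(h) = \frac{1}{Z} \exp\!\left(-\frac{L(h)}{T}\right)
\quad\Longrightarrow\quad
\arg\max_{h}\, p(h) = \arg\min_{h}\, L(h)
```

The implication holds because $x \mapsto \exp(-x/T)$ is strictly decreasing and $Z$ does not depend on $h$.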

One contribution of this work is an efficient method for approximating the Tukey median. The Tukey median, which is often used for data visualization and outlier detection, is a special case of the family of medians we define. However, computing it exactly takes time exponential in the dimension. Our algorithm approximates such medians in polynomial time while making weaker assumptions than those required by previous work.
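To make the notion of depth concrete, here is a minimal sketch of Tukey (halfspace) depth estimated with random directions. This is a generic textbook-style approximation and not the algorithm presented in the talk; the function name `approx_tukey_depth` and its parameters are hypothetical. The Tukey depth of a point is the smallest fraction of data points contained in any closed halfspace whose boundary passes through that point, and the Tukey median is a point of maximal depth. Sampling a finite set of directions can only overestimate the true depth.

```python
import numpy as np

def approx_tukey_depth(x, data, n_directions=500, seed=0):
    """Estimate the Tukey depth of point x with respect to the rows of data.

    Illustrative sketch only: samples random unit directions and returns the
    smallest fraction of data points lying in the closed halfspace
    {p : u . (p - x) >= 0}. The result is an upper bound on the true depth.
    """
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Random directions, normalized to unit length.
    directions = rng.normal(size=(n_directions, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project the centered data onto every direction: shape (n, n_directions).
    projections = (data - x) @ directions.T
    # Fraction of points on the non-negative side, per direction.
    fractions = (projections >= 0).mean(axis=0)
    return float(fractions.min())

# Toy data: the four corners of a square.
square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
center_depth = approx_tukey_depth(np.array([0.0, 0.0]), square)
outlier_depth = approx_tukey_depth(np.array([10.0, 10.0]), square)
```

On this toy data the center of the square is deep (every sampled halfspace contains half of the points), while a point far outside the square has depth zero.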

The presentation is based on joint work with Chris Burges.

The Bayesian Information Criterion (BIC) is a widely used model selection technique that is inspired by the large-sample asymptotic behavior of Bayesian approaches to model selection. In this talk we will consider such approximate Bayesian model choice for problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels. When models are singular in this way, the penalty structure in BIC generally does not reflect the large-sample behavior of their Bayesian marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.
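For reference, the standard (regular-model) BIC that the talk sets out to extend can be sketched as follows. This is the usual textbook form, minus two times the maximized log-likelihood plus the parameter count times log n, and not the singular-model extension proposed in the talk. The helper names and the log-likelihood values below are hypothetical and chosen only for illustration.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC in the 'lower is better' convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_model(candidates, n_obs):
    """Pick the candidate with the smallest BIC.

    candidates maps a model name to a pair
    (maximized log-likelihood, number of free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up fits of one-dimensional Gaussian mixtures to 200 observations.
candidates = {
    "1 component": (-520.0, 2),
    "2 components": (-505.0, 5),
    "3 components": (-503.5, 8),
}
best, scores = select_model(candidates, n_obs=200)
print(best)
```

With these made-up numbers the two-component model wins: the third component improves the fit too little to pay for its three extra parameters. For singular models such as mixtures, the abstract's point is that this fixed penalty per parameter generally does not match the large-sample behavior of the marginal likelihood.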

Joint work with Martyn Plummer.

©2013 Microsoft Corporation. All rights reserved.