Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Context sensitive information: Which bits matter in data?

Speaker  Joachim Buhmann

Affiliation  ETH Zurich

Host  Pushmeet Kohli

Duration  00:51:06

Date recorded  21 August 2012

Learning patterns in data requires to extract interesting, statistically significant regularities in (large) data sets, e.g. detection of cancer cells in tissue microarrays and estimating their staining or role mining in security permission management. Admissible solutions or hypotheses specify the context of pattern analysis problems which have to cope with model mismatch and noise in data. An information theoretic approach is developed which estimates the precision of inferred solution sets and regularizes solutions in a noise adapted way. The tradeoff between "informativeness" and "robustness" is mirrored by the balance between high information content and identifiability of solution sets, thereby giving rise to a new notion of context sensitive information. Cost function to rank solutions and, more abstractly, algorithms are considered as noisy channels with a generalization capacity. The effectiveness of this concept is demonstrated by model validation for spectral clustering based on different variants of graph cuts. The concept also enables us to measure how many bit are extracted by sorting algorithms when the input and thereby the pairwise comparisons are subject to fluctuations.

©2012 Microsoft Corporation. All rights reserved.
> Context sensitive information: Which bits matter in data?