Approximate Clustering without the Approximation

There has been substantial work on approximation algorithms for clustering data under distance-based objective functions such as k-median, k-means, and min-sum objectives. This work is fueled in part by the hope that approximating these objectives well will indeed yield more accurate solutions. That is, for problems such as clustering proteins by function, or clustering images by subject, there is some unknown correct “target” clustering and the implicit assumption is that clusterings that are approximately optimal in terms of these distance-based measures are also approximately correct in terms of error with respect to the target. In this work we show that if we make this implicit assumption explicit — that is, if we assume that any c-approximation to the given clustering objective Phi is epsilon-close to the target — then we can produce clusterings that are O(epsilon)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for the k-median, k-means, and min-sum objectives, we show that we can achieve this guarantee for any constant c 1.

Speaker Details

Maria Florina Balcan is an assistant professor in the School of Computer Science at Georgia Institute of Technology. She received her Ph.D. in Computer Science from Carnegie Mellon University under the supervision of Avrim Blum. From October 2008 until July 2009, she was a postdoc at Microsoft Research, New England. Her main research interests are Computational and Statistical Machine Learning, Computational Aspects in Economics and Game Theory, and Algorithms. She is a recipient of the CMU SCS Distinguished Dissertation Award and of an NSF CAREER Award.

Date:
Speakers:
Maria Florina Balcan
Affiliation:
Georgia Institute of Technology
    • Portrait of Jeff Running

      Jeff Running