Clustering

Clustering is the problem of finding a “good”’’ partition of a set of data points in d space into clusters (groups, each consisting of “nearby”’’ points). In the worst-case, the problem is hard. This can be overcome in two ways: Assuming stochastic models of data, the “correct’’” clustering can be found. The last decade has seen many detailed results in this setting. A second alternative is to make no stochastic assumptions, but settle for non-optimal solutions. The workhorse of this approach is Lloyd’s k-means algorithm. After surveying both alternatives, I will describe a recent result (joint work with Amit Kumar) which, qualitatively, seems to provide the best of both worlds.

Speaker Details

Ravindran Kannan is a principal researcher at Microsoft Research India, where he leads the Algorithms Research Group. He is also the first adjunct faculty of the Department of Computer Science and Automation at the Indian Institute of Science.

Date:
Speakers:
Ravi Kannan
Affiliation:
Microsoft Research Lab