Near Optimal LP Rounding for Correlation Clustering on Complete Graphs

Introduced about 10 years ago by Bansal, Blum and Chawla, correlation clustering has become one of the standard techniques in machine learning and data mining. This is due to several advantages of correlation clustering over other standard clustering methods (e.g., k-means):

– Correlation clustering only requires qualitative information about similarities between objects. This makes it applicable in scenarios such as crowdsourced duplicate finding, where information about similarities between objects is generated by humans.
– Correlation clustering doesn't require the number of clusters to be specified in advance; it produces the number of clusters that best fits the data.
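For reference, the version of the problem discussed here is the standard "minimizing disagreements" formulation on complete graphs (a standard definition, not specific to this work): every pair of objects is labeled similar (E^+) or dissimilar (E^-), and a clustering C pays for each dissimilar pair placed in the same cluster and each similar pair split across clusters:

```latex
\mathrm{cost}(\mathcal{C}) \;=\;
\bigl|\{(u,v) \in E^{-} : u,v \text{ in the same cluster of } \mathcal{C}\}\bigr|
\;+\;
\bigl|\{(u,v) \in E^{+} : u,v \text{ in different clusters of } \mathcal{C}\}\bigr|.
```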

We give new rounding schemes for the standard linear programming relaxation of the correlation clustering problem, achieving approximation factors that almost match the integrality gaps:

– For complete graphs our approximation is 2.06 − ε for a fixed constant ε, which almost matches the previously known integrality gap of 2.
– For complete k-partite graphs our approximation is 3. We also show a matching integrality gap.
– For complete graphs with edge weights satisfying triangle inequalities and probability constraints, our approximation is 1.5, and we show an integrality gap of 1.2.
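The "standard linear programming relaxation" referred to above is the well-known metric LP, stated here for reference in the unweighted complete-graph case. A variable x_{uv} in [0, 1] plays the role of a distance between u and v (0 meaning "same cluster", 1 meaning "different clusters"):

```latex
\begin{aligned}
\text{minimize}   \quad & \sum_{(u,v) \in E^{+}} x_{uv} \;+\; \sum_{(u,v) \in E^{-}} \bigl(1 - x_{uv}\bigr) \\
\text{subject to} \quad & x_{uw} \le x_{uv} + x_{vw} && \text{for all vertices } u, v, w, \\
                        & 0 \le x_{uv} \le 1          && \text{for all pairs } u, v.
\end{aligned}
```

An integrality gap of 2 means that no rounding scheme analyzed against the LP optimum can guarantee a factor better than 2 in the worst case, which is why the 2.06 − ε guarantee is described as nearly matching it.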

Our results improve on a long line of work on approximation algorithms for correlation clustering in complete graphs, previously culminating in a ratio of 2.5 for the complete case by Ailon, Charikar and Newman (JACM'08). In the weighted complete case satisfying triangle inequalities and probability constraints, the same authors give a 2-approximation; for the bipartite case, Ailon, Avigdor-Elgrabli, Liberty and van Zuylen give a 4-approximation (SICOMP'12).
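To make the rounding framework concrete, here is a minimal sketch of the pivot-based LP rounding underlying the 2.5-approximation of Ailon, Charikar and Newman: repeatedly pick a random pivot and put each remaining vertex into the pivot's cluster with a probability determined by its LP distance to the pivot (the near-optimal algorithms in this talk choose more refined, sign-dependent rounding functions, which are not reproduced here). The function and parameter names (pivot_lp_rounding, f_plus, f_minus) are illustrative, and the LP distances are assumed to have been computed already.

```python
import random

def pivot_lp_rounding(vertices, x, sign, f_plus=lambda d: d, f_minus=lambda d: d):
    """Generic pivot-based rounding of the correlation-clustering LP.

    vertices : iterable of vertex ids
    x        : dict mapping frozenset({u, v}) -> LP distance in [0, 1]
    sign     : dict mapping frozenset({u, v}) -> '+' (similar) or '-' (dissimilar)
    f_plus, f_minus : separation-probability functions; the identity used here
        corresponds to the Ailon-Charikar-Newman rounding (2.5-approximation).
    Returns a list of clusters (each a list of vertices).
    """
    remaining = set(vertices)
    clusters = []
    while remaining:
        pivot = random.choice(list(remaining))      # pick a uniformly random pivot
        cluster = [pivot]
        remaining.remove(pivot)
        for u in list(remaining):
            d = x[frozenset((pivot, u))]
            # probability of separating u from the pivot, depending on the edge sign
            sep = f_plus(d) if sign[frozenset((pivot, u))] == '+' else f_minus(d)
            if random.random() >= sep:              # u joins the pivot with probability 1 - sep
                cluster.append(u)
                remaining.remove(u)
        clusters.append(cluster)
    return clusters

# Toy usage on three objects with LP distances satisfying the triangle inequality.
verts = ["a", "b", "c"]
x = {frozenset(("a", "b")): 0.1, frozenset(("a", "c")): 0.9, frozenset(("b", "c")): 0.8}
sign = {frozenset(("a", "b")): '+', frozenset(("a", "c")): '-', frozenset(("b", "c")): '-'}
print(pivot_lp_rounding(verts, x, sign))            # e.g. [['a', 'b'], ['c']]
```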

Speaker Details

Grigory Yaroslavtsev is a postdoctoral fellow at the Warren Center for Network and Data Sciences. He was previously a Postdoctoral Fellow in Mathematics at Brown University, ICERM. He received his Ph.D. in Theoretical Computer Science in 2013 from Pennsylvania State University and an M.Sc. in Applied Mathematics and Physics from the Academic University of the Russian Academy of Sciences in 2010.

Grigory works on efficient algorithms for sparsification, summarization, and testing properties of large data, including approximation, parallel, and online algorithms; learning theory and property testing; communication and information complexity; and private data release.

Date:
Speakers:
Grigory Yaroslavtsev
Affiliation:
University of Pennsylvania
Jeff Running

Series: Microsoft Research Talks