A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields

  • Mikhail Bilenko ,
  • Sugato Basu

Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004) |

Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semisupervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EMstyle clustering algorithm, the E-step of which requires collective assignment of instances to cluster centroids under the constraints. We evaluate three known techniques for such collective assignment: belief propagation, linear programming relaxation, and iterated conditional modes (ICM). The first two methods attempt to globally approximate the optimal assignment, while ICM is a greedy method. Experimental results indicate that global methods outperform the greedy approach when relational supervision is limited, while their benefits diminish as more pairwise constraints are provided