Mikhail Bilenko



Update: I lead the Machine Learning Algorithms team in the Cloud+Enterprise division. Our ML tools are used in many products, from Microsoft Azure ML to numerous others across the company, and we collaborate extensively with MSR and applied ML/Data Science groups. If you love both ML fundamentals and coding, and would enjoy creating state-of-the-art ML algorithms and systems with a fun group of excellent engineers and scientists, please ping me.

Before that, I was a researcher in the Machine Learning Department at Microsoft Research. I enjoy building ML systems and tools, and working on large-scale prediction problems on behavioral, transactional, and textual data. Applications I have focused on recently include high-throughput ML, click probability prediction, relevant advertisement selection, constructing user profiles for targeting, and improving search relevance by mining logs of browsing behavior. In the past, I worked on semi-supervised clustering and record linkage (entity resolution, de-duplication, etc.). I am generally interested in adaptive similarity/distance functions, implementing learning algorithms on parallel/distributed platforms, and creating tools for machine learning practitioners.

I completed my Ph.D. in the Department of Computer Science at the University of Texas at Austin in 2006, where I was a member of the Machine Learning Group. Along the way, I spent the summer of 2002 at IBM T.J. Watson Research Center, and the summer/fall of 2004 at Google.


  • Learning from large datasets
  • Learnable similarity functions and their applications in information integration (e.g., record linkage/identity uncertainty) and text mining

  • Semi-supervised clustering

    • Probabilistic Semi-Supervised Clustering with Constraints
      Sugato Basu, Mikhail Bilenko, Arindam Banerjee, and Raymond J. Mooney. In Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien (eds.), MIT Press, 2006.
      Note: this chapter summarizes the KDD and ICML papers below

    • A Probabilistic Framework for Semi-Supervised Clustering
      Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp.59-68, Seattle, WA, August 2004.
      (Winner of Best Research Paper Award)

    • Integrating Constraints and Metric Learning in Semi-Supervised Clustering
      Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney. In Proceedings of the 21st International Conference on Machine Learning (ICML-2004), pp.81-88, Banff, Canada, July 2004.

    • A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
      Mikhail Bilenko and Sugato Basu. In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), pp.17-22, Banff, Canada, July 2004.

  • Indirect learning in information integration (record linkage, information extraction), text classification, and clustering

    • Two Approaches to Handling Noisy Variation in Text Mining
      Un Yong Nahm, Mikhail Bilenko, and Raymond J. Mooney. In Proceedings of the ICML-2002 Workshop on Text Learning (TextML'2002), pp.18-27, Sydney, Australia, July 2002.

In my leisure time, I enjoy applying hill-climbing search and gradient descent algorithms to real-world domains, which are almost as cool as the cool stuff that my sister does.

Contact Info

Email:  mbilenko@microsoft.com
Postal: Microsoft Research
        One Microsoft Way
        Redmond, WA 98052