I am a researcher in the Machine Learning Department
at Microsoft Research. I am interested in learning problems that
arise in the context of large behavioral, transactional, and textual datasets. Recent applications I have worked on
include click probability prediction, semantic ad matching, constructing user profiles for ad targeting, and improving search relevance
by mining logs of browsing behavior. In the past, I worked on semi-supervised clustering and record linkage (entity resolution,
de-duplication, etc.). I am generally interested in adaptive similarity/distance functions, and in implementing
learning algorithms on parallel/distributed platforms.
I completed my Ph.D. in the Department of Computer Sciences
at the University of Texas at Austin in 2006, where
I was a member of the Machine Learning Group. Along the way,
I spent the summer of 2002 at IBM T.J. Watson Research Center,
and the summer/fall of 2004 at Google.
- Learning from large datasets
- Fast Prediction of New Feature Utility
Hoyt Koepke and Mikhail Bilenko.
To appear in Proceedings of the 29th International Conference on Machine Learning
(ICML-2012), Edinburgh, Scotland, June 2012.
[PDF]
- Scaling Up Machine Learning. Edited by
Ron Bekkerman, Mikhail Bilenko, and
John Langford. Cambridge University Press, 2012.
- NIPS 2011 Workshop on Big Learning
- Predictive Client-side Profiles for Personalized Advertising
Mikhail Bilenko and Matthew Richardson.
In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD-2011), San Diego, August 2011.
[PDF]
[Slides (PPT)]
- SIGIR-2009 Workshop on Information Retrieval and Advertising
- Catching the Drift: Learning Broad Matches from Clickthrough Data
Sonal Gupta,
Mikhail Bilenko, and
Matthew Richardson.
In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD-2009), Paris, June 2009.
[PDF]
[Slides (PPT)]
- Enhancing Web Search by Promoting Multiple Search Engine Usage
Ryen W. White,
Matthew Richardson, Mikhail Bilenko, and
Allison Heath.
In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR-2008), Singapore, July 2008.
[PDF]
- Talking the Talk vs. Walking the Walk: Salience of Information Needs in Querying vs. Browsing
Mikhail Bilenko, Ryen W. White,
Matthew Richardson, and
G. Craig Murray.
In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR-2008), Singapore, July 2008.
[PDF]
- Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites From User Activity
Mikhail Bilenko and Ryen W. White.
In Proceedings of the 17th International World Wide Web Conference (WWW-2008), Beijing, April 2008.
[PDF]
- Leveraging Popular Destinations to Enhance Web Search Interaction
Ryen W. White, Mikhail Bilenko, and
Silviu Cucerzan.
ACM Transactions on the Web (TWEB), 2(3), pp.1-30, 2008.
[PDF]
- Earlier version: Studying the Use of Popular Destinations to Enhance Web Search Interaction
Ryen W. White, Mikhail Bilenko, and
Silviu Cucerzan.
In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR-2007), Amsterdam, July 2007.
(Winner of Best Paper Award)
[PDF]
- Learnable similarity functions and their applications in
information integration (e.g., record linkage/identity uncertainty) and text mining
- RIDDLE: Repository of Information on Duplicate Detection, Record Linkage, and Identity Uncertainty
- Adaptive Blocking: Learning to Scale Up Record Linkage and Clustering
Mikhail Bilenko,
Beena Kamath,
and Raymond J. Mooney.
In Proceedings of the 6th IEEE International Conference on Data Mining
(ICDM-2006), pp.87-96, Hong Kong, December 2006.
[PDF]
- Adaptive Product Normalization: Using Online Learning
for Record Linkage in Comparison Shopping
Mikhail Bilenko, Sugato Basu, and Mehran Sahami. In
Proceedings of the 5th IEEE International Conference on Data Mining (ICDM-2005),
pp.58-65, Houston, TX, November 2005.
[PDF]
- Adaptive Duplicate Detection Using
Learnable String Similarity Measures
Mikhail Bilenko and Raymond J. Mooney. In Proceedings of the 9th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD-2003), pp.39-48, Washington, DC, August 2003.
[PDF]
- On Evaluation and Training-Set Construction
for Duplicate Detection
Mikhail Bilenko and Raymond J. Mooney. In Proceedings of the KDD-2003 Workshop
on Data Cleaning, Record Linkage, and Object Consolidation, pp.7-12, Washington, DC, August 2003.
[PDF]
- Semi-supervised clustering
- Probabilistic Semi-Supervised Clustering with Constraints
Sugato Basu, Mikhail Bilenko, Arindam Banerjee, and Raymond J. Mooney.
In Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien (eds.), MIT Press, 2006.
Note: this chapter summarizes the KDD and ICML papers below
[PDF]
- A Probabilistic Framework for Semi-Supervised Clustering
Sugato Basu, Mikhail Bilenko,
and Raymond J. Mooney.
In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD-2004), pp.59-68, Seattle, WA, August 2004.
(Winner of Best Research Paper Award)
[PDF]
- Integrating Constraints and Metric Learning in Semi-Supervised Clustering
Mikhail Bilenko, Sugato Basu,
and Raymond J. Mooney.
In Proceedings of the 21st International Conference on Machine Learning (ICML-2004),
pp.81-88, Banff, Canada, July 2004.
[PDF]
- A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
Mikhail Bilenko and Sugato Basu.
In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and
its Connections to Other Fields (SRL-2004),
pp.17-22, Banff, Canada, July 2004.
[PDF]
- Indirect learning in information integration (record linkage, information extraction), text classification, and clustering
- Two Approaches to Handling Noisy Variation in Text Mining
Un Yong Nahm,
Mikhail Bilenko, and Raymond J. Mooney.
In Proceedings of the ICML-2002 Workshop on Text
Learning (TextML'2002), pp.18-27, Sydney, Australia, July 2002.
[PDF]