Making software that automatically learns from data
Machine Learning and Applied Statistics
The Machine Learning and Applied Statistics (MLAS) group is focused on building methods and tools to learn from data. By building software that automatically learns from data, we enable applications to behave more intelligently and enable users to become more productive.
We strive to advance the state of the art in machine learning and statistics, develop fast, scalable algorithms for learning and mining, implement portions of our work in toolkits, and apply our work to numerous applications.
Current Application Areas
Online advertising/eCommerce: We have applied various learning algorithms and auction design principles to problems in the online advertising and eCommerce arenas, including matching algorithms, keyword extraction, keyword similarity, advertisement relevance, advertiser quality, and click-through prediction.
Selected Publications:
Ece Kamar, Eric Horvitz, Chris Meek, Mobile Opportunistic Commerce: Mechanisms, Architecture, and Application, in AAMAS, 2008
Wen-tau Yih, Christopher Meek, Consistent Phrase Relevance Measures, in Proceedings of The 2nd Annual International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD-08 Workshop), 2008
W. Yih, J. Goodman, V. Carvalho, Finding Advertising Keywords on Web Pages, in Proceedings of the 15th World Wide Web Conference, 2006
Asela Gunawardana, Christopher Meek, Jody Biggs, A Quality-Based Auction for Search Ad Markets with Aggregators, in ACM EC Workshop on Ad Auctions, Association for Computing Machinery, Inc., Jun. 2008
Asela Gunawardana, Christopher Meek, Aggregators and Contextual Effects in Search Ad Markets, in WWW Workshop on Targeting and Ranking for Online Advertising, Association for Computing Machinery, Inc., Apr. 2008
Hila Becker, Christopher Meek, David Maxwell Chickering, Modeling Contextual Factors of Click Rates, in AAAI, pp. 1310-1315, 2007
Recommendation Systems/Collaborative filtering: We have developed tools for predicting a user's preferences (e.g., the products, movies, TV shows, or books they like) given information about the user, such as their other preferences and demographics. These techniques have shipped in Commerce Server, SQL Server, and adCenter, and have been used by several online services.
Selected Publications:
Asela Gunawardana, Christopher Meek, Tied Boltzmann Machines for Cold Start Recommendations, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., Oct. 2008
Guy Shani, David Maxwell Chickering, Christopher Meek, Mining Recommendations from the Web, in ACM International Conference on Recommender Systems, 2008
D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research. 1:49-75, 2000.
Input and Interaction: We continue to work on methods for improving users' ability to control and use computer systems through voice, handwriting, typing, dialog, and other input modes.
Selected Publications:
Tim Paek, Yun-Cheng Ju, Accommodating Explicit User Expressions of Uncertainty in Voice Search or Something Like That, International Speech Communication Association, 2008
T. Paek & R. Pieraccini. Automating spoken dialogue management design using machine learning: An industry perspective. Speech Communication, Special Issue on Evaluating New Methods and Models for Advanced Speech-Based Interactive Systems, 2008, 50(8-9): 716-729.
T. Paek & D. Chickering. Improving command and control speech recognition on mobile devices: Using predictive user models for language modeling. User Modeling and User-Adapted Interaction, Special Issue on Statistical and Probabilistic Methods for User Modeling, 2007, 17(1-2): 93-117.
Natural language processing: We are working on machine-learning methods for processing natural language, including methods for text classification, text clustering, question answering, and grammar checking. Applications of text classification include junk-mail detection, auto-classification of email into folders, and auto-classification of URLs into favorites. These techniques have shipped in various products, including email clients, Windows Live Hotmail, and SharePoint Portal Server.
Selected Publications:
Ming-Wei Chang, Wen-tau Yih, Christopher Meek, Partitioned Logistic Regression for Spam Filtering, in Proceedings of The 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008
Ming-Wei Chang, Wen-tau Yih, Robert McCann, Personalized Spam Filtering for Gray Mail, in Proceedings of The 5th Conference on Email and Anti-Spam, 2008
W. Yih, R. McCann, A. Kolcz, Improving Spam Filtering by Detecting Gray Mail, in Proceedings of the 4th Conference on Email and Anti-Spam, 2007
Daniel Lowd, Christopher Meek, Good Word Attacks on Statistical Spam Filters, Conference on Email and Anti-Spam (CEAS), 2005
Data mining and analysis: We have developed methods for mining and visualizing patterns from large data sets, including (1) algorithms for prediction and segmentation based on Bayesian statistics, (2) methods that semi-automate the process of data analysis and mining, (3) algorithms that dramatically decrease the time to mine large data sets, and (4) new tools for data visualization and exploratory data analysis. These techniques have shipped in SQL Server and Commerce Server as well as the WinMine Toolkit.
Selected Publications:
I. Cadez et al., Model-Based Clustering and Visualization of Navigation Patterns on a Web Site, Data Mining and Knowledge Discovery, 2003
Michail Vlachos, Philip S. Yu, Vittorio Castelli, Christopher Meek, Structural Periodic Measures for Time-Series Data, Data Mining and Knowledge Discovery, vol. 12, no. 1, pp. 1-28, 2006
D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research. 1:49-75, 2000.
Computational biology: We are developing and applying machine learning technology to biological data. Recently we have worked on approximate inference techniques that are applicable to phylogenetic trees, linkage analysis, and other related problems.
Selected Publications:
Dan Geiger, Christopher Meek, Ydo Wexler, Speeding up HMM algorithms for genetic linkage analysis via chain reductions of the state space, in Bioinformatics 2009 25: i196-i203
Ydo Wexler, Christopher Meek, MAS: a multiplicative approximation scheme for probabilistic inference, in NIPS, 2008
Dan Geiger, Christopher Meek, Ydo Wexler, A Variational Inference Procedure Allowing Internal Structure for Overlapping Clusters and Deterministic Constraints, Journal of Artificial Intelligence Research, vol. 27, pp. 1-23, 2006
Jojic et al., Using epitomes to model genetic diversity: Rational design of HIV vaccines, NIPS 2005.
Jojic et al., Efficient approximations for learning phylogenetic HMM models from data, Bioinformatics 2004.
- Asela Gunawardana and Christopher Meek, A Unified Approach to Building Hybrid Recommender Systems, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., October 2009
- Yun-Cheng Ju and Tim Paek, A Voice Search Approach to Replying to SMS Messages in Automobiles, International Speech Communication Association, September 2009
- Dan Geiger, Christopher Meek, and Ydo Wexler, Speeding up HMM algorithms for genetic linkage analysis, in Bioinformatics, Oxford University Press, 2009
- Asela Gunawardana and Christopher Meek, Tied Boltzmann Machines for Cold Start Recommendations, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., October 2008
- Asela Gunawardana, Christopher Meek, and Jody Biggs, A Quality-Based Auction for Search Ad Markets with Aggregators, in ACM EC Workshop on Ad Auctions, Association for Computing Machinery, Inc., June 2008
- Asela Gunawardana and Christopher Meek, Aggregators and Contextual Effects in Search Ad Markets, in WWW Workshop on Targeting and Ranking for Online Advertising, Association for Computing Machinery, Inc., April 2008
- Guy Shani, David Maxwell Chickering, and Christopher Meek, Mining Recommendations from the Web, in ACM International Conference on Recommender Systems, 2008
- Ydo Wexler and Christopher Meek, MAS: a multiplicative approximation scheme for probabilistic inference, in NIPS, 2008
- Ming-Wei Chang, Wen-tau Yih, and Christopher Meek, Partitioned Logistic Regression for Spam Filtering, in KDD, 2008
- Ece Kamar, Eric Horvitz, and Chris Meek, Mobile Opportunistic Commerce: Mechanisms, Architecture, and Application, in AAMAS, 2008
- Tim Paek and Yun-Cheng Ju, Accommodating Explicit User Expressions of Uncertainty in Voice Search or Something Like That, International Speech Communication Association, 2008
- Ydo Wexler and Christopher Meek, Inference for Multiplicative Models, in Proceedings of Uncertainty in Artificial Intelligence, 2008
- Ajit P. Singh, Asela Gunawardana, Chris Meek, and Arun C. Surendran, Recommendations Using Absorbing Random Walks, in North East Student Colloquium on Artificial Intelligence, 2007
- David Heckerman, Christopher Meek, and Daphne Koller, Probabilistic Entity-Relationship Models, PRMs and Plate Models, in Introduction to Statistical Relational Learning, pp. 201-239, MIT Press, 2007
- Hila Becker, Christopher Meek, and David Maxwell Chickering, Modeling Contextual Factors of Click Rates, in AAAI, American Association for Artificial Intelligence, 2007
- Tim Paek, Yun-Cheng Ju, and Christopher Meek, People Watcher: A Game for Eliciting Human-Transcribed Data for Automated Directory Assistance, International Speech Communication Association, 2007
- Donald Metzler, Susan T. Dumais, and Christopher Meek, Similarity Measures for Short Segments of Text, in European Conference on Information Retrieval, 2007
- Milind Mahajan, Asela Gunawardana, and Alex Acero, Training algorithms for hidden conditional random fields, in International Conference on Acoustics, Speech, and Signal Processing, Institute of Electrical and Electronics Engineers, Inc., May 2006
- Dan Geiger, Christopher Meek, and Bernd Sturmfels, On the Toric Algebra of Graphical Models, in The Annals of Statistics, vol. 34, no. 3, pp. 1463-1492, 2006
- Michail Vlachos, Philip S. Yu, Vittorio Castelli, and Christopher Meek, Structural Periodic Measures for Time-Series Data, in Data Mining and Knowledge Discovery, vol. 12, no. 1, pp. 1-28, 2006
- Asela Gunawardana and William Byrne, Convergence theorems for generalized alternating minimization procedures, in Journal of Machine Learning Research, MIT Press, December 2005
- Asela Gunawardana, Milind Mahajan, Alex Acero, and John C. Platt, Hidden Conditional Random Fields for Phone Classification, in International Conference on Speech Communication and Technology, International Speech Communication Association, September 2005
- J. Brutlag and C. Meek, Challenges of the email domain for text classification, in Proceedings of the Seventeenth International Conference on Machine Learning, 2000
- Bo Thiesson, Christopher Meek, David Maxwell Chickering, and David Heckerman, Computationally efficient methods for selecting among mixtures of graphical models, with discussion, in Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, pp. 631-656, Oxford University Press, May 1999
- Bo Thiesson, Christopher Meek, David Maxwell Chickering, and David Heckerman, Learning mixtures of DAG models, in Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, August 1998



