Making software that automatically learns from data
Machine Learning and Applied Statistics
The Machine Learning and Applied Statistics (MLAS) group is now part of the Machine Learning Department. Visit that page for the latest information, as the page below will not be updated going forward. Historically, the group has focused on building methods and tools to learn from data. By building software that automatically learns from data, we enable applications to behave more intelligently and users to become more productive.
We strive to advance the state of the art in machine learning and statistics, develop fast, scalable algorithms for learning and mining, implement portions of our work in toolkits, and apply our work to numerous applications.
Current Application Areas
Online advertising/eCommerce: We have applied various learning algorithms and auction design principles to problems in the online advertising and eCommerce arenas, including matching algorithms, keyword extraction, keyword similarity, advertisement relevance, advertiser quality, and click-through prediction.
Selected Publications:
Ece Kamar, Eric Horvitz, Chris Meek, Mobile Opportunistic Commerce: Mechanisms, Architecture, and Application, in AAMAS, 2008
Wen-tau Yih, Christopher Meek, Consistent Phrase Relevance Measures, in Proceedings of The 2nd Annual International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD-08 Workshop), 2008
W. Yih, J. Goodman, V. Carvalho, Finding Advertising Keywords on Web Pages, in Proceedings of the 15th World Wide Web Conference, 2006
Asela Gunawardana, Christopher Meek, Jody Biggs, A Quality-Based Auction for Search Ad Markets with Aggregators, in ACM EC Workshop on Ad Auctions, Association for Computing Machinery, Inc., Jun. 2008
Asela Gunawardana, Christopher Meek, Aggregators and Contextual Effects in Search Ad Markets, in WWW Workshop on Targeting and Ranking for Online Advertising, Association for Computing Machinery, Inc., Apr. 2008
Hila Becker, Christopher Meek, David Maxwell Chickering, Modeling Contextual Factors of Click Rates, in AAAI, pp. 1310-1315, 2007
Recommendation Systems/Collaborative filtering: We have developed tools for predicting a user's preferences (e.g., the products, movies, TV shows, or books they like) given information about the user, such as their other preferences and demographics. These techniques have shipped in Commerce Server, SQL Server, and adCenter, and have been used by several online services.
Selected Publications:
Asela Gunawardana, Christopher Meek, Tied Boltzmann Machines for Cold Start Recommendations, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., Oct. 2008
Guy Shani, David Maxwell Chickering, Christopher Meek, Mining Recommendations from the Web, in ACM International Conference on Recommender Systems, 2008
D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research. 1:49-75, 2000.
Input and Interaction: We continue to work on methods for improving the ability of users to control and utilize computer systems with voice, handwriting, typing, dialog, etc.
Selected Publications:
Tim Paek, Yun-Cheng Ju, Accommodating Explicit User Expressions of Uncertainty in Voice Search or Something Like That, International Speech Communication Association, 2008
T. Paek & R. Pieraccini. Automating spoken dialogue management design using machine learning: An industry perspective. Speech Communication, Special Issue on Evaluating New Methods and Models for Advanced Speech-Based Interactive Systems, 2008, 50(8-9): 716-729.
T. Paek & D. Chickering. Improving command and control speech recognition on mobile devices: Using predictive user models for language modeling. User Modeling and User-Adapted Interaction, Special Issue on Statistical and Probabilistic Methods for User Modeling, 2007, 17(1-2): 93-117.
Natural language processing: We are working on machine-learning methods for processing natural language, including methods for text classification, text clustering, question answering, and grammar checking. Applications of text classification include junk-mail detection, auto-classification of email into folders, and auto-classification of URLs into favorites. These techniques have shipped in various products, including email clients, Windows Live Hotmail, and SharePoint Portal Server.
Selected Publications:
Ming-Wei Chang, Wen-tau Yih, Christopher Meek, Partitioned Logistic Regression for Spam Filtering, in Proceedings of The 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008
Ming-Wei Chang, Wen-tau Yih, Robert McCann, Personalized Spam Filtering for Gray Mail, in Proceedings of The 5th Conference on Email and Anti-Spam, 2008
W. Yih, R. McCann, A. Kolcz, Improving Spam Filtering by Detecting Gray Mail, in Proceedings of the 4th Conference on Email and Anti-Spam, 2007
Daniel Lowd, Christopher Meek, Good Word Attacks on Statistical Spam Filters, in Conference on Email and Anti-Spam (CEAS), 2005
Data mining and analysis: We have developed methods for mining and visualizing patterns from large data sets including (1) algorithms for prediction and segmentation based on Bayesian statistics, (2) methods that semi-automate the process of data analysis/mining, (3) algorithms that dramatically decrease the time to mine large data sets, and (4) new tools for data visualization and exploratory data analysis. These techniques have shipped in SQL Server and Commerce Server as well as the WinMine Toolkit.
Selected Publications:
I. Cadez et al., Model-Based Clustering and Visualization of Navigation Patterns on a Web Site, Data Mining and Knowledge Discovery, 2003
Michail Vlachos, Philip S. Yu, Vittorio Castelli, Christopher Meek, Structural Periodic Measures for Time-Series Data, Data Mining and Knowledge Discovery, vol. 12, no. 1, pp. 1-28, 2006
D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research. 1:49-75, 2000.
Computational biology: We are developing and applying machine learning technology to biological data. Recently we have worked on approximate inference techniques that are applicable to phylogenetic trees, linkage analysis, and other related problems.
Selected Publications:
Dan Geiger, Christopher Meek, Ydo Wexler, Speeding up HMM algorithms for genetic linkage analysis via chain reductions of the state space, in Bioinformatics 2009 25: i196-i203
Ydo Wexler, Christopher Meek, MAS: a multiplicative approximation scheme for probabilistic inference, in NIPS, 2008
Dan Geiger, Christopher Meek, Ydo Wexler, A Variational Inference Procedure Allowing Internal Structure for Overlapping Clusters and Deterministic Constraints, Journal of Artificial Intelligence Research, vol. 27, pp. 1-23, 2006
Jojic et al., Using epitomes to model genetic diversity: Rational design of HIV vaccines, NIPS 2005.
Jojic et al., Efficient approximations for learning phylogenetic HMM models from data, Bioinformatics 2004.
- Jim C. Huang, Christopher Meek, Carl Kadie, and David Heckerman, Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies, in PLoS ONE, PLoS, 12 July 2011
- Puyang Xu, Asela Gunawardana, and Sanjeev Khudanpur, Efficient Subsampling for Training Complex Language Models, in Empirical Methods in Natural Language Processing, Association for Computational Linguistics, July 2011
- Guy Shani, Asela Gunawardana, and Christopher Meek, Unsupervised hierarchical probabilistic segmentation of discrete events, in Intelligent Data Analysis, IOS Press, 27 June 2011
- Guy Shani and Asela Gunawardana, Evaluating Recommendation Systems, in Recommender Systems Handbook, Springer, 2011
- Yun-Cheng Ju and Tim Paek, Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study, Association for Computational Linguistics, 11 July 2010
- Tim Paek, Michael Gamon, Scott Counts, David Maxwell Chickering, and Aman Dhesi, Predicting the Importance of Newsfeed Posts and Social Network Friends, American Association for Artificial Intelligence, July 2010
- Yun-Cheng Ju and Tim Paek, How to Safely Respond to SMS Messages in Automobiles, in 2nd Multimodal Interfaces for Automobile Applications (MIAA), Association for Computing Machinery, Inc., 7 February 2010
- Asela Gunawardana, Tim Paek, and Christopher Meek, Usability Guided Key-Target Resizing for Soft Keyboards, in International Conference on Intelligent User Interfaces, Association for Computing Machinery, Inc., February 2010
- Guy Shani, Christopher Meek, and Asela Gunawardana, Hierarchical Probabilistic Segmentation of Discrete Events, in IEEE International Conference on Data Mining, IEEE, 6 December 2009
- Asela Gunawardana and Guy Shani, A Survey of Accuracy Evaluation Metrics of Recommendation Tasks, in Journal of Machine Learning Research, vol. 10, pp. 2935-2962, December 2009
- Asela Gunawardana and Christopher Meek, A Unified Approach to Building Hybrid Recommender Systems, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., October 2009
- Yun-Cheng Ju and Tim Paek, A Voice Search Approach to Replying to SMS Messages in Automobiles, International Speech Communication Association, September 2009
- Dan Geiger, Christopher Meek, and Ydo Wexler, Speeding up HMM algorithms for genetic linkage analysis, in Bioinformatics, Oxford University Press, 2009
- Asela Gunawardana and Christopher Meek, Tied Boltzmann Machines for Cold Start Recommendations, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., October 2008
- Asela Gunawardana, Christopher Meek, and Jody Biggs, A Quality-Based Auction for Search Ad Markets with Aggregators, in ACM EC Workshop on Ad Auctions, Association for Computing Machinery, Inc., June 2008
- Asela Gunawardana and Christopher Meek, Aggregators and Contextual Effects in Search Ad Markets, in WWW Workshop on Targeting and Ranking for Online Advertising, Association for Computing Machinery, Inc., April 2008
- Ming-Wei Chang, Wen-tau Yih, and Christopher Meek, Partitioned Logistic Regression for Spam Filtering, in KDD, 2008
- Guy Shani, David Maxwell Chickering, and Christopher Meek, Mining Recommendations from the Web, in ACM International Conference on Recommender Systems, 2008
- Tim Paek and Yun-Cheng Ju, Accommodating Explicit User Expressions of Uncertainty in Voice Search or Something Like That, International Speech Communication Association, 2008
- Ece Kamar, Eric Horvitz, and Chris Meek, Mobile Opportunistic Commerce: Mechanisms, Architecture, and Application, in AAMAS, 2008
- Ydo Wexler and Christopher Meek, Inference for Multiplicative Models, in Proceedings of Uncertainty in Artificial Intelligence, 2008
- Ydo Wexler and Christopher Meek, MAS: a multiplicative approximation scheme for probabilistic inference, in NIPS, 2008
- David Heckerman, Christopher Meek, and Daphne Koller, Probabilistic Entity-Relationship Models, PRMs and Plate Models, in Introduction to Statistical Relational Learning, pp. 201-239, MIT Press, 2007
- Ajit P. Singh, Asela Gunawardana, Chris Meek, and Arun C. Surendran, Recommendations Using Absorbing Random Walks, in North East Student Colloquium on Artificial Intelligence, 2007
- Donald Metzler, Susan T. Dumais, and Christopher Meek, Similarity Measures for Short Segments of Text, in European Conference on Information Retrieval, 2007
