Machine Learning and Applied Statistics

Making software that automatically learns from data

The Machine Learning and Applied Statistics (MLAS) group is now part of the Machine Learning Department. Visit that page for the latest information; this page will no longer be updated. Historically, the group has focused on building methods and tools to learn from data. By building software that automatically learns from data, we enable applications to behave more intelligently and enable users to become more productive.

We strive to advance the state of the art in machine learning and statistics, develop fast, scalable algorithms for learning and mining, implement portions of our work in toolkits, and apply our work to numerous applications.

Current Application Areas

Online advertising/eCommerce: We have applied various learning algorithms and auction design principles to problems in the online advertising and eCommerce arenas, including matching algorithms, keyword extraction, keyword similarity, advertisement relevance, advertiser quality, and click-through prediction.

Selected Publications:

Ece Kamar, Eric Horvitz, Chris Meek, Mobile Opportunistic Commerce: Mechanisms, Architecture, and Application, in AAMAS, 2008

Wen-tau Yih, Christopher Meek, Consistent Phrase Relevance Measures, in Proceedings of The 2nd Annual International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD-08 Workshop), 2008

W. Yih, J. Goodman, V. Carvalho, Finding Advertising Keywords on Web Pages, in Proceedings of the 15th World Wide Web Conference, 2006

Asela Gunawardana, Christopher Meek, Jody Biggs, A Quality-Based Auction for Search Ad Markets with Aggregators, in ACM EC Workshop on Ad Auctions, Association for Computing Machinery, Inc., Jun. 2008

Asela Gunawardana, Christopher Meek, Aggregators and Contextual Effects in Search Ad Markets, in WWW Workshop on Targeting and Ranking for Online Advertising, Association for Computing Machinery, Inc., Apr. 2008

Hila Becker, Christopher Meek, David Maxwell Chickering, Modeling Contextual Factors of Click Rates, in AAAI, pp. 1310-1315, 2007

 

Recommendation Systems/Collaborative filtering: We have developed tools for predicting a user's preferences (e.g., products, movies, TV shows, or books the user likes) given information about the user, such as other preferences and demographics. These techniques have shipped in Commerce Server, SQL Server, and adCenter, and have been used by several online services.

Selected Publications:

Asela Gunawardana, Christopher Meek, Tied Boltzmann Machines for Cold Start Recommendations, in ACM International Conference on Recommender Systems, Association for Computing Machinery, Inc., Oct. 2008

Guy Shani, David Maxwell Chickering, Christopher Meek, Mining Recommendations from the Web, in ACM International Conference on Recommender Systems, 2008

D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research. 1:49-75, 2000.

 

Input and Interaction: We continue to work on methods for improving users' ability to control and use computer systems through voice, handwriting, typing, dialog, etc.

Selected Publications:

Tim Paek, Yun-Cheng Ju, Accommodating Explicit User Expressions of Uncertainty in Voice Search or Something Like That, International Speech Communication Association, 2008

T. Paek & R. Pieraccini. Automating spoken dialogue management design using machine learning: An industry perspective. Speech Communication, Special Issue on Evaluating New Methods and Models for Advanced Speech-Based Interactive Systems, 2008, 50(8-9): 716-729.

T. Paek & D. Chickering. Improving command and control speech recognition on mobile devices: Using predictive user models for language modeling. User Modeling and User-Adapted Interaction, Special Issue on Statistical and Probabilistic Methods for User Modeling, 2007, 17(1-2): 93-117.

Natural language processing: We are working on machine-learning methods for processing natural language, including methods for text classification, text clustering, question answering, and grammar checking. Applications of text classification include junk-mail detection, auto-classification of email into folders, and auto-classification of URLs into favorites. These techniques have shipped in various products, including email clients, Windows Live Hotmail, and SharePoint Portal Server.

Selected Publications:

Ming-Wei Chang, Wen-tau Yih, Christopher Meek, Partitioned Logistic Regression for Spam Filtering, in Proceedings of The 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008

Ming-Wei Chang, Wen-tau Yih, Robert McCann, Personalized Spam Filtering for Gray Mail, in Proceedings of The 5th Conference on Email and Anti-Spam, 2008

W. Yih, R. McCann, A. Kolcz, Improving Spam Filtering by Detecting Gray Mail, in Proceedings of the 4th Conference on Email and Anti-Spam, 2007

Daniel Lowd, Christopher Meek, Good Word Attacks on Statistical Spam Filters, Conference on Email and Anti-Spam (CEAS), 2005

 

Data mining and analysis: We have developed methods for mining and visualizing patterns from large data sets including (1) algorithms for prediction and segmentation based on Bayesian statistics, (2) methods that semi-automate the process of data analysis/mining, (3) algorithms that dramatically decrease the time to mine large data sets, and (4) new tools for data visualization and exploratory data analysis. These techniques have shipped in SQL Server and Commerce Server as well as the WinMine Toolkit.

Selected Publications:

I. Cadez et al., Model-Based Clustering and Visualization of Navigation Patterns on a Web Site, Data Mining and Knowledge Discovery 2003

Michail Vlachos, Philip S. Yu, Vittorio Castelli, Christopher Meek, Structural Periodic Measures for Time-Series Data, Data Mining and Knowledge Discovery, vol. 12, no. 1, pp. 1-28, 2006

D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research. 1:49-75, 2000.

 

Computational biology: We are developing and applying machine learning technology to biological data. Recently we have worked on approximate inference techniques that are applicable to phylogenetic trees, linkage analysis, and other related problems.

Selected Publications:

Dan Geiger, Christopher Meek, Ydo Wexler, Speeding up HMM algorithms for genetic linkage analysis via chain reductions of the state space, in Bioinformatics 2009 25: i196-i203

Ydo Wexler, Christopher Meek, MAS: a multiplicative approximation scheme for probabilistic inference, in NIPS, 2008

Dan Geiger, Christopher Meek, Ydo Wexler, A Variational Inference Procedure Allowing Internal Structure for Overlapping Clusters and Deterministic Constraints, Journal of Artificial Intelligence Research, vol. 27, pp. 1-23, 2006

Jojic et al., Using epitomes to model genetic diversity: Rational design of HIV vaccines, NIPS 2005.

Jojic et al., Efficient approximations for learning phylogenetic HMM models from data, Bioinformatics 2004.

 

 

Publications