Machine Learning and Applied Statistics
The Machine Learning and Applied Statistics (MLAS) group is focused on learning from data and data mining. By building software that automatically learns from data, we enable applications that (1) perform intelligent tasks such as handwriting recognition and natural-language processing, and (2) help human data analysts more easily explore and better understand their data. We strive to advance the state of the art in machine learning and statistics, develop fast, scalable algorithms for learning and mining, implement portions of our work in toolkits, and apply our work to numerous product applications.
Projects
Computational biology: We are developing and applying machine-learning technology to biological data. Projects include:
- Rational design of HIV vaccine: This work includes learning graphical models that summarize interactions between the human immune system and viral mutation.
- Phylogenetic trees: We have developed approximation methods for learning phylogenetic trees in which adjacent mutations are not assumed to be independent.
Data mining and analysis: We have developed methods for mining and visualizing patterns from large data sets including (1) algorithms for prediction and segmentation based on Bayesian statistics, (2) methods that semi-automate the process of data analysis/mining, (3) algorithms that dramatically decrease the time to mine large data sets, and (4) new tools for data visualization and exploratory data analysis. These techniques have shipped in SQL Server 2000 and Commerce Server 2000 as well as the WinMine Toolkit.
Junk-mail filtering: An adaptive filter has shipped in MSN8.
Targeted advertising: We have developed tools that use information about the user (demographics, browsing history, etc.) to maximize response rate (e.g., click-through) to advertising, while still working within the constraints of how advertising is sold (i.e., quotas).
Collaborative filtering/cross sell: We have developed tools for predicting a user's preferences (e.g., the products, movies, TV shows, or books the user likes) given information about the user such as other preferences and demographics. These techniques have shipped in Commerce Server 2000.
Text Classification and Text Clustering for Knowledge Management: We have created technology that helps site administrators build and maintain category hierarchies for documents. The text-classification component of the system automatically assigns or suggests category labels to new (unlabeled) documents, based on the word content of the documents. The text-clustering component suggests a hierarchically organized set of categories when no such structure exists. Applications of text classification include junk-mail detection, auto-classification of email into folders, and auto-classification of URLs into favorites. These techniques have shipped in SharePoint Portal Server.
Natural language processing: We are working on machine-learning methods for processing natural language, including methods for question answering and grammar checking.
Speech recognition: We are working on methods for removing noise and reverberation from speech signals with the aim of improving speech recognition.
Handwriting recognition: We are working to improve both online and offline recognition.
Many of our projects and contributions to products are collaborative efforts with the ASI, DMX, DPU, NLP, and SP groups.
People
Primary Contact: Chris Meek
- Alexei
- Max
- Asela
- Nebojsa
- Darko
- Aleksander
- Jesper
- Chris
- Tim
- Guy
- Bo
- Scott

- Dmitriy
- Christopher
- Eric
- Chris J.C.
- Jonathan
- Silviu-Petru
- David
- Tomer
- Eric
- Geoff
- Nebojsa
- Carl
- Jennifer
- Mukund
- John
- Steve
- Carsten
- Robert
- Lorenzo
- Rong
- Dengyong
- Zou, Xinli
Publications
- Asela Gunawardana, Christopher Meek. Aggregators and Contextual Effects in Search Ad Markets. April 2008.
- Kenneth Church, Bo Thiesson. The Wild Thing Goes Local. July 2007.
- Kenneth Church, Bo Thiesson, Robert Ragno. K-Best Suffix Arrays. April 2007.
- Bo Thiesson, Jesper Lind. Mining cross-predicting stochastic ARMA time series in SQL Server 2005. May 2006.
- Nebojsa Jojic, John Winn, Larry Zitnick. Escaping Local Minima through Hierarchical Model Selection: Automatic Object Discovery, Segmentation, and Tracking in Video. Proceedings of IEEE CVPR, 2006.
- Kenneth Church, Bo Thiesson. The Wild Thing! May 2005.
- Bo Thiesson, Christopher Meek. Efficient gradient computation for conditional Gaussian models. January 2005.
- John Winn, Nebojsa Jojic. LOCUS: Learning Object Classes with Unsupervised Segmentation. Proc. IEEE Intl. Conf. on Computer Vision (ICCV), 2005.
- Bo Thiesson, Christopher Meek. Discriminative Model Selection for Density Models. January 2003.
- Christopher Meek, Bo Thiesson, David Heckerman. Staged mixture modeling and boosting. August 2002.
Publications
- D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1:49-75, 2000.
- D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed. MIT Press, Cambridge, MA, 1999.
- D. Heckerman, C. Meek, and G. Cooper. A Bayesian approach to causal discovery. In C. Glymour and G. Cooper, editors, Computation, Causation, and Discovery, pages 141-165. MIT Press, Cambridge, MA, 1999.
- B. Thiesson, C. Meek, D. M. Chickering, and D. Heckerman. Computationally Efficient Methods for Selecting Among Mixtures of Graphical Models. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 631-656. Oxford University Press, Oxford, 1999.
- D. Chickering and D. Heckerman. Fast learning from sparse data. In Proceedings of Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden. Morgan Kaufmann, August 1999.
- J. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI. Morgan Kaufmann, July 1998.
- S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of ACM-CIKM98, November 1998.
- D. Heckerman and E. Horvitz. Inferring informational goals from free-text queries: A Bayesian approach. In Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI. Morgan Kaufmann, July 1998.
- D. Chickering, D. Heckerman, and C. Meek. A Bayesian approach to learning Bayesian networks with local structure. In Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI, pages 80-89. Morgan Kaufmann, August 1997.
- D. Chickering and D. Heckerman. Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Machine Learning, 29:181-212, 1997.
- D. Geiger and D. Heckerman. A characterization of the Dirichlet distribution through global and local parameter independence. The Annals of Statistics, 25:1344-1369, 1997.
- P. Smyth, D. Heckerman, and M. Jordan. Probabilistic Independence Networks for Hidden Markov Probability Models. Neural Computation, 9:227-269, 1997.
- J. Breese and D. Heckerman. Decision-theoretic troubleshooting: A framework for repair and experiment. In J. Shanteau, B. Mellers, and D. Schum, editors, Decision Science and Technology: Reflections on the Contributions of Ward Edwards, pages 271-287. Kluwer Academic Publishers, Boston, MA, 1999.
- D. Geiger and D. Heckerman. Beyond Bayesian networks: Similarity networks and Bayesian multinets. Artificial Intelligence, 82:45-74, 1996.
- D. Heckerman and R. Shachter. Decision-Theoretic Foundations for Causal Reasoning. Journal of Artificial Intelligence Research, 3:405-430, 1995.
- D. Heckerman, D. Geiger, and D. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243, 1995.
- D. Heckerman, A. Mamdani, and M. Wellman, eds. Special Issue on Bayesian Networks. Communications of the ACM, 38, 1995.
MLAS Online and in the News
- Microsoft Research Tech Transfers: Better Decisions Faster (November 2005)
- Microsoft Researchers Use Machine Learning Techniques to Help Advance HIV Vaccine Research (February 2005)
- Smartscreen Technology on microsoft.com (July 2004)
- "10 Emerging Technologies That Will Change Your World: Bayesian Machine Learning" (Technology Review, January 2004)
- "Medical Diagnostic and Treatment Software Holds Potential to Save Lives and Improve Patient Care Worldwide" (Microsoft Press Pass, January 2004)
- "Microsoft Adds New Spam Filtering Technology Across E-Mail Platforms" (Microsoft Press Pass, November 2003)
- "Consumer Input, Scientific Analysis Provide Foundation for MSN 8 Research and Innovation" (Microsoft Press Pass, October 2002)
- "Microsoft Research Contributions Keep Microsoft Products on the Cutting Edge" (Microsoft Press Pass, September 2001)
- "Microsoft Research: Who benefits" (InformationWeek, March 12, 2001, Stuart Johnston)
- "Microsoft Announces Branding and RC1 Availability of 'Tahoe' Server" (Microsoft PressPass, January 2001)
- "Microsoft Research Innovations Reduce Challenges of Analyzing and Managing Massive Databases" (Microsoft PressPass, October 2000)
- "Artificial Intelligence Gets Real" (ZDnet, August 1997)
Last updated: November 2005