The Machine Learning and Intelligence Group at Microsoft Research, Redmond
I'm a Principal Researcher in, and manager of, the Machine Learning and Intelligence (MLI) group, part of the Machine Learning Group at Microsoft Research. Our work covers a broad spectrum of research activities, from fundamental, theoretical machine learning to a wide variety of applied research projects, such as social-network-based question answering and URL selection for Bing's index. Our machine learning technologies are used throughout Microsoft's Online Services division (in particular Bing, AdCenter, and MSN): for example, our ranking technology is used for core Web search, for Web search verticals like image and commerce search, and for ads relevance. This is just one example; I invite you to browse the team's web pages to find out more.
My Own Research
I'm interested in machine learning, optimization methods, the intersection of machine learning with natural language processing, semantic modeling, and information retrieval. I'm currently particularly interested in machine reading and semantic modeling, with models that leverage large amounts of unlabeled text data. Geoff Zweig and I recently constructed a dataset that we hope will prove useful for testing text-based semantic modeling methods - you can find the details here. Here are a few other things I've been working on recently - for more information please visit my publications page.
Ranking for Information Retrieval
Information retrieval measures such as Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP), and Mean Reciprocal Rank (MRR) are difficult to optimize directly, since, viewed as functions of the model parameters, they are either flat or discontinuous everywhere. LambdaRank was an attempt to solve this, and in fact, in this [paper, tech report], we verified that LambdaRank indeed directly optimizes NDCG, and we further demonstrated that LambdaRank can easily be adapted to directly optimize MAP and MRR. That work used neural nets as the underlying model. In this paper, we showed that boosted tree classifiers can make excellent rankers; this, together with the fact that on large, artificial data sets boosted trees can do arbitrarily well whereas neural nets cannot, suggested trying the LambdaRank idea (which applies to any model for which a gradient of the cost can be defined, not only neural nets) with boosted trees. The resulting algorithm is called LambdaMART [paper, tech report]. Our first ranking algorithm for Search, RankNet, is still an excellent method for training from pairwise preferences (for example, from clicks), and can also easily be adapted to work with boosted trees. For an overview of these algorithms, see this [tech report].
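The core LambdaRank idea can be sketched in a few lines: every pair of documents with different relevance labels contributes a RankNet-style pairwise gradient, scaled by the |ΔNDCG| obtained by swapping the pair in the current ranking. Here is a minimal sketch for one query (my own illustration, not the production implementation; the function names, the exponential gain 2^label - 1, and the sigma parameter are assumptions):

```python
import math
import numpy as np

def dcg(gains):
    """Discounted cumulative gain for gains listed in rank order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def lambda_gradients(scores, labels, sigma=1.0):
    """LambdaRank-style gradients for the documents of a single query.

    For each pair (i, j) with labels[i] > labels[j], the RankNet pairwise
    gradient is scaled by |delta NDCG|, the change in NDCG that would
    result from swapping documents i and j in the current ranking.
    A positive lambda means "push this document's score up."
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(-scores)              # current ranking by score
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)               # rank[i] = position of doc i
    gains = [2 ** l - 1 for l in labels]
    ideal = dcg(sorted(gains, reverse=True)) or 1.0   # normalizer for NDCG
    lambdas = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue
            # |delta NDCG| from swapping positions of docs i and j
            delta = abs((gains[i] - gains[j]) *
                        (1 / math.log2(rank[i] + 2) -
                         1 / math.log2(rank[j] + 2))) / ideal
            # RankNet pairwise term: large when the pair is mis-ordered
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * rho * delta
            lambdas[i] += lam                # more relevant doc pushed up
            lambdas[j] -= lam                # less relevant doc pushed down
    return lambdas
```

In LambdaMART the same lambdas (and their derivatives) serve as the gradients and Hessians that each boosted regression tree fits, which is why the idea carries over directly from neural nets to trees.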
We recently won the Yahoo! Learning to Rank Challenge (Track 1). We used an ensemble of 12 models, 8 of which were LambdaMART (plus 2 LambdaRank neural nets and 2 MART regression models). We were the only team we know of that directly optimized the measure used in the competition (Expected Reciprocal Rank), demonstrating the flexibility of the LambdaMART approach. In Track 1, which was the standard Web search ranking problem, 312 teams submitted at least two models (and over a thousand submitted at least one). See http://learningtorankchallenge.yahoo.com/ for more details (our team name was Ca3Si2O7, the chemical formula for Rankinite).
Review Articles and Talks
While MSR does not have research-specific courses per se, there are still many opportunities to teach and learn collaboratively. In MSR Redmond's Machine Learning Group (in which MLI is a subgroup) we have a Learning Theory Book Club, a Machine Learning Seminar, bi-weekly Brainstorming Tea Times, and bi-weekly group seminars. (By attending these, together with MSR-Redmond's very rich seminar series, it's quite possible to get no real work done at all.)
Here's a tutorial review article on Dimension Reduction. It covers many well-known, and some less well-known, methods for dimension reduction for which the inferred variables are continuous. Here's a lecture I gave recently at the University of Washington - part 1 of 2 - on the mathematical foundations for machine learning. It's an updated version of lectures I gave at the machine learning summer school at the Max Planck Institute in Tuebingen in 2003; here's a condensed version of those lectures. Finally, here's an older review article on support vector machines.
Audio Fingerprinting
You have an incoming stream of audio and you'd like to know what's playing. Our RARE (Robust Audio Recognition Engine) system can identify any one of about a quarter million songs in real time, using about 10% of the CPU on an 833 MHz PC. On 36 hours of noisy test audio, it achieves 0.2% false positives at a 4×10⁻⁶ false negative rate. Confirmation fingerprints can be used to further improve these error rates significantly, with almost no extra CPU cost. Our work is currently used in Windows Media Player and in the Zune media player. Audio fingerprinting has many applications: for example, automatically constructing audio thumbnails, and automatically finding duplicate audio clips on your PC. Our main innovations are a method to train for robustness to distortions, and a lookup method that is over an order of magnitude faster than competing methods for this problem. See here for details. Joint work with J. Platt, J. Goldstein, E. Renshaw, C. Herley.
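To give a feel for the lookup side of the problem: per-frame fingerprints can serve as keys into a hash table mapping each fingerprint to (track, offset) candidates, so identification is a constant-time lookup per frame rather than a scan over the database. The sketch below is a generic quantize-and-hash illustration, not the RARE algorithm (whose fingerprints are learned for robustness to distortion); all names and the quantization scheme are assumptions:

```python
import hashlib
from collections import defaultdict

def fingerprint(feature_vector, bits_per_dim=2):
    """Coarsely quantize a per-frame feature vector and hash it to a key.

    Coarse quantization gives some tolerance to small distortions:
    nearby feature vectors can land in the same bucket.
    """
    scale = 1 << bits_per_dim
    quantized = tuple(int(round(x * scale)) for x in feature_vector)
    return hashlib.md5(repr(quantized).encode()).hexdigest()

class FingerprintIndex:
    """Toy inverted index: fingerprint -> list of (track_id, frame offset)."""

    def __init__(self):
        self.table = defaultdict(list)

    def add_track(self, track_id, frames):
        # Index every frame of the track so a query can match mid-song.
        for offset, frame in enumerate(frames):
            self.table[fingerprint(frame)].append((track_id, offset))

    def query(self, frame):
        """Return candidate (track_id, offset) matches for one audio frame."""
        return self.table.get(fingerprint(frame), [])
```

In a real system, candidates from many consecutive frames would then be checked for a consistent track and time offset (the role played by confirmation fingerprints above) before declaring a match.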
Other Interests
The hiking in western Washington is great! Here are some views from the Central Cascades: Rachel Lake, Mailbox Peak, Rampart Lakes and Rainbow Lake. I also like to play around with composing and playing music, to run in the woods with the dogs (usually one dog at a time), and to very occasionally shoot inanimate flying objects with a shotgun.