Chris Burges
I'm a Principal Researcher and manager of the Web Learning Group, a group in the Text Mining, Search and Navigation Group at Microsoft Research. I'm interested in machine learning, optimization methods, ranking, and learning for Web applications. Here are brief summaries of some recent work: for details, see my list of publications.
Ranking for Information Retrieval
RankNet is a simple and effective way to learn how to rank, using pair-based neural network training. More recently I've been working on a new way to address a fundamental problem, namely: the cost (or utility) functions that the information retrieval community uses require that the documents returned for a given query be sorted by score first, and such an operation is highly non-differentiable (in fact such cost functions are flat, or discontinuous, everywhere). LambdaRank is a simple and effective method that directly addresses this. The method works well, has an interesting mechanical interpretation, and can also be used to give a significant speedup for RankNet training (the RankNet cost is a special case). An alternative method to learning a weighted sort is given in our Telescoping paper.
Review and Tutorial Articles
Here's the tech report version of a review on dimensional reduction and feature extraction, which appeared in 'Data Mining and Knowledge Discovery: A Complete Guide for Researchers and Practitioners', Eds. O. Maimon and L. Rokach, Kluwer Academic Publishers. The idea was to give a self-contained, tutorial overview of basic methods, such as Principal Component Analysis, Multidimensional Scaling, and the Nyström method, and also of more recent methods which build on them, such as kernel PCA, probabilistic PCA, , Locally Linear Embedding, Laplacian eigenmaps, Landmark MDS, Isomap, and spectral clustering. .
Lagrange multipliers provide a nice way to tie together disparate topics from machine learning. This tech report contains such material that is often encountered but rarely explained, and it's a subset of some lectures I gave at the Machine Learning Summer School in 2003.
Some Mathematical Tools for Machine Learning": Lectures given at the 2003 Summer School on Machine Learning at Max Planck Institute, Tübingen
These are four lectures on some fundamental mathematics underlying many approaches and algorithms in machine learning. They are not about particular learning algorithms; they are about the basic concepts and tools upon which such algorithms are built. Often students feel intimidated by such material: there is a vast amount of "classical mathematics", and it can be hard to find the wood for the trees. The main topics of these lectures are Lagrange multipliers, functional analysis, some notes on matrix analysis, and convex optimization. I've concentrated on things that are often not dwelt on in typical CS coursework. Lots of examples are given; if it's green, it's a puzzle for the student to think about. These lectures are far from complete: perhaps the most significant omissions are probability theory, statistics for learning, information theory, and graph theory. I hope eventually to turn all this into a series of short tutorials. Please let me know of any errors, etc.
Audio Fingerprinting
You have an incoming stream of audio and you'd like to know what's playing. Our RARE (Robust Audio Recognition Engine) system can identify any one of about a quarter million songs in real time using about 10% CPU on an 833 MHz PC. On 36 hours of noisy test audio, it achieves 0.2% false positives at 4.10-6 false negative rate. Confirmation fingerprints can be used to significantly further improve these error rates, with almost no extra CPU cost. Audio fingerprinting has lots of applications: for example, to automatically construct audio thumbnails, and to automatically find duplicate audio clips on your PC. Our main innovations are a method to train for robustness to distortions, and a very fast lookup method. See here for details. Joint work with J. Platt, J. Goldstein, E. Renshaw, C. Herley.
Uniqueness Theorems for Kernel Methods
When does your favorite kernel method have a unique solution, and when does it not? Here we give some general uniqueness theorems for kernel methods, and we examine SVM and v-SVM classification and regression algorithms in particular. Examining conditions for non-uniqueness can give insights into the algorithms themselves; for example, ν-SVM classifiers can be unstable with respect to small changes in ν, and this happens at the transition between unique and non-unique solutions. Joint work with D. Crisp
Image Compression with Neural Networks
Can wavelet image compression be improved by using a neural network to predict wavelet coefficients? We've shown that by reducing the variance of the residual coefficients, the length of the compressed bitstream can indeed be reduced. A two layer fully connected network, trained offline, applied to a test set consisting of seven 512 by 768 images from the Kodak database, gives a consequent overall improvement in the bit rate of between 4% and 7%. Joint work with P. Simard, H. Malvar.
Factoring as Optimization
Factoring (large) biprimes is known to be a very hard problem. However this property can be inverted to give an interesting test-bed for new optimization algorithms, where the difficulty of the problem is easily controlled, and where a large number of problems exists for which the solution is known. This test bed led to a new optimization algorithm, Curvature Inversion Descent (CID). On a small test problem (generated by the problem of factoring 91), random hill descent finds the global minimum 31% of the time; with CID, this increases to 88%.
Constructing Automatic Playlists
How can you use machine learning methods for training on very small training sets (i.e. on one, or just a few, points), but when a significant amount of prior information is available? Gaussian Process regression provides a suitable framework, provided one can use the prior information to choose a suitable kernel. We've applied ideas along these lines to generate music playlists automatically. See here for details. Joint work with J. Platt, S. Swenson, C. Weare, A. Zheng
Personal Interests
The hiking in western Washington is great! Here are some recent views from the Central Cascades: Rachel Lake, Mailbox Peak, Rampart Lakes and Rainbow Lake.
Last Updated May 10, 2007