By Rob Knies
June 21, 2010 9:00 AM PT
One challenge many computer-science researchers face is the need to balance the theoretical with the practical. When do they choose time to pursue the scientific ramifications of their work, publish their findings, and seek validation from their peers? And when do they turn their focus to the real-life application of their endeavors and push their discoveries to customers, partners, and customers?
Ravindran Kannan, as much as anyone, is acutely aware of the necessity to juggle the two.
Kannan, principal researcher at Microsoft Research India and head of that facility’s Algorithms Research Group, has been wrestling with that challenge for years. After earning his Ph.D. in operations research at Cornell University, he became interested in computer science, first as a post-doctoral researcher at the University of California, Berkeley, then in the mathematics department at the Massachusetts Institute of Technology.
Following that, he spent 15 years on the computer-science faculty at Carnegie Mellon University, then another 10 at Yale University. He’s had plenty of time to consider the balancing act.
His conclusion? It’s all about 21st-century algorithms. That, he says, means addressing the new demands in today’s computing environment featuring massive data sets.
“There has been a paradigm shift in the field of algorithms,” Kannan says, “because of various things. One is the pressure of growth of data. The amount of data we collect now is much larger, and much more challenging to process, than was the case 20 years ago. That’s one big driving force.
“The algorithms field as a whole used to say, ‘Take all the data, process it, and get me an answer.’ But we can’t quite take all the data and process it for any problem now. Sampling is a big tool that we use.”
On his personal web page, Kannan lists an intellectually impressive collection of research interests: theoretical computer science and optimization, massive data sets and sampling, clustering, Markov chains, and linear algebra algorithms and applications. His book Spectral Algorithms, co-written with Santosh Vempala of the Georgia Institute of Technology and published in September 2009, examines the use of sampling algorithms used in engineering, applied mathematics, and statistics.
While his field may be complex, his reason for moving to Microsoft Research India in 2007 was simple: His homeland beckoned.
“There were several reasons for moving to Microsoft,” says Kannan, a native of the southeastern India state of Tamil Nadu. “I used to visit Redmond quite frequently because I had a very good leading colleague, Láci Lovász [recently named winner of the 2010 Kyoto Prize in Basic Sciences], who was on the staff there. I was familiar with the atmosphere, which is very friendly.
“I’d been in academia all my life, but I also wanted to try to establish a good research atmosphere in India. There are some places which do have a very good research atmosphere, but it’s very small in India. It would be nice to have more research carried out everywhere—in India, in particular, because there’s a lot of talent. Microsoft Research seemed very conducive.”
Having made the transition, he found himself with an initial task: what to do?
“Neither P. Anandan [managing director of Microsoft Research India] nor I had the idea of setting up a group itself,” recalls Kannan, who also serves as adjunct professor in the Department of Computer Science and Automation at the Indian Institute of Science. “But then Anandan listened to a particular lecture by John Hopcroft [an American theoretical computer scientist who won the A.M. Turing Award in 1986] and started thinking of this area as something that Microsoft should have a solid presence in. He asked me if I would head a group, and I said yes, so we started one six months after I got here.
“Our first part of our business, of course, was to build our group. I was the only one. Now we have five of us. We’ve got some of the best talent; they’re all Ph.D.s or researchers from the U.S., as it turns out. That’s not a requirement, but they’re from the best places in the U.S.”
The group’s goal is to place itself at the forefront of conceptual breakthroughs in areas such as streaming algorithms, directed data sampling, machine learning, approximation algorithms, and integration of numerical and combinatorial methods, all expected to play an important role in computer science in coming years. But that doesn’t mean such objectives are etched in granite.
“Our goals have to be flexible,” Kannan emphasizes. “The idea is to carry out very good research, but we wanted to define a focus, and that was this definition of 21st century algorithms. Given the focus, our mission has always been to get the best people in these areas, somewhat broadly defined. You don’t want to impose your particular definition. We have opportunities, as far as the market goes, to get the best talent, so that’s what we did, and I think we’ve been fairly successful. We have a number of publications and advances made by the group here.”
Of course, there are many computer scientists across Microsoft Research and beyond who develop and find algorithms, as Kannan and his team readily acknowledge.
“We have excellent people throughout Microsoft Research dealing with modern algorithmic challenges,” he says. “Microsoft Research Silicon Valley has a number of researchers in algorithms for privacy, search, clustering, and other areas. Microsoft Research Redmond has a range of research in machine learning, ad matching, probabilistic algorithms, and many other areas, and Microsoft Research New England is pursuing algorithmic research in economics and social networks, as well as in combinatorial algorithms.”
The small but accomplished team Kannan has assembled includes:
“His main area of focus has been theory of network routing,” Kannan says. “He’s worried about problems that arise in network routing. One of the problems that arises is in [virtual private networks], where you want to route messages from multiple people to each other. You might have a very complicated network, but if you use complicated routes, you will be in trouble, because then you cannot replicate, and if there is a fault or an error, you have to trace what happened. He proved a nice result settling a conjecture in theory for some time on what kind of simple networks suffice for you to send messages.
“Since coming to our group, he has been working on more networking problems, but he’s also worked on the k-server problem, a distributed system where you have k number of processes trying to service the requests of many people. The question arises of which processor you must use to service the next request. What’s the most economical way of doing it?”
“His main area is what’s called approximation algorithms,” Kannan says. “These are algorithms for problems that are known to be difficult to solve exactly. People look for means of getting approximately correct answers. It’s a well-developed area for problems in clustering or have to do with network routing. That’s another area that comes up with dividing up big structures like graphs—maybe a web graph—into parts that are manageable.
“In these problems, you cannot get the exact best answer. You get approximate answers, and they use many techniques. He specializes in semi-definite programming, a new optimization technique he and others have helped develop.”
“He works in numerical and scientific computing, not the traditional numerical angle, but what we traditionally called an algorithms angle, based on sampling,” Kannan says. “Sampling is something that the numerical people haven’t used too much. Their focus has been to do it deterministically, without any randomness.
“Sampling is a technique you use when the data is too large to be processed as a whole. I had worked on that 10 or 12 years ago, sort of founding this area in the mid-’90s. Amit’s Ph.D. adviser was a former advisee of mine. He worked in this area at MIT, and since coming here, he’s continuing that work.
“Prateek joined us after getting his Ph.D. from the University of Texas at Austin,” Kannan notes. “His `home areas’ are numerical analysis and machine learning. Having been an intern at Microsoft Research Redmond during his student days, he is familiar with the lab. His addition was in keeping with our philosophy that machine learning is an important part of algorithms.
“His work on defining and using matrix norms and his broad general interest in problems relevant to Microsoft made him an excellent choice. Since his arrival, he has been involved not only with other researchers in the Algorithms group, but also in starting collaborations with our Advanced Development and Prototyping group and the Vision, Graphics, and Visualization group led by Anandan, as well being intellectually open to many others.”
Kannan has staffed his team carefully with an eye for what makes an accomplished algorithmic researcher.
“It’s the choice of problems to solve,” he explains. “They should be ideally both theoretically interesting and have wide applicability. You don’t insist precisely on these two criteria—sometimes a researcher excels in one but not the other—but those are important criteria.
“The second thing is ingenuity in trying to solve the problem. Ingenuity may be in the algorithm you devise, but sometimes it comes in taking known algorithms or modifying them ever so slightly so your algorithm is extremely simple. To analyze it sometimes is the hard part, and analysis is important, because there are many, many simple algorithms out there. It can be totally unclear which one would work well.”
One validation technique is to use an empirical approach.
“You try algorithms on this, this, this, and that example and say, ‘Hey, I did so much better than the previous ones.’ That’s a valid method, but it’s not foolproof, because you might have tried it on a set of benchmarks that don’t represent exactly what might come up in the future. In the field of algorithms, we don’t accept that as foolproof.
“We want a theoretical mathematical justification that works well. The ingenuity may be in coming up with that, coming up with the algorithm, or both. Finally, publishing it, getting the peer community to think of it as useful and using it—that is the proof in the pudding.”
Such work sets Kannan’s group aside from other groups working at Microsoft Research India.
“We are a little different,” he says. “We don’t say that we are going to build such-and-such a product or system or alpha test of something in six months or a year. That’s a bit difficult. We have different aims, and we want to do certain things.
“As far as publishing versus using is concerned, we would like to do both. But we start out—and in general, this is true of all of Microsoft Research—with publication or peer acceptance as an important criterion of how good our research is. But we cannot set ourselves the goal of doing this kind of product or this kind of system in a definite period. That’s not our way.”
But while his team might not have specific, time-definite deliverables, it does have a particular set of focus areas.
“Massive data computation is one,” Kannan says. “Sampling on the fly is another, born out of the need to process streaming algorithms, algorithms where the data is so huge it will stream by you once, like video. You don’t have enough storage to put it on your PC or put it on disk and look at it again, because it’s so huge. But you could have intermediate data, which doesn’t have to stream by. You can actually look at it, but it’s very expensive. You have to sample on the fly as it’s going along.”
That takes us back to the concept of 21st-century algorithms and their application to real-world scenarios.
“Algorithms in general have assumed that, massive or not, data was just given to you. You didn’t look to see how it was collected, annotated—all those kinds of things. You didn’t look behind the data. It was given to you, you processed it, and you got an exact answer. That was your job.
“In the real world, that’s not really your job. In the real world, everything from the acquisition of data to annotating it to sampling from it and then applying an algorithm are all necessary parts of how we proceed. Our focus has to be a more holistic approach to data itself, rather than just the algorithmic part. Partly, this involves building models of data. We understand where the data is coming from, and that helps in the algorithm part, as well. It involves stochastic, or probabilistic, models of data. Sampling is a big part of how we handle it.”
In the Internet era, such techniques are critical.
“The other interesting feature of the last decade is that we have many surprises as to which part of an algorithm scales up to real-world problems,” Kannan says. “If you take web crawling, if you take search, if you take all these modern applications, recommendation systems, things that help the consumer, that suggest to the consumer new products, these sort of analyses or consumer behavior—in all of these, one big thing that scaled up to very large collections of data is the numerical algorithm.
“What went under the name of numerical analysis, or scientific computation, got a bit divorced historically from discrete algorithms, and discrete algorithms became the definition of academic algorithms groups, which is too narrow. One thing we want to do is integration, not only looking at data as a whole process, but also looking at all the area of algorithms that people use in practice, that we have theory for, and building on that. The holistic approach in both these dimensions is going to be important.”
By means of example, he refers to the Netflix Challenge.
“Besides numerical algorithms, one of the things that really scales up, that’s being used in practice a lot, is the machine-learning algorithm, which belongs in part to artificial intelligence and in part to theoretical computer science. Again, there has been a bit of divorce from actual machine learning and practice.
“The Netflix Challenge was a big recommendation system. Netflix distributed a list of movies the people had preferences for. The list was partial, and you had to build a system to predict what else these people would like. The community became quite engrossed in this challenge, because it offered a $1 million prize. A lot of algorithms were brought to bear on this. One of our objectives is integrating numerical and scientific computing, and machine learning and other statistical algorithms into our group.”
And, when successful, that goal has the potential to pay real-world dividends.
“We talk to product groups,” Kannan confirms. “We don’t always talk to them. It would be a waste of their time if we went to them with everything that we have. But when there is an opportunity, we would like to interact with them. Microsoft Research has a good system of program managers whose job it is to keep track of what research is going on, including in theory, seeing what the needs are of Microsoft as a company, and connecting with the products.”
In short, Kannan and his team find themselves with the luxury of laboring in the theoretical while keeping an eye on pragmatic opportunities.
“It is motivation that comes from practical problems,” he says. “I don’t build systems to solve particular problems. I don’t take a conditioned system and say how to solve it. What I try to do is put down mathematical principles and algorithms that can be used to solve something.
“At one end of the spectrum is mathematics. At the other end is engineering. In engineering, you are motivated both by intellectual curiosity and by practical problems. In mathematics, you’re motivated by intellectual curiosity. We theoretically do our science research somewhere in between.”
That, it appears, is a nice place to be.
“I find it an intellectually challenging, interesting area,” Kannan smiles. “The bulk of the satisfaction is facing the intellectual challenge and solving it.”