April 22, 2013 9:00 AM PT
From April 22 to 24, Microsoft Research will host the Machine Learning Summit 2013 at Microsoft’s Le Campus in Issy-les-Moulineaux, just outside of Paris. The event gathers thought leaders and researchers who will discuss key challenges posed by a new era of machine learning and identify promising techniques with the potential to transform society.
Keynote speakers for the event include Andrew Blake, Microsoft distinguished scientist and laboratory director of Microsoft Research Cambridge, and Judea Pearl, a computer scientist in the Cognitive Systems Lab within the UCLA Computer Science Department, whose artificial-intelligence explorations led to his selection as the winner of the 2011 A.M. Turing Award.
In the days before the event, Christopher Bishop, Microsoft distinguished scientist and head of the Machine Learning and Perception group at Microsoft Research Cambridge, found a few minutes to discuss the current focus on machine learning. Bishop—co-chair for the Machine Learning Summit, along with Evelyne Viegas of Microsoft Research Connections—explains how he became intrigued by the field and what he expects from the Machine Learning Summit.
Q: Why is Microsoft Research holding a Machine Learning Summit?
Bishop: Machine learning has emerged as one of the most important areas of computer science, with the potential to transform the nature of computing itself, as well as to deliver substantial commercial and societal benefit. However, to realize the full potential of machine learning, there are many obstacles to be overcome. The best ideas often come from the intersection of different fields, and yet most of the time, researchers attend conferences that are focused on specific research domains. The Machine Learning Summit takes a much broader perspective and brings together international thought leaders not only from the field of machine learning itself, but also from the many domains that can exploit machine learning, as well as from adjacent disciplines.
Our goals are to highlight the key challenges that lie ahead and to identify the best approaches to solving them. My hope is that, by influencing research directions over the coming years, the Summit will serve to accelerate the adoption and impact of machine learning.
Q: Why does Microsoft Research place such importance on machine learning and on collaborations with academia to advance the state of the art?
Bishop: Machine learning, as a field, has been around for half a century, but it’s only in the last few years that it has really begun to take off. It’s important to the company. It’s also an extremely active research field at the moment, one that’s grown quite a lot over the last few years.
In terms of collaborations with academia, it’s a broad, rich, deep field, with major challenges that may keep us going for the next hundred years or more. Ultimately, the goal is to produce intelligent machines. Who knows if and when we’ll get there? But because of the scale and complexity and diversity of the field, it’s not something that even Microsoft Research, with its enormous worldwide strength, can do on its own. The collaborations with academia, important across Microsoft Research, are particularly vital in the machine-learning field.
Q: Why has machine learning become such a hot topic over the last couple of years? Is it solely big data that is driving this, or is there something more at work?
Bishop: Big data is an important part of the machine-learning space, but it’s by no means the only driver.
We’ve seen tremendous changes in user interfaces just in the last few years, with speech becoming more widespread, with multitouch interfaces, with gesture recognition, with technologies like Kinect. These natural user interfaces are almost entirely enabled by machine learning.
Another driver for machine learning is how we can take computers to the next level of sophistication. Computers are our servants. They’re here to help us, and, in some ways, they’re extraordinary. Their ability to add up columns of numbers or store and retrieve data is phenomenal.
But we also want to build more subtle kinds of intelligence into machines. What we’ve understood as a research community over the last few decades is that this can’t be done by programming the computers with handcrafted rules. You can’t write an algorithm that causes a machine to do intelligent things. It’s too brittle. It’s too fragile. The only way that works is for the machine to learn the required behaviors from experience—from data, in other words. We program the computer to learn, and then, when we expose it to data, it, hopefully, acquires the required intelligent behavior.
Big data is not the whole of machine learning, but it is an important trend. In a sense, we’re seeing a second Moore’s Law. The computing revolution of the last four decades has fundamentally been driven by Moore’s Law: The number of transistors you can pack onto a chip has been doubling every 18 months to two years.
Now, we’re seeing a Moore’s Law of data. The amount of data seems to be doubling every couple of years. That data offers enormous commercial value, as well as enormous societal value. But there are orders of magnitude more value to be extracted from the data by analyzing it, correlating it, mining it, and gleaning sophisticated statistics from it. That’s the domain of machine learning.
Q: Please provide a brief summary of your background in this field.
Bishop: One of the inspirations for me was that wonderful film and book, 2001: A Space Odyssey, which featured the computer HAL, which seemed to be as intelligent as humans in every respect and could even do things such as play chess a lot better than its human companions.
The idea of building an intelligent machine was fascinating, but the field of artificial intelligence at the time held little appeal for me, because it was, again, based on this idea of rules. Instead, I pursued a career in physics and got a Ph.D. in quantum field theory at the University of Edinburgh. After that, I did research in theoretical plasma physics for controlled fusion.
In the mid-1980s, there was a revolution in the field of artificial intelligence. This is when neural networks came along—and the idea of having computers learn to be intelligent rather than being programmed to be intelligent. That caught my imagination. Over a couple of years, I made a transition from being a physicist to being a computer scientist, and I began working in machine learning.
That’s how I got into the field, a good 25 years ago, and I haven’t looked back. This is a great time to be in machine learning.
Q: What are some of your favorite examples of machine-learning work?
Bishop: One of the things I love, because it has a tremendous elegance but at the same time has also had enormous practical impact and has demonstrated the power of technology, is the Kinect sensor. That’s a beautiful example of something where, if you took the data from the actual Kinect sensor itself, the depth-video data, and tried to write rules about how you would figure out where the human body is, the body pose from that depth information, you would never succeed. There is too much variation to be able to capture it all in rules.
The folks here at Microsoft Research Cambridge took a million examples of body positions. For each example, you have the depth image, together with the labels showing where the various body parts are. The researchers trained the system to take that depth input and produce those body-part labels as the output.
Obviously, this is a hard problem and took a lot of clever people many months to solve. But the result is something that works extremely quickly and very robustly. It’s a nice example of a tough problem but an elegant solution that just works extremely well in an engineering sense. It’s practical and computationally low-cost but robust.
Q: What projects currently under way have you excited?
Bishop: The thing I would single out is ambitious and much longer-term—basic research in a sense—around what has become known as probabilistic programming.
In conventional computer programming, you have variables. They might represent real numbers, they might represent integers, or they might just be binary numbers, but they’re all deterministic. A binary number is either zero or one, or perhaps it’s undefined, but those are the only possibilities, whereas the tasks that we want our computers to solve are full of uncertainty. In the real world, which humans inhabit, there’s uncertainty everywhere. We therefore need to have something that is not just zero or one but can be, let’s say, zero with 30 percent probability and can be one with 70 percent probability. We call that a random variable.
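As an editorial illustration of the idea Bishop describes, here is a minimal sketch in Python of a binary variable that carries a probability instead of a fixed value. The class name `BinaryRandomVariable` and its methods are hypothetical, chosen only for this example; the sketch does not show Infer.NET or any other Microsoft Research code.

```python
import random


class BinaryRandomVariable:
    """A binary variable described by a probability, not a fixed 0 or 1.

    Hypothetical helper for illustration only.
    """

    def __init__(self, p_one):
        if not 0.0 <= p_one <= 1.0:
            raise ValueError("p_one must be between 0 and 1")
        self.p_one = p_one

    def probability(self, value):
        """Return the probability that the variable takes the given value."""
        if value == 1:
            return self.p_one
        if value == 0:
            return 1.0 - self.p_one
        return 0.0

    def sample(self, rng):
        """Draw one concrete 0 or 1 according to the stored probability."""
        return 1 if rng.random() < self.p_one else 0


# The example from the text: zero with 30 percent probability,
# one with 70 percent probability.
x = BinaryRandomVariable(p_one=0.7)

# Drawing many samples should give ones roughly 70 percent of the time.
rng = random.Random(0)  # fixed seed so repeated runs behave the same
draws = [x.sample(rng) for _ in range(10_000)]
fraction_of_ones = sum(draws) / len(draws)
print(f"P(x=0) = {x.probability(0):.1f}, P(x=1) = {x.probability(1):.1f}")
print(f"fraction of ones in {len(draws)} samples: {fraction_of_ones:.3f}")
```

A conventional variable would hold only the result of a single draw; the object here keeps the whole distribution, which is what later reasoning about uncertainty needs.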
We can code that using conventional programming languages such as C#. But there would be enormous advantage to be gained if we could bake that into the language itself so that random variables became first-class citizens of the programming language. That would allow us to build what we call “models,” machine-learning descriptions of a particular domain, and the computer could do all the hard work in applying those models for us automatically.
That’s the dream. We’re a long way from really solving that, but we’ve made a lot of progress. We have a project at the Microsoft Research Cambridge lab called Infer.NET, a probabilistic programming language that already is quite powerful, has a lot of functionality, and is giving us a glimpse of a very different way of building machine-learning applications.
Q: How can machine learning be used to enhance health care?
Bishop: The potential here is hugely exciting. It goes back to the idea of epidemiology, the idea that by looking at statistics of data sets, particularly data sets relating to large numbers of people, you can find patterns that might indicate the causes of disease, or the effectiveness of drugs, or the side effects of drugs, and so on.
In the world of health care, we’re moving into an electronic era where health records are stored electronically, including, potentially, genetic information, environmental information, and lifestyle factors. The potential to perform epidemiology on an unprecedented scale could have as profound an effect on health care as antibiotics or X-rays.
There are lots of hurdles to overcome. You have to have the infrastructure in place to store everybody’s health records in an electronic form, and we’re a long way from doing that. We have to make sure we address privacy issues. And we have to tackle the hard scientific problems of analyzing this data and distinguishing between cause and effect, compared with simply looking for correlations.
But if you think about the things we’ve learned already about the effects of diet and lifestyle choices on diseases and you imagine being able to do that on a much larger scale with much larger and richer data sets, then the potential of this approach becomes very exciting.
Q: How can machine learning foster cross-fertilization between computer-science research areas?
Bishop: Let’s go back to the foundations of the Cambridge lab. I joined quite early on as a machine-learning researcher. We quickly hired researchers in computer vision—Andrew Blake and others—and, right from the outset, we wanted the computer-vision people and the machine-learning people to be close together and working collaboratively, because we understood that the future of computer vision lay with machine learning and that one of the biggest application areas for machine learning would be computer vision. That’s certainly been borne out. That’s an adjacent field, so those sorts of collaborations are not too surprising.
What is interesting is that, for the last five years, we’ve seen collaborations between people in machine learning and pretty much every other area of the lab’s research. The systems people are being challenged by machine learning, because machine-learning research deals with huge data sets running in big data centers, and they want to process things in real time for applications such as web search.
We’re seeing collaborations with computational science, for building models of the global carbon cycle. We’ve already talked about the use of machine learning in human-computer interaction through the creation of natural user interfaces. But I would have said until recently that the area of the lab most distant from machine learning was the programming-languages area. Yet these two disparate fields have come together in the last few years through this challenge of probabilistic programming. How do we deal with uncertainty in the context of a programming language?
You have people like Andy Gordon, who in the past has worked on important concepts in mainstream computer languages, getting interested in this field and collaborating on a daily basis with people in machine learning. So machine learning has been a great catalyst for building new links across all of the lab’s research areas.
Q: What do you hope to achieve with your machine-learning investigations?
Bishop: The key word there would be personalization. It’s almost ironic, it’s almost a paradox, that to achieve personalization, you need to look at large numbers of people. You need to look at the crowd.
Imagine that I want to build something that can recognize your particular handwriting. I could get you to spend years providing me with lots of examples of your handwriting and then run machine learning on that. But that would be tedious; that’s not a good solution. If I collect the data from millions of people, I can analyze it and perhaps discover that there is a variety of different handwriting styles, and I can effectively learn recognizers tuned to each of them. Then I only need a small sample of your writing to be able to detect which is your writing style and provide you with a recognizer tuned to your particular style.
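The last step of that handwriting example, matching a small sample to one of several previously learned styles, can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the three style names, the two-number feature vectors (imagined as slant and letter width), and the nearest-centroid rule stand in for whatever a real system would learn from millions of writers.

```python
import math

# Hypothetical style centres, imagined as learned beforehand from the crowd.
# Each style is summarized by an average (slant, letter_width) feature pair.
STYLE_CENTROIDS = {
    "upright_print": (0.0, 1.0),
    "slanted_cursive": (0.6, 0.4),
    "wide_block": (0.1, 1.8),
}


def mean_point(points):
    """Average a list of equal-length feature tuples, coordinate by coordinate."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(len(points[0])))


def closest_style(sample_features, centroids=STYLE_CENTROIDS):
    """Pick the style whose centre is nearest to the mean of a small sample."""
    centre = mean_point(sample_features)
    return min(centroids, key=lambda name: math.dist(centre, centroids[name]))


# A handful of measurements from one new user is enough to choose a style,
# and with it a recognizer already tuned to that style.
user_sample = [(0.55, 0.5), (0.65, 0.35), (0.5, 0.45)]
chosen = closest_style(user_sample)
print(f"closest style: {chosen}")
```

The expensive work of discovering the styles happens once, on the crowd's data; each individual contributes only the few points needed to select among them.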
One of the great dreams—and I think it will come to fruition in the next decade or so—has a lot to do with personalized computing. I think we’ll see computers becoming much more personalized, in all kinds of ways. They will become much more tuned to our needs and understand us better as individuals.