By Rob Knies
May 28, 2010 9:00 AM PT
We live in a world bulging at the seams with data. For today’s scientists, that data deluge represents a conundrum. On one hand, they have a wealth of information at their analytic disposal; on the other, that wealth incurs the cost of learning to cope with the sheer volume.
On occasion, that can prove a heavy burden indeed. In Yi Ma’s case, however, this preponderance of data has presented a research opportunity paying enormous dividends.
Ma, research manager of the Visual Computing Group at Microsoft Research Asia, is interested in the mathematical principles behind the processing and understanding of visual data. Image-based object recognition is one of computer vision’s core challenges, with human faces being perhaps the most compelling scenario. Ma and his fellow researchers suggest that a critical piece of under-scrutinized information—a key mathematical concept married to an effective software application—points the way to an era of powerful face-recognition advances.
In recent months, he has devised an algorithmic approach to face recognition that has produced prodigious results. Ma joined Microsoft Research Asia during a leave of absence from his role as associate professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC). From April to October of 2009, a few months after his arrival, he and his team demonstrated for the first time that, beyond doubt, computer face recognition can surpass human vision significantly under extremely challenging conditions, such as drastic lighting or partial occlusion.
He credits such success to “the blessing of dimensionality.”
“In computer vision,” Ma states, “you’re routinely dealing with images or videos that are very high-dimensional. Fortunately, it turns out that when the dimensionality is high enough, if you have the right computation tool, you can harness rich redundancy in the data that gives you a very, very good chance of solving some of the hardest problems in the world. That’s why it’s called the blessing of dimensionality.”
In a story entitled “Face Recognition Breakthrough,” which appeared in the August 2009 edition of Communications of the ACM, Guillermo Sapiro, a professor in the Department of Electrical and Computer Engineering at the University of Minnesota, acknowledged the contribution Ma and his fellow researchers have made.
“The face-recognition work being carried out by Ma and his colleagues represents a novel take on this problem,” Sapiro said, “and a very refreshing one that we are all looking at with a lot of excitement.”
They aren’t the only ones.
“The past three years has been the most exciting time of my life,” smiles Ma, winner of the Marr Prize in 1999 for the best paper presented during the International Conference on Computer Vision. “It’s the opportunity of a lifetime for researchers with my background.”
Ma, a 37-year-old native of China’s Sichuan province, lists his research interests as high-dimensional data clustering and classification, compression-based image segmentation, robust face recognition via sparse representation, and error correction for high-dimensional signals and matrices.
“My main research interest,” he states on his UIUC faculty website, “is in finding the most pertinent mathematical principles for analyzing and understanding high-dimensional sensorial data, such as images, so that machines and humans can make more intelligent decisions.”
That’s what he’s doing with regard to face recognition, using analytic and algorithmic tools from “sparse representation” to guide a new approach sufficiently robust to handle corruption and occlusion and delivering performance far exceeding not only computer-vision expectations, but also the ability of humans themselves.
Traditionally, computer face recognition is based on such factors as the characteristics of the eye region, the shape of the nose, or the width of the mouth. But Ma and colleagues have determined that a collection of random points—a sparse representation consisting of a couple of hundred points—can provide sufficient data to identify the distinguishing factors from an image and match it to an image in a database. The key is in collecting enough sets to enable a high accuracy rate.
“By studying these high-dimensional signals like images and video,” he explains, “people start to realize there’s something completely new that they didn’t understand. That is thought-provoking. There are some completely new mathematical phenomena that we’re not completely aware of, and if you harness them correctly, it allows you to do magical things.”
Those magical things can properly identify the faces of people wearing sunglasses or a scarf—and those in images cropped beyond recognition.
“This,” Ma says, “has opened up all kinds of new opportunities.”
That, in turn, has piqued interest in this work from two quarters: the National Science Foundation, which sponsored a project by Ma and Emmanuel Candès called “Advances in the Theory and Practice of Low-Rank Matrix Recovery and Modeling,” and the U.S. Department of Homeland Security, to which Ma gave a demonstration in 2009. The accuracy of his algorithm demonstrates promise in improving video and still-image annotation, advertising, and the monitoring and identification of people appearing in public.
The work also has sparked a flurry in research circles to build upon a proven approach.
“We are competing with the best people in the world,” Ma says, “people from Caltech, Stanford, MIT, Princeton, UCLA. Everybody has their eyes on this because it’s such a rare opening. We are tackling some of the hardest problems in the world with this powerful tool.
“Application for the algorithm has just mushroomed. Signal processing, imaging, medical imaging, geology, bioinformatics, coding theory, information theory, control systems … It has been disseminated to almost every engineering field.”
The recent face-recognition advances stem from pioneering work by Russian and American mathematicians after World War II. Problem was, there was never sufficient data to make that mathematical theorizing practical. Now, with a plethora of high-dimensional data available, such work can be applied to real-world scenarios.
Ma earned a Ph.D. in computer vision at the University of California, Berkeley, and he has worked for years on image and video segmentation, which naturally led him to study how multiple low-dimensional, linear models can be used to describe complex, high-dimensional data.
“I’ve been studying the statistics and algebra and geometry behind these kinds of models for many years,” he says. “I started learning this new branch of mathematics, this very cool mathematical tool, and it turns out that the most available data is that in various images. That’s where the first marriage between this mathematical tool and face recognition got introduced.
“It works like magic, to a point that we don’t understand why. Existing mathematical theory couldn’t explain it. That really got us intrigued. It turns out that the method works even better than what the theory was predicting at the time. That got us into all the mathematics behind it.”
In fact, Ma says, it might take four or five years to get qualitatively similar results in theoretical examinations to those occurring now in actual use.
His team of students—including Andrew Wagner, Arvind Ganesh, and Zihan Zhou of UIUC; Allen Yang, a research engineer in the Department of Electrical Engineering and Computer Science at the University of California, Berkeley; and John Wright, Ma’s former Ph.D. student at UIUC who now works as a researcher at Microsoft Research Asia—was one of the first to be introduced to the new tools. In 2009, Wright won the $30,000 Lemelson-Illinois Student Prize for creating a prototype application that uses Ma’s algorithm.
“We’re spearheading this effort,” Ma says, “because of our unique application area, computer vision, which helped us discover even more interesting mathematical problems that were not available to others. That put us in a very, very good position.”
One of the hallmarks of Ma’s approach is that it dispenses with all but the most compelling match in a face-recognition attempt.
“You want to use the smallest number of pictures out of your database to explain the new picture you haven’t seen before,” he explains. “If the computer can find that, then those fewest possible pictures selected to represent the new picture give you all the information you need.”
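The idea Ma describes, explaining a new picture with as few database pictures as possible, can be illustrated with a small sketch. What follows is not the team's implementation. It is a toy example under stated assumptions: the six-pixel "images," the subject names, the penalty weight `lam`, and the helper names `sparse_code` and `classify` are all hypothetical, and the sparse coefficients are found with a simple iterative soft-thresholding loop (ISTA) standing in for whatever solver the researchers use. The new image is assigned to the subject whose training pictures alone rebuild it with the smallest error.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an l1 penalty)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def sparse_code(columns, y, lam=0.01, iters=500):
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.

    columns: list of training vectors, i.e. the columns of A (assumed layout).
    """
    n, m = len(columns), len(y)
    # Step size 1/L; the squared Frobenius norm of A is a safe upper bound on L.
    step = 1.0 / sum(v * v for col in columns for v in col)
    x = [0.0] * n
    for _ in range(iters):
        # residual = A x - y, then gradient = A^T residual
        residual = [sum(columns[j][i] * x[j] for j in range(n)) - y[i] for i in range(m)]
        gradient = [sum(columns[j][i] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * gradient[j], step * lam) for j in range(n)]
    return x

def classify(columns, labels, y, lam=0.01):
    """Return (best_label, residuals); residuals[label] is the error left when
    y is rebuilt from that subject's coefficients alone."""
    x = sparse_code(columns, y, lam)
    residuals = {}
    for label in sorted(set(labels)):
        rebuilt = [
            sum(columns[j][i] * x[j] for j in range(len(columns)) if labels[j] == label)
            for i in range(len(y))
        ]
        residuals[label] = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, rebuilt)))
    return min(residuals, key=residuals.get), residuals

# Toy "database": two 6-pixel training images for each of three made-up subjects.
columns = [
    [1.0, 0.2, 0.0, 0.0, 0.0, 0.0], [0.2, 1.0, 0.0, 0.0, 0.0, 0.0],  # alice
    [0.0, 0.0, 1.0, 0.2, 0.0, 0.0], [0.0, 0.0, 0.2, 1.0, 0.0, 0.0],  # bob
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.2], [0.0, 0.0, 0.0, 0.0, 0.2, 1.0],  # carol
]
labels = ["alice", "alice", "bob", "bob", "carol", "carol"]

# New image: a mix of bob's two training images with one corrupted pixel (the first).
y = [0.05, 0.0, 0.68, 0.52, 0.0, 0.0]

label, residuals = classify(columns, labels, y)
print(label, residuals)
```

In this toy setup, almost all of the weight lands on the two pictures of the matching subject, so that subject's reconstruction error is far smaller than the others. A real system would work with far larger images and databases, where a dedicated solver would replace the plain loops used here.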
If the resolution of the image is sufficiently large, then the size and shape of elements such as the eyes, nose, and mouth become less important. Instead the totality of the data about the facial image begins to predominate. That leads to remarkable success. Even with 80 percent of a face occluded—with the eyes, nose, and mouth hidden from sight—the new algorithm can provide a match.
“That doesn’t mean the rest of the pixels are useless,” Ma cautions. “They also have tremendous information. But you have so many pixels, and if you capture that information correctly, they can tell you who the person in an image is.”
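One common way to write down this kind of robustness to occlusion is sketched below. The notation is an assumption introduced here for illustration, not taken from the article: y is the new image stacked into a vector, the columns of A are the database images, x holds the weights on those images, and e is an error term that absorbs the occluded or corrupted pixels. Both x and e are asked to be sparse, which here means minimizing their l1 norms.

```latex
y = A x + e, \qquad
(\hat{x}, \hat{e}) = \arg\min_{x,\,e} \; \|x\|_1 + \|e\|_1
\quad \text{subject to} \quad y = A x + e
```

Because the image has so many pixels, the unoccluded ones can still pin down x even when e is nonzero on a large share of them, which is the redundancy Ma refers to.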
In one test, including a selection of 50 male and 50 female subjects from the AR Face Database, the new algorithm achieved 100-percent accuracy for the male subjects and 95-percent accuracy for the females, who were wearing sunglasses.
“It’s hard to imagine at first sight,” Ma stipulates. “At first, we submitted a paper to a top research conference and got rejected, because the reviewer just couldn’t believe it could work.”
The stunning success of the approach opened a wealth of new areas of exploration—so many, as it turned out, that the researchers couldn’t keep track of them all.
“I have a tradition,” Ma says, “starting when I was a student with my adviser at Berkeley. We tried to maintain a list of open problems. I kept doing that with my students, as well. John Wright and I used to keep a very good list of open problems, but recently, we stopped, because there are just too many. We just see this vast opening now.”
They’ll need assistance to address all the avenues of opportunity the new algorithm has made available.
“We do have a list of priorities,” Ma says. “We think we can grab the lower-hanging fruits in the years to come, but things have opened up far beyond. We are trained as engineers, not as mathematicians. For us to succeed, we need a lot of serious help from professional mathematicians and other computational professionals.
“Next on our agenda, we want to demonstrate face recognition. You can get engineering that is scalable and fast enough to provide an almost-real-time, robust face-recognition system that works under quite broad, realistic conditions.”
Challenges remain, of course. There are issues with recognizing faces with unfamiliar poses and misalignment, with obtaining sufficient training, and with achieving scalability to large databases. But Ma thinks these difficulties can be addressed with his sparse-representation approach.
“There are still tons of practical and other aspects of face recognition before we can use this core method to make a well-rounded system that works under a variety of conditions,” he says. “There are many other engineering algorithmic challenges to overcome.
“But we are not sure that we have found all the pieces in the jigsaw puzzle yet. We can be cautiously optimistic, but we have to be careful. We don’t have any principled understanding of why things work the way they do.”
Such circumspection is understandable, but it must be hard to achieve amid a period of incredible success. When Ma ponders the potential this work has to offer, he becomes positively effusive.
“Three-D reconstruction, massive-scale image segmentation …” he says. “People can do these things in almost real time now.”
He grants himself a flight of fancy.
“Right now, vision is a bottleneck for artificial intelligence,” Ma says. “We can make wonderful robots—they can dance, they can sing, they can jump—but they’re blind. They cannot interact with people. They cannot recognize anything. Vision could really help in terms of speed and accuracy. Those are the kinds of applications with which our work could make a difference.”
He is quick to note, though, that such continued advances are hardly guaranteed.
“We could hit a wall,” he muses. “Nobody knows right now. This is why it’s exciting and intense and refreshing. Maybe, next month or next year, people will identify another opening with another set of problems that these tools can help address.
“Everybody in this field is starting to realize how limited we were just several years ago. Before, we thought we had done everything, we had thought of everything, we were so clever, we were so smart. But now, we know: No, that’s not necessarily the case.”