By Hui Ma, China Internet Weekly
July 20, 2009 3:00 PM PT
“This is the first truly large-scale commercial search that is based on image content. Content-based image search has been a focus of research for years, but no one had ever deployed it on as large a scale as images on the Internet.” It was not until the end of my interview with Jian Sun that this Lead Researcher of the Visual Computing Group at Microsoft Research Asia summed up the revolutionary changes "Show Similar Images" has brought to Microsoft's Bing Image Search.
In the past, Internet search was never Microsoft’s strength. Google owes its success to keyword search, one of the most seductive cash cows in the Internet economy. In an era when fewer people bother to “read” anything except images, Microsoft is poised to shake Google's confidence with its image search technology. The “law of the jungle” popularly attributed to Charles Darwin and his “On the Origin of Species” also applies to IT ecology: “The species that survive are not the most robust or intellectual, but are those who adapt the quickest to changes.”
The image search technology of Microsoft Research Asia falls back on the essence of computing, and attempts to improve user experience with a new approach.
At Microsoft Research Asia, the only constant element of innovation is change. The miracles derived from technological improvement are changing the habits of tens of thousands of users.
While working on a PhD at the Institute for Pattern Recognition and Artificial Intelligence, Sun was exposed to studies related to computer vision. Sun joined Microsoft Research Asia in July 2003 and is currently engaged in research projects on interactive computer vision and Internet computer vision. “Computer vision, in essence, is to tell a computer how to recognize objects.”
Image search engines available on the market can be frustrating, either because pictures are difficult to describe in words or because keywords are too simple or ambiguous. Both problems lead to haphazard search results.
There are two basic approaches to image search: computer vision-based and plain text-based. The classic way of thinking in image search research goes like this: with a text-based search engine, the sooner the user leaves the results page, the more accurate the results likely are; with an image-based search engine, the opposite holds, and the sooner the user leaves, the more likely it is that she hasn’t found what she is looking for.
“A picture is worth a thousand words” is an understatement. What if there is a way to fine-tune the search results?
“Content-based image search engines do not perform ideally as they slow down due to massive data processing; text-based engines, for their part, cannot support longer or multiple keywords, and often come back with images either irrelevant or containing misinterpreted messages,” Sun explained. “Microsoft took the third path that combines the two approaches: perform a text-based search first, and re-rank the results according to their similarity, which makes it easier to find desired results.”
The "Show Similar Images" tool created by Microsoft Research Asia re-ranks returned image search results using a non-text query: a user can point to any item in the preliminary results of a text-based search and resubmit the request. A single click of the mouse re-ranks the search results by their visual similarity to the designated image.
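The re-ranking step can be illustrated with a minimal sketch. It is not Microsoft's implementation: the feature vectors, the image names, and the choice of cosine similarity are all assumptions made for illustration, since the article does not say how similarity is actually computed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_by_similarity(results, selected_id):
    """Re-order text-search results by visual similarity to one selected image.

    `results` maps an image id to a precomputed feature vector
    (hypothetical features; the real system's features are not public here).
    """
    query = results[selected_id]
    return sorted(
        results,
        key=lambda image_id: cosine_similarity(results[image_id], query),
        reverse=True,
    )


# Made-up results of an ambiguous text query, each with a toy feature vector.
text_results = {
    "beach_sunset": [0.9, 0.1, 0.0],
    "red_sports_car": [0.1, 0.9, 0.2],
    "beach_noon": [0.8, 0.2, 0.1],
    "red_sedan": [0.2, 0.8, 0.3],
}

# The user clicks the sports car; similar-looking images move to the front.
ranked = rerank_by_similarity(text_results, "red_sports_car")
print(ranked)
```

Because the visual comparison runs only over the results the text query has already returned, the costly image matching never has to touch the whole index, which is the trade-off Sun describes.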
“At Microsoft Research Asia, research projects are more often than not built upon researchers’ gut feelings, but most of them fail to yield marketable products,” said Sun, adding, “That is the nature of scientific research.” According to him, research projects should be product-oriented, and the researchers have to take a pragmatic approach to choosing, from a variety of possibilities, the functions in the final product.
Fang Wen, a Researcher in the Visual Computing Group of Microsoft Research Asia, came up with the idea in July 2007 to “find pictures with a picture.” “At that time we only expected to deploy the technology in desktop searches for family photos rather than in any online application,” said Wen. “Later on, we happened to notice that the text on which online searches are based contains a lot of semantic ambiguity, so we tried out ‘Show Similar Images’ online. The results exceeded our expectations.”
Together with her colleagues, Wen developed a prototype in early 2008 and presented it to the Image Search Product Team. “They fell in love with it at first sight, saying it was exactly the feature they were after. Then we went on to discuss productization issues.” In October 2008, after the team overcame many difficulties, including computational complexity and integration with existing products, “Show Similar Images” became an integral part of Microsoft's Live Image Search service.
In September 2006, the company announced the formal launch of the Chinese version of Live Search (Beta). In July 2007, the Microsoft Live Search team introduced three new features of Live Image Search that let users filter for faces, portraits, and black-and-white images. “Show Similar Images” represents a brand new approach to image search, and propels Microsoft's Image Search to a new stage of development.
So how does one define an effective measure of visual similarity? How are visual features efficiently extracted for use on a web-scale image search engine? These two issues turned out to be the major challenges of “Show Similar Images.”
Sun and Wen’s team placed user-selected images into five categories: general objects, objects with a simple background, scenery images, portraits, and people. Each category uses a different combination of visual similarity measures, which yields better results than a single fixed combination. The computer “looks for” visual features (such as faces, textures, edges, colors, and spatial layout) and saves the features, once classified, in its database for later similarity computation and ranking.
“The key is to understand what a user is actually searching for. If, for instance, the system can tell that a user intends to search for facial images, and such images fall into the category of ‘portraits,’ then a facial recognition algorithm would be more effective than a general texture classification algorithm,” said Sun.
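The category-dependent combination of similarity measures described above might be sketched as follows. This is only an illustration: the weights, the two categories shown, and the per-feature scores are invented, and the article does not disclose how the real system combines its features.

```python
# Hypothetical weights: how much each visual feature counts per image category.
# The real system's categories include more than these two, and its actual
# weighting scheme is not described in the article.
CATEGORY_WEIGHTS = {
    "portrait": {"face": 0.7, "color": 0.2, "texture": 0.1},
    "scenery": {"face": 0.0, "color": 0.6, "texture": 0.4},
}


def combined_similarity(category, per_feature_similarity):
    """Weighted sum of per-feature similarities, using the category's weights."""
    weights = CATEGORY_WEIGHTS[category]
    return sum(
        weight * per_feature_similarity.get(feature, 0.0)
        for feature, weight in weights.items()
    )


# Made-up similarities between a selected image and one candidate image.
candidate = {"face": 0.9, "color": 0.3, "texture": 0.5}

portrait_score = combined_similarity("portrait", candidate)
scenery_score = combined_similarity("scenery", candidate)
print(portrait_score, scenery_score)
```

The same candidate scores higher when the selected image is treated as a portrait than when it is treated as scenery, because the face match dominates only in the first case. That is the effect Sun points to when he says a facial recognition algorithm suits portraits better than a general texture measure.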
Search engines of this type enable users to refine and filter preliminary search results simply and quickly, using image-based rather than text-based queries for more accurate and satisfying results. “Image similarity itself is still an open issue and commands a great deal of effort in basic research. Better user experience is not possible without progress in basic research,” said Sun, adding that there is no such thing as a “technological threshold,” and that there is a higher cost to maintaining leadership.