As images and video continue to proliferate across the Web, it becomes ever more vital for research to continue apace in order to bring even greater flexibility and functionality to users worldwide.
Microsoft Research recognizes the need for further innovation in this regard, and that recognition is reflected in numerous contributions to CVPR 2007, the annual IEEE Computer Society Conference on Computer Vision and Pattern Recognition, to be held June 18-23 in Minneapolis. Consider:
Microsoft has collaborated openly with the academic research community to drive innovation in computer vision and pattern recognition for several years, working with university and industry partners to publish and review papers, participating on committees, and presenting inventive work.
This year, Microsoft has contributed nearly 12 percent of the 60 oral papers to be presented during CVPR 2007, up from the company's 9 percent share at the 2006 conference. The collective contribution is indicative of the importance Microsoft places on such research.
“The value of this research to the company,” says Simon Baker, a senior researcher at Microsoft Research Redmond, “is perhaps best highlighted by the wide range of applications covered. These range from noise removal for digital images and techniques for building 3-D models used in PhotoSynth™ to handwritten signature verification and core components of image recognition and search.”
The work Microsoft is pursuing to improve the user experience for image and video enthusiasts is hardly limited to one particular group or location. Of the seven Microsoft papers accepted for oral presentations during CVPR this year, five come from Microsoft Research, representing three of its five labs worldwide, while one is from Live Labs, and one from Virtual Earth™. Four of the Microsoft Research papers represent collaborations with universities in the United States, the United Kingdom, Russia, Japan, and China.
Further underscoring the company’s commitment to such work is the fact that five chairs for the 2007 conference are held by representatives from Microsoft Research, which is serving as one of the event’s corporate sponsors. Baker will serve as a program chair, and Andrew Fitzgibbon of Microsoft Research Cambridge, Sing Bing Kang of Microsoft Research Redmond, Xiaoou Tang of Microsoft Research Asia, and David Nister of Live Labs as area chairs.
FASTER AND BETTER
One problem common to computer-vision research is the need to optimize the techniques used to perform such tasks as diagram recognition, image segmentation, and stitching images into a panorama. Many of these issues can be formulated in terms of Markov random fields (MRFs), models of the joint probability distribution over a set of random variables.
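To make the formulation concrete: a binary MRF assigns each variable (say, each pixel) a label of 0 or 1, and the best labeling minimizes an energy combining per-pixel costs with smoothness costs between neighbors. The toy sketch below, with hypothetical costs, finds that minimum by exhaustive search; it illustrates only the problem setup, not the extended roof duality method the paper presents, which handles problems far too large for brute force.

```python
import itertools

# Toy binary MRF: 4 pixels in a row, each labeled 0 or 1.
# Total energy = sum of unary costs + pairwise smoothness penalties.
# (Illustrative only; real vision problems have millions of variables.)

unary = [        # unary[i][x] = cost of assigning label x to pixel i
    [0.0, 2.0],
    [1.5, 0.5],
    [2.0, 0.1],
    [0.2, 1.0],
]
LAMBDA = 1.0     # penalty when neighboring pixels take different labels

def energy(labels):
    e = sum(unary[i][x] for i, x in enumerate(labels))
    e += sum(LAMBDA for a, b in zip(labels, labels[1:]) if a != b)
    return e

# Exhaustive search over all 2^4 labelings (feasible only at toy sizes).
best = min(itertools.product((0, 1), repeat=4), key=energy)
print(best, energy(best))   # → (0, 1, 1, 1) 2.6
```

The smoothness term is what couples the variables: pixel 0 prefers label 0 and pixels 1–3 prefer label 1, so the optimum pays one boundary penalty rather than flipping any pixel against its unary preference.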
Carsten Rother, a researcher from Microsoft Research Cambridge, co-authored a CVPR 2007 paper called Optimizing Binary MRFs via Extended Roof Duality, along with Vladimir Kolmogorov of University College London, Victor Lempitsky of Moscow State University, and Martin Szummer, also a researcher at Microsoft Research Cambridge.
“We’re tackling optimization issues that are fundamental for a large number of computer-vision problems,” Rother says, “including diagram recognition, image segmentation, new view synthesis, super-resolution, texture restoration, image deconvolution, panoramic stitching, and parameter estimation. Indeed, we show that the new optimization techniques outperform state-of-the-art methods for many test cases of these applications.
“In particular, we applied them to the problem of categorizing ink strokes on a Tablet PC as belonging to either a container or a connector. On all test cases, 2,700 in total, we found that our technique gave, in very fast time, the global optimal solution.”
That speed is particularly vital to achieving rapid implementation of complex image analysis.
“The novel aspects,” Rother says of the work he and his colleagues performed, “are that, for some applications, we are able to find quickly the global optimal solution, and if we do not find the global optimal, we very often are able to improve the solution of any given method in a short time.
“This was not possible in the past.”
Rother hopes the work achieved thus far can be extended to improve other computer-vision challenges. What’s next?
“Applying it to other fields,” he says, “such as stereo reconstruction—finding the depth map from two cameras. In this work, we solve an optimization problem with binary variables. We plan to apply this method to solve multi-label problems. The new methods will, hopefully, enable us to achieve this in a new and better way.”
That’s research: As soon as you can confirm the efficacy of one hypothetical approach, it’s time to push forward toward the next challenge.
“We hope,” Rother says, “that this theoretical work will find many new applications where it can improve on existing results.”
The ease with which digital photographs can be manipulated, stored, and shared makes them seem almost magical to amateur shutterbugs. But digital cameras remain prone to the same sorts of flaws inherent to machines of any stripe, and one of the more vexing is that the photographs such cameras produce can be degraded by dust that accumulates on the camera’s sensor, particularly on single-lens reflex cameras with interchangeable lenses.
Changyin Zhou of Shanghai’s Fudan University and Steve Lin, a lead researcher for Microsoft Research Asia, have a plan to change that. In their CVPR 2007 paper, Removal of Image Artifacts Due to Sensor Dust, they offer a technique to remove dust effects from digital photographs automatically.
“Currently,” Lin says, “these dust effects in photographs must be removed by hand using software. Often, it requires a skillful user to remove these artifacts effectively. We provide an automatic tool for dealing with this problem.”
Manual processing of dust-affected images can work when the background behind the dust is uniform or highly regular, but the background behind a dust artifact often lacks such regularity. Current techniques to address the issue involve a cloning-style process to reconstruct regions obscured by dust on a camera sensor, and irregular backgrounds pose difficulties for those techniques.
“The key idea of our method,” Lin explains, “is to model the physical formation of dust artifacts and use this model to obtain results superior to those of general artifact-removal techniques.”
Zhou, currently working as an intern at Microsoft Research Asia, and Lin use a model of the artifact formation due to sensor dust and combine it with contextual information in the image and a color-consistency constraint on dust to remove the obscured visual information. They have found that when multiple images are available from the same camera, even under varying settings, their approach also can be utilized to detect dust regions on the camera sensor.
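One way to picture a physical formation model: dust on the sensor attenuates the light reaching the pixels beneath it, so the observed image can be treated as the true image multiplied by a per-pixel attenuation map. The sketch below is a deliberately simplified illustration of that multiplicative idea only; the actual paper must also estimate the dust region and its attenuation from image context and a color-consistency constraint, which this toy assumes away by using a known map.

```python
import numpy as np

# Toy multiplicative dust-formation model (an illustrative assumption;
# the paper estimates the dust region and attenuation from the image
# itself rather than receiving them as input).
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 1.0, size=(8, 8))   # "true" scene irradiance

alpha = np.ones((8, 8))                      # per-pixel attenuation by dust
alpha[3:5, 3:5] = 0.6                        # a dust spot darkens one patch

observed = clean * alpha                     # forward formation model

# With the attenuation map known (or estimated), inverting the model
# restores the pixels the dust had darkened.
restored = observed / alpha
print(np.allclose(restored, clean))          # → True
```

The hard part in practice is exactly what the toy skips: recovering `alpha` without ever seeing the clean image, which is where the contextual information and color-consistency constraint come in.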
“It is the first technique that specifically addresses the problem,” Lin states, “which is a problem that is expected to grow with the increasing popularity of digital single-lens reflex cameras.”
It is possible that the technique devised by Zhou and Lin could prove useful in other settings, such as scanned images diminished by fingerprints or smudges.
“We are seeking other problems,” Lin concludes, “in digital-photograph enhancement where we can apply this approach.”
TRAVELING IN TIME
Cities evolve. It might be difficult to track that evolution while you’re immersed in one, but return to a city you used to know but haven’t visited in years. You’ll recognize it, sure, but you’ll also be astounded at the changes you notice.
What if you could amass a lot of photos and explore the city by leafing through them? And what if the photos were organized by the date they were taken? All of a sudden, you’d be able to explore in space and in time. It’d be a sort of virtual time travel.
Enter Kang, a senior researcher at Microsoft Research Redmond. Kang and two colleagues—Grant Schindler, a Ph.D. student at the Georgia Institute of Technology, and his professor and adviser, Frank Dellaert—have developed a technique to order chronologically a large set of photographs by analyzing the presence of structures visible in the photographs. Their work will be presented during CVPR 2007 in the form of a paper entitled Inferring Temporal Order of Images from 3D Structure.
“Given a large collection of historical photographs of a city that spans decades or perhaps even a century,” Kang says, “how can we automatically sort the photographs by time? We attacked the problem by reasoning about structures that are visible in photographs. This is made possible by assuming that each structure is unique and exists in a continuous block of time.”
Kang and colleagues have cast the challenge as a constraint-satisfaction problem, which enables them to achieve a time-ordered collection of photographs even with a huge number of images and with images in which certain structures are partially blocked.
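The contiguity assumption gives the constraint its teeth: if each structure exists over one continuous block of time, then in a correct chronological ordering every structure's photos must appear consecutively. The hypothetical sketch below checks that constraint by brute force over three photos; it is a toy statement of the problem, not the scalable constraint-satisfaction machinery the paper describes.

```python
import itertools

# visible[i][j] = True if structure j appears in photo i (made-up data).
# Photo A shows structures 0 and 2; photo B shows 1 and 2; photo C shows 0.
visible = [
    (True,  False, True),   # photo A
    (False, True,  True),   # photo B
    (True,  False, False),  # photo C
]

def consistent(order):
    # Every structure's visible photos must be consecutive in the ordering.
    for j in range(len(visible[0])):
        seen = [visible[i][j] for i in order]
        first = seen.index(True)
        last = len(seen) - 1 - seen[::-1].index(True)
        if not all(seen[first:last + 1]):
            return False
    return True

orderings = [o for o in itertools.permutations(range(3)) if consistent(o)]
print(orderings)   # → [(1, 0, 2), (2, 0, 1)]
```

The two surviving orderings are reversals of each other, which reflects a genuine ambiguity: visibility constraints alone fix the sequence of events but not the direction of time.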
“Being able to sort photographs in time,” Kang says, “would allow us to construct time-varying 3-D models from collections of historical photographs. The user would then have the opportunity to navigate not just in space, but also in time.
“Imagine being able to virtually turn back the clock and visualize a city as it had appeared in the past. One can call such an experience virtual time-traveling. We also imagine such a tool would be useful to historians and urban planners.”
The work by Schindler, Dellaert, and Kang builds on traditional structure-from-motion techniques, well established in computer-vision research for solving spatial problems. By combining those techniques with the constraint-satisfaction approach, the researchers are able to infer the temporal ordering of the images and a range of dates for which each structural element in a scene persists.
The project began at Georgia Tech, as part of the 4D Cities project headed by Dellaert, and used historical images from the Atlanta History Center.
“For the first time,” Kang says, “we have shown that it is possible to sort photographs by time by reasoning about the existence of 3-D structures.
“There are a number of methods that allow the user to construct and visualize an environment from photographs, either through model reconstruction, as in Virtual Earth, or spatially located photographs, as in Live Labs’ PhotoSynth. However, our technique is the first to automatically sort photos by time, with the eventual goal of being able to construct and visualize large urban environments in space and time using appropriate user-interface controls.”
In addition to these three projects, Microsoft Research will be represented during CVPR 2007 by accepted papers Learning Local Image Descriptors, by Simon A.J. Winder and Matthew Brown of Microsoft Research Redmond, and Offline Signature Verification Using Online Handwriting Registration, co-written by Tang.

Perhaps, someday, people will be able to return to Minneapolis, circa 2007, to view the city as it appeared when the stimulating work featured in this year’s CVPR was first revealed to the world.