A Computer-Vision View of the World

A robot comes up from the subway. How does it know where it is?

To most people, this sounds like the opening line of a joke, but for Jamie Shotton, senior researcher in the Computer Vision Group at Microsoft Research Cambridge, the “lost robot” is a classic problem that he wants to solve through scene recognition and camera localization. From June 23 to 28, Shotton will join fellow researchers in Portland, Oregon, for CVPR 2013, where they will share novel approaches to this and a multitude of other problems with academics and fellow industrial researchers.

“CVPR is the premier annual conference on computer vision,” says Richard Szeliski, a Microsoft distinguished scientist, manager of the Interactive Visual Media Group at Microsoft Research Redmond, and one of four program chairs for CVPR 2013. “As usual, we have a very strong showing from Microsoft vision researchers from our labs worldwide. We’re looking forward to presenting our work to other leading scientists and learning from them about the latest technical advances in our field.”

When Szeliski talks about a strong showing from Microsoft Research, he means more than just the number of attendees and the papers being presented. He also is referring to the exceptional talent of researchers such as Shotton.

Getting Behind the Camera

Jamie Shotton

Shotton was the major contributor to the human body-part-recognition algorithm used by Kinect for Windows for skeletal tracking. Real-Time Human Pose Recognition in Parts from Single Depth Images, co-authored by a team drawn from Microsoft Research Cambridge and the Xbox Incubation group, was awarded the best-paper prize at CVPR 2011 and the 2011 MacRobert Award from the Royal Academy of Engineering. He was also part of the Xbox and Xbox Incubation team that won the 2012 Microsoft Outstanding Technical Achievement award for incorporating the body-part-recognition algorithm into a successful product.

During CVPR 2013, Shotton will be presenting his latest project, Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, which he co-authored with Cambridge colleagues Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Their approach uses machine learning to predict where pixels in an image live in the real, 3-D world and, in doing so, infer the position of the camera when it captured the image.
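To make the regression idea concrete, here is a minimal sketch, in Python with scikit-learn, of how a forest could be trained to map per-pixel depth features to scene coordinates. The feature design, the helper names (pixel_features, train_scene_regressor), and the forest settings are illustrative assumptions, not the paper’s implementation; they are only loosely inspired by the depth-comparison features used in the team’s earlier body-part-recognition work.

```python
# Minimal sketch of scene-coordinate regression: learn a mapping from
# per-pixel depth features to 3-D coordinates in the scene's world frame.
# Feature design and forest settings are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pixel_features(depth, u, v, offsets):
    """Hypothetical depth-difference features at fixed pixel offsets,
    scaled by the depth at (u, v) so they are roughly depth-invariant."""
    d = depth[v, u]
    feats = []
    for du, dv in offsets:
        uu = int(np.clip(u + du / d, 0, depth.shape[1] - 1))
        vv = int(np.clip(v + dv / d, 0, depth.shape[0] - 1))
        feats.append(depth[vv, uu] - d)
    return feats

def train_scene_regressor(frames, offsets, samples_per_frame=500, seed=0):
    """frames: list of (depth, world_xyz) pairs, where world_xyz is an
    H x W x 3 array of ground-truth scene coordinates for each pixel."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for depth, world_xyz in frames:
        h, w = depth.shape
        for _ in range(samples_per_frame):
            u, v = rng.integers(0, w), rng.integers(0, h)
            if depth[v, u] <= 0:      # skip invalid depth readings
                continue
            X.append(pixel_features(depth, u, v, offsets))
            y.append(world_xyz[v, u])
    forest = RandomForestRegressor(n_estimators=5, max_depth=16)
    return forest.fit(np.asarray(X), np.asarray(y))
```

At test time, evaluating such a forest at a few hundred pixels of a new depth frame yields putative correspondences between camera space and the scene’s world frame, from which the camera pose can be inferred.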

“While this is a line of research that started with Kinect body tracking,” Shotton explains, “it’s morphed into quite a different thing. It’s about predicting the position of the camera. This paper presents a new, efficient algorithm for estimating the camera position very accurately from a single depth image relative to a known scene or environment.”
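Given those per-pixel predictions, the camera pose can be recovered by rigidly aligning the back-projected depth points with their predicted scene coordinates. The sketch below uses a generic RANSAC loop around a Kabsch fit; this is a common way to solve the 3-D-to-3-D alignment and is an assumption here, not the paper’s exact inference procedure.

```python
# Sketch of pose recovery from the forest's predictions: rigidly align
# camera-space points (back-projected from the depth image) with the
# predicted scene coordinates using RANSAC around a Kabsch fit.
# This generic 3-D/3-D alignment is an assumption, not the paper's method.
import numpy as np

def kabsch(cam_pts, world_pts):
    """Best rigid transform (R, t) such that R @ cam + t ~= world."""
    pc, pw = cam_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (cam_pts - pc).T @ (world_pts - pw)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, pw - R @ pc

def ransac_pose(cam_pts, world_pts, iters=200, inlier_thresh=0.05, seed=0):
    """Estimate the camera-to-world pose from N x 3 point arrays."""
    rng = np.random.default_rng(seed)
    best = (None, None, 0)
    for _ in range(iters):
        idx = rng.choice(len(cam_pts), size=3, replace=False)
        R, t = kabsch(cam_pts[idx], world_pts[idx])
        err = np.linalg.norm(cam_pts @ R.T + t - world_pts, axis=1)
        inliers = err < inlier_thresh          # e.g., 5 cm agreement
        if inliers.sum() > best[2]:
            R, t = kabsch(cam_pts[inliers], world_pts[inliers])  # refit
            best = (R, t, inliers.sum())
    return best[0], best[1]
```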

This has important implications in robotics. The “lost robot” problem places a robot in a random spot in an environment. When the robot starts, it needs to determine where it is. Shotton’s new approach lets the robot “see” in a way that infers its starting position.

Camera tracks for one of the test scenes, showing the frame-by-frame location of the camera as it moves around the room to scan the environment. Red points represent actual camera location. Green points show tracking as derived from a standard approach. Blue points represent the output from the new regression-forest algorithm, showing a smoother, more accurate result.

Augmented reality is another area that could benefit from accurate camera localization: imagine looking at a scene through a smartphone camera and seeing information about the landmarks overlaid on the screen. Shotton also envisions the potential for enhanced navigation.

“For example, when you come up out of the subway, you might be standing at an intersection,” he says. “A GPS can tell you which intersection, but only approximately. So you take a photo and use it to know where you’re standing in the world, without needing any other information. What we have right now is just a stepping stone to the ultimate goal, which is to scale up this solution to unknown, large environments, but the results suggest this approach could scale up quite well.”

Thriving on Diversity

For anyone who thinks that skeletal-tracking research must have consumed all of Shotton’s time during 2011, one look at his list of publications reveals that he’s interested in almost every area of computer vision. Since 2003, Shotton has authored or co-authored more than 50 research papers, on topics ranging from object recognition and human-pose estimation to gesture and action recognition, machine learning, and medical imaging.

Shotton got a taste of life at Microsoft Research during an internship. That led to a scholarship from Microsoft Research for his Ph.D. studies from 2003 to 2007 and more internships at the Cambridge lab in 2003 and 2005, followed by one at the Redmond lab in 2006—all boding well for a career at Microsoft Research after graduation. Shotton, though, decided to see more of the world first and spent a year working at Toshiba’s research lab in Kawasaki, just outside Tokyo. In 2008, Shotton returned to the United Kingdom and joined Microsoft Research when a post-doc opportunity presented itself.

“Japan was a great experience,” he recalls. “I got exposure to a very different work environment, an interesting culture, a beautiful country, and lots of great snowboarding. I’m an avid snowboarder, but lately I’ve started skiing again and find I’m enjoying that a lot.”

A perfect vacation for Shotton would involve either skiing and snow, or sunshine and salt water. He is also a certified scuba diver.

“And I used to go skydiving,” he says. “That was my hobby for a while. But it takes a lot of time, because you have to be there at the drop zone every weekend for hours on end. In the U.K., this is even more time-consuming, because you’re always hanging around waiting for the weather to clear—not a very practical hobby.”

Does he have any hobbies that don’t involve so much adrenaline?

“Music. I have a fairly strong classical-music background,” Shotton says, “because I learned piano and cello when I was growing up. I sing in a choir: second bass. Our repertoire is mostly classical with some jazzy pieces. We’re a pretty decent ensemble.”

With so many interests, in his personal life and in his research, it’s clear that Shotton gets stimulation from the opportunity to work on a variety of challenges.

“One of the things I enjoy most about working for Microsoft Research is collaborating with the amazing people here,” he says. “It sounds like a cliché, but it’s true. There’s such a buzz, always so much going on, always something exciting. There’s never a dull moment. I love the diverse projects and the diverse approaches to solving problems.”
