Understanding Visual Scenes: Where are We?

Speaker: Jianxiong Xiao

Affiliation: Massachusetts Institute of Technology (MIT)

Host: Larry Zitnick

Duration: 00:36:57

Date recorded: 30 August 2011

Human visual scene understanding is remarkable: with only a brief glance at an image, an abundance of information is available, including scene category, 3D spatial structure, and the identity of the main objects in the scene. In the last two decades, there has been much progress towards building computer systems that have general visual perception ability. In the first part of the talk, I will present results of scene recognition on a new database with an exhaustive set of scene categories. With hundreds of categories available, we can, for the first time, test how well global features classify scenes into categories covering most of the places humans encounter. We evaluate numerous state-of-the-art algorithms for scene recognition, establish new bounds of computer performance, and compare them with human performance. In the second part of the talk, I will show that scene understanding seems to be mature enough for real-world applications in certain domains. I will demonstrate results on semantic segmentation of street-view images into buildings, trees, etc. I will also showcase several possible applications for scene understanding, including 3D reconstruction of building mesh models, prediction of how memorable an image is, and extrapolation of an image beyond its boundaries. This is joint work with Antonio Torralba, Long Quan, Aude Oliva, James Hays, Krista Ehinger, and Phillip Isola.

©2011 Microsoft Corporation. All rights reserved.