Learning through abstract scenes





This project explores the use of abstract scenes created from clip art to study the relation between visual information and its linguistic semantic meaning.

Relating visual information to its linguistic semantic meaning remains an open and challenging area of research. The semantic meaning of an image depends on the presence of objects, their attributes, and their relations to other objects. But precisely characterizing this dependence requires extracting complex visual information from an image, which is in general a difficult and as-yet unsolved problem. We propose studying semantic information in abstract images created from collections of clip art. Abstract scenes allow for the direct study of how to infer high-level semantic information, since they remove the reliance on noisy low-level object, attribute, and relation detectors, and on the tedious hand-labeling of images. Similarly, common sense knowledge may be learned directly by observing these abstract scenes and analyzing the relations of the objects, both spatially and temporally.





Larry Zitnick (Microsoft Research)

Devi Parikh (Virginia Tech)

Lucy Vanderwende (Microsoft Research)


Abstract Scenes Datasets



Abstract Scenes Dataset v1 · Average Scenes

Version 1.1 - Released February 2014

Contains data from both the CVPR 2013 and ICCV 2013 papers.

[Readme] [Download] [Demo Javascript] [Example classes] [Average Scenes]




Bringing Semantics Into Focus Using Visual Abstraction

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013 (Oral)

C. L. Zitnick and D. Parikh

[slides] [CVPR talk (video)] [Dataset] [SUN workshop slides] [MSR Faculty Summit talk (video)]


Learning the Visual Interpretation of Sentences

IEEE International Conference on Computer Vision (ICCV), 2013

C. L. Zitnick, D. Parikh, L. Vanderwende

[Supplementary material] [Dataset]