Providing Richer Descriptions for Images

Object recognition is now becoming a usable technology. When it is used in applications, fundamental questions arise: What should we recognize in images? What are the desirable outcomes of a recognition system? What should we say if we encounter an unfamiliar object? …
In this talk I focus on representational architectures that enable us to provide deeper and richer descriptions of images. These descriptions take the form of properties of objects, their functions, and complex relationships between entities in images. I introduce visual attributes and show the benefits of adopting an attribute-centric framework for describing both familiar and unfamiliar objects. I then explain a nonparametric approach that provides concise image descriptions in the form of natural language sentences. This method uses the predictions of all objects, actions, and scenes to establish a scoring function between an image and a sentence. To enhance image descriptions, I introduce visual phrases: chunks of meaning bigger than objects and smaller than scenes. I also show that learning visual phrases directly improves recognition significantly. Finally, I explain a decoding algorithm that decides on the final outcome of our recognition system using the predictions of objects and visual phrases.

Speaker Details

Ali Farhadi is a PhD candidate in the computer science department at the University of Illinois at Urbana-Champaign. His work, under the supervision of David Forsyth, focuses mainly on computer vision and machine learning. More specifically, he is interested in transfer learning and its application to aspect issues in human activity and object recognition, scene understanding, and attribute-based representation of objects. Ali has been awarded the inaugural Google fellowship in computer vision and image interpretation, the University of Illinois fellowship, and the Beckman CS/AI 2009 award.

Speakers:
Ali Farhadi
Jeff Running