We introduce a new dataset of human annotations of objects, parts, attributes and activities in images. The purpose of this annotation effort is to approximate gold-standard visual recognition and to enable the study of what visual information is required in downstream tasks such as image-to-text generation. The annotations were gathered using Amazon Mechanical Turk and comprise 4,000 object instances and 100,000 textual labels across 500 images.