ASIRRA -- Public Corpus

To help computer vision researchers who may be interested in trying to "crack" Asirra, we have made a corpus of 30,000 labelled images of cats and dogs from our database available to the public.

The images are packaged in a tar file containing two subdirectories, Cat and Dog.  Each subdirectory holds 15,000 images in JPEG format.
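
As a quick sanity check after downloading, the per-label counts can be tallied without unpacking the archive. The following is a minimal sketch, not an official tool. It assumes the label is the name of each file's parent directory (which also works if Cat and Dog sit under some top-level folder) and that the images use a .jpg or .jpeg extension. The function name count_images_by_label is our own.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath

# Label directories described above.
LABELS = {"Cat", "Dog"}

def count_images_by_label(tar_path):
    """Count JPEG members per label directory in a tar archive."""
    counts = Counter()
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directory entries and links
            path = PurePosixPath(member.name)
            # Assumption: the label is the name of the file's parent directory.
            label = path.parent.name
            # Assumption: images carry a .jpg or .jpeg extension.
            if label in LABELS and path.suffix.lower() in {".jpg", ".jpeg"}:
                counts[label] += 1
    return counts
```

Calling count_images_by_label("petimages.tar") should report 15,000 for each of Cat and Dog if the archive matches the layout described above.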

This set of images is representative of the images used by the Asirra CAPTCHA, which come from Petfinder.com.  However, the corpus is slightly biased relative to a random sample of images taken directly from Petfinder.  Asirra's backend retrieves all of Petfinder's images, then filters out those we consider unusable -- for example, images that are below a certain resolution, have an aspect ratio that differs too much from 1, or depict animals other than cats or dogs.  The corpus we're offering is a random, unbiased sample of the images that passed these acceptance criteria.
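
To illustrate the kind of size-based filtering described above, here is a rough sketch. It is not Asirra's actual filter: the function name passes_size_filter and both thresholds are hypothetical, and the caller must supply the thresholds because the real values are not stated here. It covers only the resolution and aspect-ratio checks, not the check on which animal is depicted.

```python
def passes_size_filter(width, height, min_side, max_ratio):
    """Return True if an image meets caller-supplied size limits.

    min_side:  smallest allowed width or height, in pixels (hypothetical).
    max_ratio: largest allowed long-side / short-side ratio, >= 1.0
               (hypothetical); 1.0 would accept only square images.
    """
    if width <= 0 or height <= 0:
        return False  # not a valid image size
    if min(width, height) < min_side:
        return False  # resolution too low
    # Ratio of long side to short side; equals 1.0 for a square image.
    ratio = max(width, height) / min(width, height)
    return ratio <= max_ratio
```

For example, with illustrative thresholds min_side=100 and max_ratio=1.5, a 300x280 image is accepted, while a 600x200 image is rejected because its sides differ by a factor of 3.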

These images have been published by Microsoft Research for the express purpose of furthering academic research. They may be used for non-commercial research purposes, but they may not be re-published without the express permission of the copyright owner, Petfinder.com.

Download link (warning -- file is 885MB): ftp://ftp.research.microsoft.com/pub/asirra/petimages.tar