Florian Schroff, Antonio Criminisi, and Andrew Zisserman
The objective of this work is to automatically generate a large number of images for a specified object class. A multi-modal approach employing text, metadata, and visual features is used to gather many high-quality images from the web. Candidate images are obtained by a text-based web search querying on the object identifier (e.g. the word "penguin"). The web pages and the images they contain are downloaded. The task is then to remove irrelevant images and re-rank the remainder. First, the images are re-ranked based on the text surrounding each image and on metadata features. A number of methods are compared for this re-ranking. Second, the top-ranked images are used as (noisy) training data, and an SVM visual classifier is learnt to improve the ranking further.
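The two-stage ranking can be illustrated with a small sketch. Everything in it is an assumption for illustration: the text-based scores are taken as already computed, a plain linear SVM trained by Pegasos-style stochastic subgradient descent stands in for the visual classifier (the paper's image features and kernel are not reproduced), the 2-D feature vectors are toy values rather than real image descriptors, and the negatives come from a hypothetical background pool of images returned for unrelated queries. The helper names `train_linear_svm` and `visual_rerank` are hypothetical.

```python
import random


def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs, ys, lam=0.1, epochs=100, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + mean hinge loss. A constant 1.0 is appended
    to every feature vector so the last weight acts as the bias term.
    ys must be labels in {+1, -1}.
    """
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violated = y * _dot(w, x) < 1              # hinge loss is active
            w = [(1.0 - eta * lam) * wi for wi in w]   # L2 shrinkage step
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def visual_rerank(text_scores, visual_feats, background_feats, n_pos):
    """Re-rank candidates with an SVM trained on noisy text-ranked positives.

    The n_pos images with the highest text scores are treated as positives
    even though some of them may be wrong; background images are negatives.
    Returns candidate indices sorted by SVM score, best first.
    """
    order = sorted(range(len(text_scores)), key=lambda i: text_scores[i],
                   reverse=True)
    positives = [visual_feats[i] for i in order[:n_pos]]
    xs = positives + list(background_feats)
    ys = [1] * len(positives) + [-1] * len(background_feats)
    w = train_linear_svm(xs, ys)
    svm_score = [_dot(w, list(f) + [1.0]) for f in visual_feats]
    return sorted(range(len(visual_feats)), key=lambda i: svm_score[i],
                  reverse=True)


# Toy data: images 0, 1, 3 are in-class; image 2 is out-of-class but has a
# high text score, and image 3 is in-class but has a low text score.
text_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
visual_feats = [(2.0, 2.1), (1.8, 2.2), (-1.9, -2.0), (2.2, 1.9), (-2.1, -1.8)]
background = [(-2.0, -2.1), (-1.8, -1.9), (-2.2, -2.0), (-2.0, -1.7)]
ranking = visual_rerank(text_scores, visual_feats, background, n_pos=3)
print(ranking)  # the in-class images 0, 1 and 3 should occupy the top 3 slots
```

In this toy setting the single mislabelled positive (image 2) is outweighed by the clean examples, because the hinge loss penalises it only linearly; the visual classifier therefore promotes image 3, which the text ranking had placed near the bottom.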
We investigate the sensitivity of the cross-validation procedure to such noisy training data. The principal novelty of the overall method lies in combining text/metadata and visual features to achieve a completely automatic ranking of the images. Examples are given for a selection of animals, vehicles, and other classes, 18 classes in total. The results are assessed by precision/recall curves on ground-truth annotated data and by comparison to previous approaches, including those of Berg et al. and Fergus et al.
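Precision/recall curves of the kind used for this evaluation can be computed directly from a ranked list. The sketch below assumes binary ground-truth labels (in-class or not) and also reports average precision as a single-number summary; the function names are hypothetical and the example labels are made up.

```python
def precision_recall_curve(ranked_labels):
    """Precision and recall after each position of a ranked list.

    ranked_labels: booleans ordered from the top rank down, True when the
    image at that rank is annotated as in-class.
    """
    total_pos = sum(ranked_labels)
    precisions, recalls = [], []
    hits = 0
    for k, is_pos in enumerate(ranked_labels, start=1):
        hits += is_pos
        precisions.append(hits / k)
        recalls.append(hits / total_pos if total_pos else 0.0)
    return precisions, recalls


def average_precision(ranked_labels):
    """Mean of the precision values at the ranks of the in-class images."""
    precisions, _ = precision_recall_curve(ranked_labels)
    hit_precisions = [p for p, pos in zip(precisions, ranked_labels) if pos]
    return sum(hit_precisions) / len(hit_precisions) if hit_precisions else 0.0


labels = [True, False, True, True, False]
p, r = precision_recall_curve(labels)
print(round(average_precision(labels), 3))  # 0.806
```

Plotting `p` against `r` gives the precision/recall curve; a better ranking pushes in-class images towards the top, which raises precision at every recall level.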
In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
Florian Schroff, Andrew Zisserman, and Antonio Criminisi. Harvesting Image Databases from the Web, 2007.