Semantic Image Segmentation and Web-Supervised Visual Learning

F. Schroff


Given an image, the goal of this work is to recognise objects of certain categories despite intra-class appearance variations and small inter-class differences. The appearance of objects in photographs is influenced by lighting, scale, pose, viewpoint, articulation, clutter and occlusion. Two different aspects of object recognition are investigated in this thesis. The first part develops models for semantic object segmentation of natural images and relies on ground-truth labelling for training. The second part uses the implicit supervision available on web pages to learn visual object-class models automatically; these models can then provide training data for object detection or segmentation algorithms.

The goal in the first part is to label connected regions in an image as belonging to specific object classes, such as grass or cow. We introduce a compact model in which each class is represented by a single histogram of visual words; this contrasts with common nearest-neighbour approaches, which model each class by storing exemplar histograms. After introducing segmentation algorithms based on these histogram models, we extend the Random Forest classifier and evaluate its feature-selection properties for the semantic object segmentation task.
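To make the contrast with exemplar-based nearest-neighbour models concrete, here is a minimal sketch of a "one histogram per class" model. It assumes each training region is already described by a histogram of visual-word counts; the additive smoothing constant `eps` and the log-likelihood decision rule are illustrative assumptions, not necessarily the exact choices made in the thesis. The function names are hypothetical.

```python
import math
from typing import Dict, List


def normalise(hist: List[float]) -> List[float]:
    """Scale a histogram so its entries sum to 1."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else [0.0] * len(hist)


def train_class_models(training: Dict[str, List[List[float]]],
                       eps: float = 1e-6) -> Dict[str, List[float]]:
    """Build one histogram per class by summing the visual-word counts of all
    training regions of that class and normalising. A small constant (an
    assumed smoothing choice) keeps every visual word at non-zero probability."""
    models = {}
    for label, hists in training.items():
        summed = [sum(column) for column in zip(*hists)]
        models[label] = normalise([c + eps for c in summed])
    return models


def classify_region(region_hist: List[float],
                    models: Dict[str, List[float]]) -> str:
    """Assign the class whose word distribution gives the region's visual words
    the highest log-likelihood. This is equivalent to minimising the KL
    divergence from the region's normalised histogram to the class model."""
    def log_likelihood(model: List[float]) -> float:
        return sum(count * math.log(p) for count, p in zip(region_hist, model))
    return max(models, key=lambda label: log_likelihood(models[label]))


# Toy vocabulary of three visual words: grass regions favour word 0, cow regions word 2.
training = {
    "grass": [[8, 1, 1], [9, 1, 0]],
    "cow":   [[1, 2, 7], [0, 1, 9]],
}
models = train_class_models(training)
print(classify_region([7, 1, 2], models))  # grass
print(classify_region([0, 2, 8], models))  # cow
```

The memory cost of such a model is one histogram per class, independent of the number of training regions, whereas a nearest-neighbour classifier must keep every exemplar histogram and compare a test region against all of them.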

Most object recognition methods rely on labelled training images: for each object category to be recognised, the system is trained on a set of images containing instances of that category. The second part of this thesis focuses on automatically creating sets of images that contain a given object class. The idea is to download an initial set of images from the Internet based on a search query (e.g., penguin). These images are then ranked using text-based information from the web pages on which they appear, and this ranking is used to learn visual models for the object categories automatically. We compare the performance of our system to previous work and show that it performs equally well without the need for explicit manual supervision.
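The text-based ranking step can be sketched as follows. This is a simplified illustration under stated assumptions: the page fields (`filename`, `alt_text`, `title`, `surrounding_text`) and their weights in `FEATURE_WEIGHTS` are hypothetical, and a fixed weighted sum of binary "query word present" features stands in for whatever ranker the thesis actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical binary text features: does the query word occur in each field
# of the web page hosting the image? The weights are illustrative, not learned.
FEATURE_WEIGHTS: Dict[str, float] = {
    "filename": 2.0,
    "alt_text": 1.5,
    "title": 1.0,
    "surrounding_text": 0.5,
}


@dataclass
class WebImage:
    url: str
    fields: Dict[str, str] = field(default_factory=dict)  # field name -> page text


def text_score(image: WebImage, query: str) -> float:
    """Sum the weights of every field whose text contains the query word."""
    q = query.lower()
    return sum(weight for name, weight in FEATURE_WEIGHTS.items()
               if q in image.fields.get(name, "").lower())


def rank_images(images: List[WebImage], query: str) -> List[WebImage]:
    """Order images by text score, highest first. Python's sort is stable,
    so tied images keep their download order."""
    return sorted(images, key=lambda im: text_score(im, query), reverse=True)


images = [
    WebImage("a.jpg", {"filename": "penguin_colony.jpg", "title": "Antarctic penguins"}),
    WebImage("b.jpg", {"title": "Linux mascot"}),
    WebImage("c.jpg", {"alt_text": "a penguin on ice"}),
]
print([im.url for im in rank_images(images, "penguin")])  # ['a.jpg', 'c.jpg', 'b.jpg']
```

In a pipeline of the kind described above, the top of this ranking would serve as a noisy positive set from which a visual classifier is trained, which in turn can re-rank the downloaded images using their visual content.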


Publication type: PhD thesis
Institution: University of Oxford