Weakly supervised discriminative localization and classification: a joint learning process

The technical report (TR) is also available below.

Visual categorization problems, such as object classification or action recognition, are increasingly often approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and then the maximum classifier score is used for class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manually annotated with masks or bounding boxes. The reliance on time-consuming human labeling effectively limits the application of these methods to problems involving very few categories. Furthermore, the human selection of the masks introduces arbitrary biases (e.g. in terms of window size and location) which may be suboptimal for classification.
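
The detection strategy described above (score every candidate subwindow, then decide the class from the maximum score) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid enumeration, the toy mean-intensity scorer, and the names `candidate_windows`, `window_score`, and `classify_by_detection` are all assumptions made for the example.

```python
from typing import Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (top, left, height, width)


def candidate_windows(rows: int, cols: int,
                      sizes: Sequence[Tuple[int, int]],
                      stride: int = 1) -> Iterator[Window]:
    """Enumerate subwindows of the given sizes on a regular grid."""
    for h, w in sizes:
        for top in range(0, rows - h + 1, stride):
            for left in range(0, cols - w + 1, stride):
                yield (top, left, h, w)


def window_score(image: List[List[float]], window: Window,
                 weight: float, bias: float) -> float:
    """Toy linear scorer (an assumption): weight * mean intensity + bias."""
    top, left, h, w = window
    total = sum(image[r][c]
                for r in range(top, top + h)
                for c in range(left, left + w))
    return weight * total / (h * w) + bias


def classify_by_detection(image: List[List[float]],
                          sizes: Sequence[Tuple[int, int]],
                          stride: int = 1,
                          weight: float = 1.0,
                          bias: float = -0.5) -> Tuple[bool, float, Window]:
    """Return (label, max score, best window); label is True if max score > 0."""
    rows, cols = len(image), len(image[0])
    best = max(candidate_windows(rows, cols, sizes, stride),
               key=lambda win: window_score(image, win, weight, bias))
    score = window_score(image, best, weight, bias)
    return score > 0, score, best


if __name__ == "__main__":
    img = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
    print(classify_by_detection(img, sizes=[(2, 2)]))
```

The same function yields the localization as a by-product: the window that attains the maximum score is the detected location, which is what makes training from image-level labels alone conceivable.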

In this paper we propose a novel method for learning a discriminative subwindow classifier from examples annotated with binary labels indicating the presence of an object or action of interest, but not its location. During training, our approach simultaneously localizes the instances of the positive class and learns a subwindow SVM to recognize them. We extend our method to classification of time series by presenting an algorithm that localizes the most discriminative set of temporal segments in the signal. We evaluate our approach on several datasets for object and action recognition and show that it achieves results similar and in many cases superior to those obtained with full supervision.
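
To make the time-series idea concrete, here is a hedged sketch of localizing a discriminative temporal segment. It assumes that each frame contributes an additive score to the classifier and finds the single contiguous segment with the largest total using Kadane's maximum-subarray algorithm. The abstract refers to a set of segments; the single-segment simplification, the additive-score assumption, and the names `best_segment` and `classify_series` are choices made for this illustration, not the paper's algorithm.

```python
from typing import Sequence, Tuple


def best_segment(frame_scores: Sequence[float]) -> Tuple[float, int, int]:
    """Return (score, start, end) of the non-empty contiguous segment
    with the largest summed score; `end` is exclusive."""
    if not frame_scores:
        raise ValueError("frame_scores must be non-empty")
    best_sum = cur_sum = frame_scores[0]
    best_start, best_end = 0, 1
    cur_start = 0
    for t in range(1, len(frame_scores)):
        if cur_sum < 0:
            # A negative running sum can only hurt: restart the segment here.
            cur_sum = frame_scores[t]
            cur_start = t
        else:
            cur_sum += frame_scores[t]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, t + 1
    return best_sum, best_start, best_end


def classify_series(frame_scores: Sequence[float],
                    bias: float = 0.0) -> Tuple[bool, Tuple[int, int]]:
    """Label the series positive if its best segment scores above zero."""
    score, start, end = best_segment(frame_scores)
    return score + bias > 0, (start, end)


if __name__ == "__main__":
    print(best_segment([-1.0, 2.0, 3.0, -4.0, 1.0]))
```

The scan runs in time linear in the number of frames, so the localization step stays cheap even when it is repeated many times during training.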

SegSVM_ICCV09.pdf (PDF file)
SegSVM_CMU-RI-TR-09-29.pdf (PDF file)

In ICCV

Details

Type: Inproceedings