Minh Hoai Nguyen, Lorenzo Torresani, Fernando de la Torre, and Carsten Rother
Please find TR below as well.
Visual categorization problems, such as object classification
or action recognition, are increasingly approached
with a detection strategy: a classifier function
is first applied to candidate subwindows of the image or
video, and the maximum classifier score then determines the
class decision. Traditionally, the subwindow classifiers are
trained on a large collection of examples manually annotated
with masks or bounding boxes. The reliance on time-consuming
human labeling effectively limits the application
of these methods to problems involving very few categories.
Furthermore, the human selection of the masks introduces
arbitrary biases (e.g., in terms of window size and location)
that may be suboptimal for classification.
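
To make the detection strategy described above concrete, the following is a minimal sketch of classification by maximum subwindow score. It is an illustration only, not the classifier used in this work: the square sliding windows, the mean-intensity feature, and the names `sliding_windows`, `window_score`, and `classify_by_detection` are all assumptions introduced here.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (top, left, height, width)


def sliding_windows(rows: int, cols: int, size: int, stride: int) -> Iterator[Window]:
    """Enumerate square candidate subwindows (a hypothetical candidate set)."""
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            yield (top, left, size, size)


def window_score(image: List[List[float]], window: Window,
                 weight: float, bias: float) -> float:
    """Toy linear classifier on one feature: the mean intensity of the window."""
    top, left, height, width = window
    total = sum(image[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return weight * total / (height * width) + bias


def classify_by_detection(image: List[List[float]], size: int = 2, stride: int = 1,
                          weight: float = 1.0, bias: float = -0.5):
    """Score every candidate subwindow and decide the class from the maximum score.

    Returns (is_positive, best_score, best_window).
    """
    rows, cols = len(image), len(image[0])
    best_score, best_window = max(
        (window_score(image, w, weight, bias), w)
        for w in sliding_windows(rows, cols, size, stride)
    )
    return best_score > 0, best_score, best_window


if __name__ == "__main__":
    # A 4x4 image with a bright 2x2 patch; the best window should cover the patch.
    image = [[0, 0, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 0, 0, 0]]
    print(classify_by_detection(image))
```

The image-level label depends only on the best-scoring window, so the same pass that classifies the image also returns a localization.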
In this paper, we propose a novel method for learning
a discriminative subwindow classifier from examples annotated
with binary labels indicating the presence of an object
or action of interest, but not its location. During training,
our approach simultaneously localizes the instances of the
positive class and learns a subwindow SVM to recognize
them. We extend our method to classification of time series
by presenting an algorithm that localizes the most discriminative
set of temporal segments in the signal. We evaluate
our approach on several datasets for object and action
recognition and show that it achieves results comparable to, and in
many cases better than, those obtained with full supervision.
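
As a rough illustration of temporal localization, the sketch below finds the single contiguous segment of a time series with the highest total score. It is a simplification under stated assumptions, not the algorithm of this work: it assumes the segment score is the sum of per-frame classifier scores, it returns one segment rather than a set of segments, and the name `best_segment` is hypothetical. The search itself is the standard maximum-sum subarray scan (Kadane's algorithm).

```python
from typing import Sequence, Tuple


def best_segment(frame_scores: Sequence[float]) -> Tuple[float, int, int]:
    """Return (score, start, end) of the max-sum non-empty contiguous segment.

    `end` is exclusive, so the segment is frame_scores[start:end].
    Assumes additive per-frame scores (an assumption made for this sketch).
    """
    if not frame_scores:
        raise ValueError("frame_scores must be non-empty")

    best_sum = cur_sum = frame_scores[0]
    best_start, best_end = 0, 1
    cur_start = 0

    for i in range(1, len(frame_scores)):
        if cur_sum < 0:
            # A negative running sum can only hurt; start a new segment here.
            cur_sum = frame_scores[i]
            cur_start = i
        else:
            cur_sum += frame_scores[i]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1

    return best_sum, best_start, best_end


if __name__ == "__main__":
    # Frames 1 and 2 carry the positive evidence in this toy signal.
    print(best_segment([-1.0, 2.0, 3.0, -4.0, 1.0]))
```

The scan is linear in the number of frames, which is why an additive score is a convenient assumption here; selecting several disjoint segments at once requires a more general search than this sketch provides.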