We propose a max-margin formulation for the multi-label classification
problem where the goal is to tag a data point with a set of
pre-specified labels. Given a set of $L$ labels, a data point can be
tagged with any of the $2^L$ possible subsets. The main challenge
therefore lies in optimising over this exponentially large label space
subject to label correlations.
Existing solutions take either of two approaches. The first assumes,
{\it a priori}, that there are no label correlations and independently
trains a classifier for each label (as is done in the 1-vs-All
heuristic). This reduces the problem complexity from exponential to
linear and such methods can scale to large problems. The second
approach explicitly models correlations by pairwise label
interactions. However, the complexity remains exponential unless one
assumes that label correlations are sparse. Furthermore, the learnt
correlations reflect the training set biases.
We take a middle approach that assumes labels are correlated but does
not incorporate pairwise label terms in the prediction function. We
show that the complexity can still be reduced from exponential to
linear while modelling dense pairwise label correlations. By
incorporating correlation priors we can overcome training set biases
and improve prediction accuracy. We provide a principled
interpretation of the 1-vs-All method and show that it arises as a
special case of our formulation. We also develop efficient
optimisation algorithms that can be orders of magnitude faster than
the state-of-the-art.