Learning the Discriminative Power-Invariance Trade-Off
3. Applications
We apply our method to the UIUC textures, Oxford flowers, and Caltech 101 and 256
object categorisation databases. Since we would like to test how
general the technique is, we assume that no prior knowledge is
available and that no descriptor is a priori preferable to any
other. We therefore set σk to be constant for
all k and do not make use of the constraints Ad ≥
p (unless otherwise stated). The only parameters left to be set
are C, the misclassification penalty, and the kernel parameters
γk. These parameters are not tweaked. Instead,
C is set to 1000 for all classifiers and databases and
γk is set to one over the mean of the
kth distances over the training set for the given
pairwise classification task. Note that the kernel parameters could
instead have been learnt by creating many base kernels, each with a
different value of γk, and then seeing which
ones get selected. It is also possible to analogously learn
1/C in an l2 SVM setting.
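The rule for setting γk can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each descriptor k comes with a precomputed matrix of pairwise training distances, that the mean is taken over all entries of that matrix (including the zero diagonal), and that the base kernel has the form exp(-γk × distance). The function names are hypothetical.

```python
import numpy as np

def gamma_from_training_distances(dist_train):
    """Return 1 / (mean pairwise training distance) for one descriptor.

    Assumption: the mean is taken over every entry of the distance matrix.
    """
    dist_train = np.asarray(dist_train, dtype=float)
    mean_dist = dist_train.mean()
    if mean_dist <= 0:
        raise ValueError("mean training distance must be positive")
    return 1.0 / mean_dist

def base_kernel(dist, gamma):
    """Assumed base kernel form: exp(-gamma * distance)."""
    return np.exp(-gamma * np.asarray(dist, dtype=float))

# Toy data: 3 training points, 2 hypothetical descriptors.
# The second distance matrix is the first one divided by 4.
dists = [
    np.array([[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]),
    np.array([[0.0, 0.5, 1.0], [0.5, 0.0, 1.5], [1.0, 1.5, 0.0]]),
]
gammas = [gamma_from_training_distances(D) for D in dists]
kernels = [base_kernel(D, g) for D, g in zip(dists, gammas)]
```

Because γk is the reciprocal of the mean distance, rescaling a descriptor's distances leaves its base kernel unchanged, which puts descriptors with very different distance ranges on a comparable footing without per-database tuning.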
We compare our algorithm to the Multiple Kernel Learning Block
l1 regularisation method of [Bach et al. NIPS 2004], for which
code is publicly available. All experimental results are calculated over 20 random
train/test splits of the data except for 1-vs-All results which are
calculated over 3 splits.
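The evaluation protocol of repeated random splits can be sketched as follows. This is only an illustration under stated assumptions: the class and image counts mirror the UIUC setup described in Section 3.1 (25 classes, 40 images each, 20 used for training), the helper names are hypothetical, and reporting the sample standard deviation as the "±" value is an assumption rather than something the text specifies.

```python
import numpy as np

def per_class_split(labels, n_train, rng):
    """Randomly pick n_train items per class for training, the rest for testing."""
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train].tolist())
        test_idx.extend(idx[n_train:].tolist())
    return np.array(train_idx), np.array(test_idx)

def mean_and_std(accuracies):
    """Summarise per-split accuracies (sample standard deviation is an assumption)."""
    acc = np.asarray(accuracies, dtype=float)
    return float(acc.mean()), float(acc.std(ddof=1))

# 25 classes x 40 images, 20 training images per class, 20 random splits.
labels = np.repeat(np.arange(25), 40)
rng = np.random.default_rng(0)
splits = [per_class_split(labels, n_train=20, rng=rng) for _ in range(20)]
```

A classifier would then be trained and scored once per split, and `mean_and_std` applied to the resulting list of accuracies.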
3.1. UIUC Textures
The UIUC
texture database (see Fig. 1) has 25 classes and 40 images per
class. The database contains materials imaged under significant
viewpoint variations and also contains fabrics which display folds and
have non-rigid surface deformations. A priori, it is hard to
tell what the right level of invariance is for this database. Affine
invariance is probably helpful given the significant viewpoint
changes. Higher levels of invariance might also be needed to
characterise fabrics and handle non-affine deformations. However, [Zhang et al. IJCV 2007] concluded
that similarity invariance is better than either scale or affine
invariance for this database. Then again, our results indicate that
even better performance can be obtained by sticking to rotationally
invariant descriptors. This reinforces the observation that it is not
always straightforward to pinpoint the required level of invariance.
Figure 1. One image each from all the 25 materials present in the
UIUC database as well as some sample images from 4 different texture
classes. The images exhibit large variations due to camera
viewpoint changes as well as non-rigid fabric deformations.
For this database, we start with a standard patch descriptor having no
invariance but then take different transforms to derive 7 other base
descriptors achieving different levels of the trade-off. The first
descriptor is obtained by linearly projecting the patch onto the MR filters. Subsequent rotation, scale
and similarity invariant descriptors are obtained by taking the
maximum response of a basic filter over orientation, scale or
both. This is similar to HMAX where the
maximum response is taken over position to achieve translation
invariance. MR filter responses can also be used to derive fractal-based
bi-Lipschitz invariant (covering affine, perspective and non-rigid surface
deformations) and rotation invariant descriptors. Finally, patches can directly yield
rotation invariant descriptors by aligning them according to their
dominant orientation.
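The max-pooling construction described above can be sketched as follows. This is a minimal illustration, assuming the filter bank responses are stored in an array indexed by (orientation, scale, row, column). The array layout and function name are hypothetical, and random numbers stand in for real MR filter responses.

```python
import numpy as np

def invariant_responses(responses):
    """Pool filter responses of shape (n_orientations, n_scales, H, W).

    Taking the maximum over an axis discards which orientation or scale
    produced the strongest response at each pixel.
    """
    responses = np.asarray(responses, dtype=float)
    return {
        "rotation": responses.max(axis=0),         # shape (n_scales, H, W)
        "scale": responses.max(axis=1),            # shape (n_orientations, H, W)
        "similarity": responses.max(axis=(0, 1)),  # shape (H, W)
    }

# Stand-in responses: 6 orientations, 3 scales, 8 x 8 pixels.
rng = np.random.default_rng(0)
bank = rng.normal(size=(6, 3, 8, 8))
pooled = invariant_responses(bank)

# Cyclically shifting the orientation channels leaves the pooled output unchanged.
shifted = invariant_responses(np.roll(bank, shift=2, axis=0))
```

The pooled output depends only on the set of responses at each pixel, not on their order along the pooled axis. This is the sense in which the descriptor becomes invariant, and it also shows what is given up: the identity of the winning orientation or scale is no longer available to the classifier.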
Figure 2. 1-vs-1 weights learnt on the UIUC database: Both class 23 and class
3 exhibit significant variation. As a result, bi-Lipschitz
invariance gets a very high weight when distinguishing between these
two classes while all the other weights are 0. However class 7 is
simpler and the main source of variability is rotation. Thus, full
bi-Lipschitz invariance is no longer needed when distinguishing
between class 23 and class 7. It can therefore be traded off with a
more discriminative descriptor. This is reflected in the learnt
weights where rotation invariance gets a high weight of 1.46 while
bi-Lipschitz invariance gets a small weight of 0.22. Bi-Lipschitz
invariance isn't set to 0 as class 23 would start getting
misclassified. However, if class 23 were replaced with the simpler
class 4, which primarily has rotations, then bi-Lipschitz invariance
would no longer be necessary. Thus, when distinguishing class 7 from class
4, rotation invariance is the only feature used.
Figure 3. On the left are images from classes 10 and 25 and the
variation in learnt weights as the training set size is increased
for this pairwise classification task. A similar plot for classes 8
and 15 is shown on the right. When the training set size is small, a
higher level of invariance (bi-Lipschitz) is needed. As the training
set size grows, a less invariant and more discriminative descriptor
(similarity) is preferred and automatically learnt by our
method. The trends, though not identical, are similar in both plots,
indicating that the tasks could be related. Inspecting the
two class pairs indicates that while they are visually distinct,
they do share the same types of variations (apart from the fabric
crumpling).
For classification, the testing methodology is kept the same as in [Zhang et al. IJCV 2007]: 20
images per class are used for training and the other 20 for
testing. Table 1 lists the classification results. Our results are
comparable to the 98.70 ± 0.4% achieved by the
state-of-the-art. What is interesting is that our performance has not
decreased below that of any single descriptor despite the inclusion of
specialised descriptors having scale and no invariance. These
descriptors have poor performance in general. However, our method
automatically sets their weights to zero most of the time and uses
them only when they are beneficial for classification. Had the equally
weighted combination scheme of Zhang et al. been used, these
descriptors would have been brought into play all the time and the
resulting accuracy would have dropped to 96.79 ± 0.86%. In each of the
20 train/test splits, learning the descriptors using our method
outperformed equally weighted combinations (as well as the MKL-Block
l1 method). Figure 2 shows how the learnt weights
correspond visually to the trade-offs between different classes while
Figure 3 shows that the learnt weights change sensibly as the training
set size is varied.
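The difference between learnt and equally weighted combinations can be illustrated with a small sketch. It assumes the combined kernel is a non-negative weighted sum of base kernels, which this section does not spell out, and it uses toy linear kernels on random features in place of real descriptors. The weights 1.46 and 0.22 are the ones quoted in the Figure 2 caption. The third descriptor and its zero weight are hypothetical.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of base kernel matrices (assumed combination rule)."""
    kernels = np.asarray(kernels, dtype=float)   # shape (n_kernels, n, n)
    weights = np.asarray(weights, dtype=float)   # shape (n_kernels,)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    return np.tensordot(weights, kernels, axes=1)

# Toy base kernels: 3 hypothetical descriptors, 5 samples, 4 feature dimensions.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 5, 4))
base = np.stack([F @ F.T for F in features])

learnt = combine_kernels(base, [1.46, 0.22, 0.0])  # third descriptor is switched off
equal = combine_kernels(base, [1.0, 1.0, 1.0])     # every descriptor always contributes
```

With a zero weight the corresponding descriptor drops out of the kernel entirely, whereas the equally weighted sum always lets it contribute. That matches the text's account of why poorly performing descriptors do not hurt the learnt combination.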
Table 1. Classification results on the UIUC texture database.