Learning the Discriminative Power-Invariance Trade-Off



3. Applications

We apply our method to the UIUC textures, Oxford flowers and Caltech 101 and 256 object categorisation databases. Since we would like to test how general the technique is, we assume that no prior knowledge is available and that no descriptor is a priori preferable to any other. We therefore set σk to be constant for all k and do not make use of the constraints Adp (unless otherwise stated). The only parameters left to be set are C, the misclassification penalty, and the kernel parameters γk. These parameters are not tweaked. Instead, C is set to 1000 for all classifiers and databases, and γk is set to one over the mean of the kth distances over the training set for the given pairwise classification task. Note that the kernel parameters could instead have been learnt by creating many base kernels, each with a different value of γk, and then seeing which ones get selected. It is also possible to learn 1/C analogously in an l2 SVM setting.
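As an illustration of this parameter setting (not the authors' code), the following Python/NumPy sketch shows how γk and a corresponding base kernel could be computed from a precomputed distance matrix. The array `dist_k`, the function name and the exponential (RBF-style) form of the kernel are assumptions made for the example.

```python
import numpy as np

def base_kernel(dist_k):
    """Turn one precomputed n x n base distance matrix into a kernel.

    gamma_k is set to one over the mean of the k-th distances on the
    training set for the given pairwise task; nothing is tuned by hand.
    """
    gamma_k = 1.0 / dist_k.mean()
    return np.exp(-gamma_k * dist_k)

# Hypothetical usage: dists holds the K base distance matrices for one task.
# kernels = [base_kernel(d) for d in dists]
```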

We compare our algorithm to the Multiple Kernel Learning Block l1 regularisation method of [Bach et al. NIPS 2004] for which code is publicly available. All experimental results are calculated over 20 random train/test splits of the data except for 1-vs-All results which are calculated over 3 splits.



3.1. UIUC Textures

The UIUC texture database (see Fig. 1) has 25 classes and 40 images per class. The database contains materials imaged under significant viewpoint variations and also contains fabrics which display folds and have non-rigid surface deformations. A priori, it is hard to tell what the right level of invariance for this database is. Affine invariance is probably helpful given the significant viewpoint changes. Higher levels of invariance might also be needed to characterise fabrics and handle non-affine deformations. However, [Zhang et al. IJCV 2007] concluded that similarity invariance is better than either scale or affine invariance for this database. Then again, our results indicate that even better performance can be obtained by sticking to rotationally invariant descriptors. This reinforces the observation that it is not always straightforward to pinpoint the required level of invariance.

Figure 1. One image each from all the 25 materials present in the UIUC database as well as some sample images from 4 different texture classes. The images exhibit large variations due to camera viewpoint changes as well as non-rigid fabric deformations.

For this database, we start with a standard patch descriptor having no invariance and then apply different transforms to derive 7 other base descriptors achieving different levels of the trade-off. The first descriptor is obtained by linearly projecting the patch onto the MR filters. Subsequent rotation, scale and similarity invariant descriptors are obtained by taking the maximum response of a basic filter over orientation, scale or both (a sketch of this pooling is given below). This is similar to HMAX, where the maximum response is taken over position to achieve translation invariance. MR filter responses can also be used to derive fractal-based descriptors that are invariant to bi-Lipschitz transformations (which include affine, perspective and non-rigid surface deformations), as well as rotation invariant descriptors. Finally, patches can directly yield rotation invariant descriptors by aligning them according to their dominant orientation.
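The max-pooling step can be made concrete with a small sketch, again not the authors' implementation: `responses` is a hypothetical array of filter responses at one pixel, indexed by scale and orientation, and the different levels of invariance follow from the axis over which the maximum is taken.

```python
import numpy as np

def pooled_responses(responses):
    """responses: array of shape (n_scales, n_orientations) at one pixel."""
    no_invariance  = responses.ravel()      # keep every (scale, orientation) response
    rotation_inv   = responses.max(axis=1)  # max over orientation -> rotation invariant
    scale_inv      = responses.max(axis=0)  # max over scale -> scale invariant
    similarity_inv = responses.max()        # max over both -> similarity invariant
    return no_invariance, rotation_inv, scale_inv, similarity_inv
```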

Figure 2. 1-vs-1 weights learnt on the UIUC database: Both class 23 and class 3 exhibit significant variation. As a result, bi-Lipschitz invariance gets a very high weight when distinguishing between these two classes while all the other weights are 0. However, class 7 is simpler and the main source of variability is rotation. Thus, full bi-Lipschitz invariance is no longer needed when distinguishing between class 23 and class 7. It can therefore be traded off with a more discriminative descriptor. This is reflected in the learnt weights, where rotation invariance gets a high weight of 1.46 while bi-Lipschitz invariance gets a small weight of 0.22. Bi-Lipschitz invariance is not set to 0, as class 23 would then start getting misclassified. However, if class 23 were replaced with the simpler class 4, which primarily has rotations, then bi-Lipschitz invariance is no longer necessary. Thus, when distinguishing class 7 from class 4, rotation invariance is the only feature used.
Figure 3. On the left are images from classes 10 and 25 and the variation in learnt weights as the training set size is increased for this pairwise classification task. A similar plot for classes 8 and 15 is shown on the right. When the training set size is small, a higher level of invariance (bi-Lipschitz) is needed. As the training set size grows, a less invariant and more discriminative descriptor (similarity) is preferred and automatically learnt by our method. The trends, though not identical, are similar in both (a) and (b) indicating that the tasks could be related. Inspecting the two class pairs indicates that while they are visually distinct, they do share the same types of variations (apart from the fabric crumpling).

For classification, the testing methodology is kept the same as in [Zhang et al. IJCV 2007] -- 20 images per class are used for training and the other 20 for testing. Table 1 lists the classification results. Our results are comparable to the 98.70 ± 0.4% achieved by the state-of-the-art. What is interesting is that our performance has not decreased below that of any single descriptor despite the inclusion of the specialised scale invariant and non-invariant descriptors. These descriptors have poor performance in general. However, our method automatically sets their weights to zero most of the time and uses them only when they are beneficial for classification. Had the equally weighted combination scheme of Zhang et al. been used, these descriptors would have been brought into play all the time and the accuracy would have dropped to 96.79 ± 0.86%. In each of the 20 train/test splits, learning the descriptor weights using our method outperformed equally weighted combinations (as well as the MKL-Block l1 method). Figure 2 shows how the learnt weights correspond visually to the trade-offs between different classes, while Figure 3 shows that the learnt weights change sensibly as the training set size is varied.
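To show what the two combination schemes compared above amount to, here is a brief sketch under the same assumptions as the earlier kernel snippet: `kernels` is the list of base kernel matrices and `weights` the per-descriptor weights learnt for one pairwise task (e.g. 1.46 for rotation invariance, 0.22 for bi-Lipschitz invariance, 0 for the rest). The names are illustrative rather than taken from the authors' code.

```python
def combined_kernel(kernels, weights):
    """Learnt combination: descriptors with zero weight drop out entirely."""
    return sum(w * K for w, K in zip(weights, kernels))

def equal_weight_kernel(kernels):
    """Equally weighted combination, as in the baseline scheme of Zhang et al."""
    return sum(kernels) / len(kernels)
```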

Table 1. Classification results on the UIUC texture database.