Illustration (a) of the position of edge foci (red dots) relative to edges (blue line). Grey ellipses show the area of positive response for the orientation dependent filters. Peak aggregated filter responses for a small scale (b) and large scale (c).

[ICCV 2011 Paper] [Download the Image Datasets] [Download Supplementary Material]

[Download Executable (.exe)]

**Abstract**

**Motivation**

Identifying local features is a critical component of many approaches to object recognition, object detection, image matching and 3D reconstruction. In each of these scenarios, a common approach is to use interest point detectors to estimate a reduced set of local image regions that are invariant to occlusion, orientation, illumination and viewpoint changes. The interest point operator defines these regions by their spatial locations, orientations, scales and possibly affine transformations. Descriptors are then computed from these image regions to find reliable image-to-image or image-to-model matches. A good interest point detector should have three properties: (1) the interest points are repeatable, (2) the descriptors produced from them are unique, and (3) they are well distributed spatially and across scales.

Most of the interest point detection approaches perform a series of linear filtering operations on the image’s intensities to detect interest point positions. However, filtering intensities directly can result in reduced repeatability under non-linear lighting variations that commonly occur in real world scenarios. Furthermore, when detecting objects in a scene, changes in the background will also result in non-linear intensity variations along object boundaries, resulting in a similar reduction in repeatability.

**Proposal**

We propose detecting interest points using edge foci. We define the set of edge focus points, or edge foci, as the set of points that lie roughly equidistant from a set of edges whose orientations are perpendicular to the point. The detection of edge foci is computed from normalized edge magnitudes, and is not directly dependent on the image's intensities or absolute gradient magnitudes. Compared to image intensities, we hypothesize that the presence of edges and their orientations is more robust to non-linear lighting variations and background clutter. Edge foci are detected by applying different filters perpendicular and parallel to an edge. The filter parallel to the edge determines the edge's scale using a Laplacian of a Gaussian. The second filter blurs the response perpendicular to the edge, centered on the predicted positions of the foci. Aggregating the responses from multiple edges results in peaks at edge foci.
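The oriented filtering and aggregation steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the kernel width, the 2σ perpendicular offset, and the number of orientation bins are our assumptions.

```python
import numpy as np

def log1d(x, sigma):
    # 1-D Laplacian of Gaussian (up to a scale-normalization constant)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return (x**2 / sigma**2 - 1.0) * g

def gauss1d(x, sigma, offset=0.0):
    # 1-D Gaussian, optionally displaced by `offset`
    return np.exp(-(x - offset)**2 / (2.0 * sigma**2))

def edge_focus_kernel(sigma, theta, size=None):
    """Oriented filter: a LoG along the edge direction (selects the edge's
    scale) times a Gaussian displaced perpendicular to the edge (spreads the
    response toward the predicted focus position)."""
    if size is None:
        size = int(6 * sigma) | 1                 # odd kernel width (assumed)
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    u = xx * np.cos(theta) + yy * np.sin(theta)   # coordinate along the edge
    v = -xx * np.sin(theta) + yy * np.cos(theta)  # coordinate perpendicular to it
    # the 2*sigma perpendicular offset is an assumed focus distance
    k = log1d(u, sigma) * gauss1d(v, sigma, offset=2.0 * sigma)
    return k / np.abs(k).sum()

def conv2d_same(img, kernel):
    # zero-padded linear convolution via FFT, cropped to the input size
    H, W = img.shape
    kh, kw = kernel.shape
    s = (H + kh - 1, W + kw - 1)
    out = np.fft.irfft2(np.fft.rfft2(img, s) * np.fft.rfft2(kernel, s), s)
    return out[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def aggregate_responses(orientation_channels, sigma):
    """Sum oriented-filter responses over all orientation channels;
    local maxima of the result are candidate edge foci at this scale."""
    n = len(orientation_channels)
    acc = np.zeros_like(orientation_channels[0], dtype=float)
    for i, chan in enumerate(orientation_channels):
        acc += conv2d_same(chan, edge_focus_kernel(sigma, i * np.pi / n))
    return acc
```

Running this over a pyramid of scales (σ values) and taking local maxima of the aggregated response in position and scale would yield candidate detections.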

Flow diagram of the detector: (a) input image, (b) normalized gradient, (c) normalized gradients separated into orientations, (d) responses after applying oriented filter, (e) the aggregated results and (f) detected interest point.

**Experiments and Results**

**Qualitative Results:**

Filter response on four images (a) for the edge foci detector (b) and Laplacian detector (c).

To provide additional insight, we show the response of our detector in comparison with the Laplacian detector on a set of toy examples in the figure above. Notice how the filter responses are more consistent for our edge-based detector than for the intensity-based Laplacian detector. The edge foci detector finds interest points at the correct scale and position on all four circles, whereas the Laplacian fails on the right two.

Detection results for different style fonts. Edge Foci is the most repeatable.

Since our detector depends only on normalized edge magnitudes, it is possible to find repeatable interest points on line drawings of images. This may be useful for detecting signs with different font styles, as shown in the figure above. Notice that none of the other detectors produce repeatable interest points when the font changes style.

We show experimental results to illustrate the performance of edge foci interest points based on three different metrics: First, we provide an entropy measure to study the distribution both spatially and across scales of the interest points. Second, we score the interest points’ repeatability, i.e. whether corresponding regions are chosen between images. Finally, we measure the uniqueness of the descriptors computed by the interest points to estimate the amount of ambiguity present during matching. We also evaluate the detectors on image alignment and retrieval tasks.

We compare the performance of our detector against some of the most commonly used detectors, such as the Harris, Hessian, Harris/Hessian Laplace, MSER and DoG detectors, on a set of new datasets that capture non-linear illumination variations and changes in background clutter. Homographies are computed using ten or more hand-labeled points to provide correspondences between pairs of images. We assume the scenes are either planar or taken from a far distance, so correspondences can be well approximated by a homography. The figure above shows two example images from the 8 datasets used in this paper.
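Estimating a homography from hand-labeled correspondences can be done with the standard direct linear transform (DLT). The sketch below is a generic least-squares version under that assumption, not necessarily the paper's exact procedure (e.g. it omits Hartley coordinate normalization, which is usually added for conditioning):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography H (dst ~ H @ src) from n >= 4 point
    correspondences via the direct linear transform (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)    # null vector of the constraint matrix
    return H / H[2, 2]

def apply_homography(H, pts):
    # map 2-D points through H in homogeneous coordinates
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

With ten or more exact correspondences, as used for the datasets above, the system is over-determined but consistent, and the SVD recovers H up to scale.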

**Entropy:** We measure the distribution of the interest points in scale space based on their entropy. Intuitively, a well-distributed detector should have higher entropy than a detector that produces overlapping interest points. We compute entropy by discretizing positions and scales.
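A minimal version of this entropy measure, assuming 16-pixel position cells and logarithmic scale bins (both bin choices are ours, not necessarily the paper's):

```python
import numpy as np

def detection_entropy(xs, ys, sigmas, pos_bin=16.0, scale_base=1.4):
    """Entropy (in bits) of detections discretized in (x, y, scale).
    Positions are quantized into pos_bin-pixel cells and scales into
    logarithmic bins; higher entropy means a more even spread."""
    ix = np.floor(np.asarray(xs) / pos_bin).astype(int)
    iy = np.floor(np.asarray(ys) / pos_bin).astype(int)
    isc = np.floor(np.log(np.asarray(sigmas)) / np.log(scale_base)).astype(int)
    # count detections per discrete (x, y, scale) cell
    _, counts = np.unique(np.stack([ix, iy, isc], axis=1), axis=0,
                          return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

Detections piled into one cell give entropy 0; n detections spread over n distinct cells give log2(n) bits.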

Entropy of interest point detectors across various datasets.

**Repeatability:** The repeatability criterion measures the percentage of interest points that are detected at the same relative positions and scales across images. Considered in isolation, the repeatability score can be biased towards detectors that find overlapping interest points. That is, poor localization can be mitigated by the detection of redundant interest points. In practice this is undesirable for two reasons: First, redundant interest points require more storage and increased computation for matching. Second, the descriptors corresponding to the interest points will not be unique, increasing the difficulty of matching and nearest neighbor techniques. We modify the traditional measure of repeatability to additionally penalize overlapping detections.
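One way to realize such an overlap penalty is one-to-one matching: each reference region may be claimed by at most one detection, so redundant overlapping detections enlarge the denominator without adding matches. The greedy circular-region sketch below is our own simplification; the paper's exact criterion (including mapping regions through the ground-truth homography before comparison) differs.

```python
import numpy as np

def circle_iou(a, b):
    """Intersection-over-union of two circular regions (x, y, r)."""
    (x1, y1, r1), (x2, y2, r2) = a, b
    d = np.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                      # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):               # one circle inside the other
        inter = np.pi * min(r1, r2) ** 2
    else:                                 # circular "lens" intersection area
        a1 = np.arccos(np.clip((d*d + r1*r1 - r2*r2) / (2*d*r1), -1, 1))
        a2 = np.arccos(np.clip((d*d + r2*r2 - r1*r1) / (2*d*r2), -1, 1))
        tri = 0.5 * np.sqrt(max(0.0, (-d+r1+r2)*(d+r1-r2)*(d-r1+r2)*(d+r1+r2)))
        inter = r1*r1*a1 + r2*r2*a2 - tri
    union = np.pi * (r1*r1 + r2*r2) - inter
    return inter / union

def repeatability(det_a, det_b, iou_thresh=0.6):
    """Fraction of one-to-one matched regions; max() in the denominator
    penalizes unmatched (e.g. redundant overlapping) detections."""
    used, matches = set(), 0
    for a in det_a:
        best, best_iou = None, iou_thresh
        for j, b in enumerate(det_b):
            if j in used:
                continue
            iou = circle_iou(a, b)
            if iou > best_iou:
                best, best_iou = j, iou
        if best is not None:
            used.add(best)
            matches += 1
    return matches / max(len(det_a), len(det_b))
```

A detector that fires twice on the same structure gains no extra matches under this scheme but pays for the extra detection in the denominator.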

Graphs showing the repeatability with penalized overlap for 8 datasets using 7 different detectors.

**Uniqueness:** While the repeatability measure describes how often we can
match the same regions in two images, it does not measure the uniqueness of the
interest point descriptors. Unique descriptors are essential for avoiding false
matches, especially in applications with large databases. As stated before,
redundant interest points across scales may create similar descriptors. Interest
points that are only detected on very specific image features may also reduce
the distinctiveness of descriptors.

ROC curves showing the uniqueness of the descriptors generated from 8 different detectors, averaged over 8 datasets.

**Applications:**

We test the usefulness of the detectors in image alignment using a standard RANSAC approach. It is assumed the images are related by a homography. We evaluate their performance on the 8 image sets described above, which contain varying amounts of illumination change, scaling and projective distortion. The percentage of all image pairs from each dataset with correctly computed homographies is shown in the table below. All of the detectors perform well on the Boat, Graffiti and Light datasets, but there is more variation on the newer datasets. Overall, Edge Foci (53.0%) performs best, followed by Harris Laplace (50.9%) and Harris Affine (48.9%).

Next, we test our detector on the image retrieval task using the Oxford Building dataset. We use two approaches. In our first approach we use a bag-of-words model with hierarchical k-means. Three levels are used with a branching factor of 80, resulting in 512,000 visual words. We use a stop list containing the 8,000 most commonly occurring words. A new vocabulary is built for each detector, and matches are ranked based on the histogram intersection. Our second approach uses the same bag-of-words model with the addition of spatial verification. Spatial verification is achieved using a three-degree-of-freedom (position and scale) voting scheme between corresponding interest points. We compute the mean of the average precision scores for each detector across all 11 buildings in the dataset. The results for both methods are shown in the table below. Both Edge Foci and Harris Laplace achieve good results in both tests. It is interesting to notice that detectors with good localization (Edge Foci, DoG, MSER, etc.) get a larger performance boost from spatial verification than those with poor localization (Hessian, Harris).
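The ranking step of the first retrieval approach reduces to comparing L1-normalized visual-word histograms by histogram intersection. A minimal sketch (the hierarchical k-means vocabulary and stop list are omitted here):

```python
import numpy as np

def l1_normalize(h):
    # normalize a visual-word histogram so its entries sum to 1
    s = h.sum()
    return h / s if s > 0 else h

def hist_intersection(h1, h2):
    # similarity of two L1-normalized bag-of-words histograms
    return float(np.minimum(h1, h2).sum())

def rank_by_intersection(query, database):
    """Return database indices ordered by histogram intersection
    with the query, best match first."""
    q = l1_normalize(np.asarray(query, dtype=float))
    scores = [hist_intersection(q, l1_normalize(np.asarray(h, dtype=float)))
              for h in database]
    return list(np.argsort(-np.asarray(scores)))
```

Identical histograms score 1.0 and disjoint ones 0.0, so the returned order puts visually similar images first.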

C. Lawrence Zitnick and Krishnan Ramnath

International Conference on Computer Vision (ICCV), 2011

[Thanks to Yong Jae Lee for the webpage template]