Edge Foci Interest Points
Illustration (a) of the position of edge foci (red dots) relative to edges (blue line). Grey ellipses show the area of positive response for the orientation dependent filters. Peak aggregated filter responses for a small scale (b) and large scale (c).
Abstract
In this paper, we describe an interest point detector using edge foci. Unlike traditional detectors that compute interest points directly from image intensities, we use normalized intensity edges and their orientations. We hypothesize that detectors based on the presence of oriented edges are more robust to non-linear lighting variations and background clutter than intensity-based techniques. Specifically, we detect edge foci: points in the image that are roughly equidistant from edges whose orientations are perpendicular to the point. The scale of the interest point is defined by the distance between the edge focus and the edges. We quantify the performance of our detector using the interest points' repeatability, the uniformity of their spatial distribution, and the uniqueness of the resulting descriptors. Results are reported on traditional datasets as well as new datasets with challenging non-linear lighting variations and occlusions.
Identifying local features is a critical component of many approaches to object recognition, object detection, image matching and 3D reconstruction. In each of these scenarios, a common approach is to use interest point detectors to select a reduced set of local image regions that are invariant to occlusion, orientation, illumination and viewpoint changes. The interest point operator defines these regions by their spatial locations, orientations, scales and possibly affine transformations. Descriptors are then computed from these image regions to find reliable image-to-image or image-to-model matches. A good interest point detector should have three properties: (1) its interest points are repeatable, (2) the descriptors produced from them are unique, and (3) the points are well distributed spatially and across scales.
Most interest point detection approaches perform a series of linear filtering operations on the image's intensities to detect interest point positions. However, filtering intensities directly can reduce repeatability under the non-linear lighting variations that commonly occur in real-world scenarios. Furthermore, when detecting objects in a scene, changes in the background also produce non-linear intensity variations along object boundaries, causing a similar reduction in repeatability.
We propose detecting interest points using edge foci. We define the set of edge focus points, or edge foci, as the set of points that lie roughly equidistant from a set of edges whose orientations are perpendicular to the point. The detection of edge foci is computed from normalized edge magnitudes, and is not directly dependent on the image's intensities or absolute gradient magnitudes. Compared to image intensities, we hypothesize that the presence of edges and their orientations is more robust to non-linear lighting variations and background clutter. Edge foci are detected by applying two different filters, one perpendicular and one parallel to each edge. The filter parallel to the edge determines the edge's scale using a Laplacian of Gaussian. The second filter blurs the response perpendicular to the edge, centered on the predicted positions of the foci. Aggregating the responses from multiple edges produces peaks at the edge foci.
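The aggregation idea can be illustrated with a short sketch. The following is a minimal, hypothetical Python/NumPy toy, not the paper's implementation: each strong edge pixel casts votes at the two positions offset by the candidate scale along its gradient direction (i.e., perpendicular to the edge), and peaks in the accumulator approximate edge foci. The oriented Laplacian of Gaussian and Gaussian blurring of the actual detector are simplified away, and the contrast normalization below is a crude stand-in.

```python
import numpy as np

def edge_foci_response(img, scale):
    """Toy sketch: accumulate votes at points offset by `scale`
    along each edge pixel's gradient (perpendicular to the edge)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    norm = mag / (mag.mean() + 1e-8)        # crude contrast normalization
    theta = np.arctan2(gy, gx)              # gradient direction
    acc = np.zeros_like(img, dtype=float)
    ys, xs = np.nonzero(norm > 1.0)         # treat strong responses as edges
    for y, x in zip(ys, xs):
        for s in (-1.0, 1.0):               # vote on both sides of the edge
            vy = int(round(y + s * scale * np.sin(theta[y, x])))
            vx = int(round(x + s * scale * np.cos(theta[y, x])))
            if 0 <= vy < acc.shape[0] and 0 <= vx < acc.shape[1]:
                acc[vy, vx] += norm[y, x]
    return acc

# A bright disk of radius 8: votes from its boundary pixels all agree
# near the center, so the peak of the response lands there.
yy, xx = np.mgrid[0:41, 0:41]
disk = ((yy - 20) ** 2 + (xx - 20) ** 2 <= 64).astype(float)
peak = np.unravel_index(np.argmax(edge_foci_response(disk, scale=8)), (41, 41))
```

In the real detector the votes are instead spread smoothly by a Gaussian perpendicular to the edge and a Laplacian of Gaussian parallel to it, so the response varies smoothly in both position and scale.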
Flow diagram of the detector: (a) input image, (b) normalized gradient, (c) normalized gradients separated into orientations, (d) responses after applying oriented filter, (e) the aggregated results and (f) detected interest point.
Experiments and Results
Filter response on four images (a) for the edge foci detector (b) and Laplacian detector (c).
To provide additional insight, the figure above compares the response of our detector with that of the Laplacian detector on a set of toy examples. Notice how the filter responses are more consistent for our edge-based detector than for the intensity-based Laplacian detector. The edge foci detector finds interest points at the correct scale and position on all four circles, whereas the Laplacian fails on the right two.
Detection results for different style fonts. Edge Foci is the most repeatable.
We show experimental results illustrating the performance of edge foci interest points using three metrics: First, we provide an entropy measure to study the distribution of the interest points both spatially and across scales. Second, we score the interest points' repeatability, i.e., whether corresponding regions are chosen between images. Finally, we measure the uniqueness of the descriptors computed from the interest points to estimate the amount of ambiguity present during matching. We also evaluate the detectors on image alignment and retrieval tasks.
We compare the performance of our detector against some of the most commonly used detectors, including the Harris, Hessian, Harris/Hessian Laplace, MSER and DoG detectors, on a set of new datasets that capture non-linear illumination variations and changes in background clutter. Homographies are computed using ten or more hand-labeled points to provide correspondences between pairs of images. We assume the scenes are either planar or taken from a far distance, so correspondences can be well approximated by a homography. The figure above shows two example images from the 8 datasets used in this paper.
Entropy: We measure the distribution of the interest points in scale space using their entropy. Intuitively, a well-distributed detector should produce interest points with higher entropy than a detector that produces overlapping interest points. We compute entropy by discretizing positions and scales.
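As a concrete illustration, such an entropy can be computed by binning detections in position and log-scale and applying the standard Shannon entropy. The bin sizes below are arbitrary placeholders, not the parameters used in the paper.

```python
import math
from collections import Counter

def detection_entropy(points, scales, pos_bin=16, max_scale_bin=3):
    """Shannon entropy of detections discretized in (x, y, log-scale).
    pos_bin and the log2 octave binning are illustrative choices."""
    counts = Counter()
    for (x, y), s in zip(points, scales):
        sb = min(int(math.log2(s)), max_scale_bin)   # coarse octave bin
        counts[(int(x // pos_bin), int(y // pos_bin), sb)] += 1
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Well-spread detections fill distinct bins (high entropy);
# overlapping detections collapse into one bin (zero entropy).
spread = detection_entropy([(16 * i, 16 * i) for i in range(8)], [2.0] * 8)
clustered = detection_entropy([(0.0, 0.0)] * 8, [2.0] * 8)
```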
Entropy of interest point detectors across various datasets.
Repeatability: The repeatability criterion measures the percentage of interest points that are detected at the same relative positions and scales across images. Considered in isolation, the repeatability score can be biased towards detectors that find overlapping interest points. That is, poor localization can be mitigated by the detection of redundant interest points. In practice this is undesirable for two reasons: First, redundant interest points require more storage and increased computation for matching. Second, the descriptors corresponding to the interest points will not be unique, increasing the difficulty of matching and nearest neighbor techniques. We modify the traditional measure of repeatability to additionally penalize overlapping detections.
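A minimal sketch of a repeatability score with an overlap penalty, assuming point (rather than elliptical-region) correspondences: each detection in the second image can be claimed at most once, so redundant, overlapping detections earn no extra credit. The region-overlap matching used in the actual evaluation is simplified here to a distance threshold.

```python
import numpy as np

def repeatability(pts_a, pts_b, H, tol=3.0):
    """Fraction of detections matched one-to-one after mapping
    image A's points into image B by the 3x3 homography H."""
    pa = np.asarray(pts_a, float)
    pb = np.asarray(pts_b, float)
    proj = np.c_[pa, np.ones(len(pa))] @ H.T
    proj = proj[:, :2] / proj[:, 2:3]            # dehomogenize
    used, matched = set(), 0
    for p in proj:
        d = np.linalg.norm(pb - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= tol and j not in used:        # one-to-one: no credit for overlaps
            used.add(j)
            matched += 1
    return matched / min(len(pa), len(pb))

# Identity homography: both points repeat exactly.
score = repeatability([(0, 0), (10, 10)], [(0, 0), (10, 10), (50, 50)], np.eye(3))
# Overlapping detections in A compete for the same point in B.
score2 = repeatability([(0, 0), (0, 1)], [(0, 0), (30, 30)], np.eye(3))
```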
Graphs showing the repeatability with penalized overlap for 8 datasets using 7 different detectors.
Uniqueness: While the repeatability measure describes how often we can match the same regions in two images, it does not measure the uniqueness of the interest point descriptors. Unique descriptors are essential for avoiding false matches, especially in applications with large databases. As stated before, redundant interest points across scales may create similar descriptors. Interest points that are only detected on very specific image features may also reduce the distinctiveness of descriptors.
We test the usefulness of the detectors for image alignment using a standard RANSAC approach. The images are assumed to be related by a homography. We evaluate performance on the 8 image sets described above, which contain varying amounts of illumination change, scaling and projective distortion. The percentage of all image pairs from each dataset with correctly computed homographies is shown in the table below. All of the detectors perform well on the datasets Boat, Graffiti and Light, but there is more variation on the newer datasets. Overall, Edge Foci (53.0%) performs best, followed by Harris Laplace (50.9%) and Harris Affine (48.9%).
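A standard RANSAC homography fit can be sketched as follows. This is a generic textbook version (direct linear transform inner solver, reprojection-error inlier test), not the specific implementation or thresholds used in these experiments.

```python
import random
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: solve for H (up to scale) via SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 3)               # right singular vector of least sigma

def ransac_homography(src, dst, iters=500, tol=3.0, seed=0):
    """Repeatedly fit H to random 4-point samples; keep the H
    with the most inliers under the reprojection threshold."""
    rng = random.Random(seed)
    pts = np.c_[np.asarray(src, float), np.ones(len(src))]
    dst_a = np.asarray(dst, float)
    best_H, best_inl = None, 0
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 4)
        H = fit_homography([src[i] for i in idx], [dst[i] for i in idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = pts @ H.T
            proj = proj[:, :2] / proj[:, 2:3]
            err = np.linalg.norm(proj - dst_a, axis=1)
        inl = int(np.sum(err < tol))
        if inl > best_inl:
            best_H, best_inl = H, inl
    return best_H, best_inl

# Synthetic check: 12 correspondences under a pure translation
# of (+5, -3), plus 3 gross outliers.
src = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3), (7, 8),
       (2, 9), (9, 2), (4, 4), (6, 1), (1, 6), (8, 5),
       (3, 3), (12, 12), (20, 1)]
dst = [(x + 5, y - 3) for x, y in src[:12]] + [(100, 100), (50, 0), (0, 50)]
H, n_inl = ransac_homography(src, dst)
p = H @ np.array([0.0, 0.0, 1.0])             # origin should map to (5, -3)
```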
Next, we test our detector on the image retrieval task using the Oxford Building dataset. We use two approaches. In the first, we use a bag-of-words model with hierarchical k-means. Three levels are used with a branching factor of 80, resulting in 512,000 visual words. We use a stop list containing the 8,000 most commonly occurring words. A new vocabulary is built for each detector, and matches are ranked based on histogram intersection. The second approach uses the same bag-of-words model with the addition of spatial verification. Spatial verification is achieved using a three-degree-of-freedom (position and scale) voting scheme between corresponding interest points. We compute the mean of the average precision scores for each detector across all 11 buildings in the dataset. The results for both methods are shown in the table below. Both Edge Foci and Harris Laplace achieve good results in both tests. It is interesting to note that detectors with good localization (Edge Foci, DoG, MSER, etc.) get a larger performance boost from spatial verification than those with poor localization (Hessian, Harris).
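Ranking by histogram intersection can be sketched as below. This is a minimal, hypothetical illustration that assumes each image has already been quantized into a visual-word histogram; the hierarchical k-means vocabulary and stop list are omitted.

```python
def rank_by_intersection(query, database):
    """Rank database images by histogram intersection with the query:
    the sum of elementwise minima of the two word histograms."""
    def intersection(h1, h2):
        return sum(min(a, b) for a, b in zip(h1, h2))
    scores = [(intersection(query, h), i) for i, h in enumerate(database)]
    return [i for _, i in sorted(scores, key=lambda t: t[0], reverse=True)]

# Image 0 shares all the query's words, image 2 shares a few, image 1 none.
ranking = rank_by_intersection([3, 0, 2, 1],
                               [[3, 0, 2, 1], [0, 5, 0, 0], [1, 1, 1, 1]])
```

In practice the histograms are commonly L1-normalized or tf-idf weighted before the intersection is taken.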
International Conference on Computer Vision (ICCV), 2011
[Thanks to Yong Jae Lee for the webpage template]