Jonathan Goldstein, John C. Platt, and Christopher J.C. Burges
This paper addresses the problem of quickly performing point queries against high-dimensional regions. Such queries are useful in the increasingly important problems of multimedia identification and retrieval, where different database entries have different metrics for similarity. While the database literature has focused on indexing for high-dimensional nearest neighbor and epsilon range queries, indexing for point queries against high-dimensional regions has not been addressed. We present an efficient indexing method for these queries, which relies on the combination of redundancy and bit vector indexing to achieve significant performance gains. We have implemented our approach in a real-world audio fingerprinting system, and have obtained a factor of 56 speed-up over linear scan. Furthermore, the well-known Hilbert bulk-loaded R-Trees, a technique capable of searching low-dimensional regions, are shown to be ineffective in our audio fingerprinting system, because of the inherently high-dimensional properties of the problem.