Locality-sensitive hashing (LSH) is the basis of many algorithms that use a
probabilistic approach to find nearest neighbors. We describe an algorithm for
optimizing the parameters and use of LSH. Prior work ignores these issues or
suggests a search for the best parameters. We start with two histograms: one that
characterizes the distributions of distances to a point's nearest neighbors and
the second that characterizes the distance between a query and any point in the
data set. Given a desired performance level (the chance of finding the true
nearest neighbor) and a simple computational cost model, we return the LSH
parameters that allow an LSH index to meet the performance goal and have the
minimum computational cost. We can also use this analysis to connect LSH to
deterministic nearest-neighbor algorithms such as *k* *d* trees and
thus start to unify the two approaches.