TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems

PVLDB |

Publication

Computing distances among data points is an essential part
of many important algorithms in data analytics, graph analysis,
and other domains. In each of these domains, developers
have spent significant manual e↵ort optimizing algorithms,
often through novel applications of the triangle
equality, in order to minimize the number of distance computations
in the algorithms. In this work, we observe that
many algorithms across these domains can be generalized as
an instance of a generic distance-related abstraction. Based
on this abstraction, we derive seven principles for correctly
applying the triangular inequality to optimize distance-related
algorithms. Guided by the findings, we develop Triangular
OPtimizer (TOP), the first software framework that is able
to automatically produce optimized algorithms that either
matches or outperforms manually designed algorithms for
solving distance-related problems. TOP achieves up to 237x
speedups and 2.5X on average.