Toward a Distance Oracle for Billion-Node Graphs

The emergence of real life graphs with billions of nodes poses

significant challenges for managing and querying these graphs. One

of the fundamental queries submitted to graphs is the shortest distance

query. Online BFS (breadth-first search) and offline

pre-computing pairwise shortest distances are prohibitive in time or

space complexity for billion-node graphs. In this paper, we study

the feasibility of building distance oracles for billion-node

graphs. A distance oracle provides approximate answers to shortest

distance queries by using a pre-computed data structure for the

graph. Sketch-based distance oracles are good candidates because

they assign each vertex a sketch of bounded size, which means they

have linear space complexity. However, state-of-the-art sketch-based

distance oracles lack efficiency or accuracy when dealing with big

graphs. In this paper, we address the scalability and accuracy

issues by focusing on optimizing the three key factors that affect

the performance of distance oracles: landmark selection,

distributed BFS, and answer generation. We conduct

extensive experiments on both real networks and synthetic networks

to show that we can build distance oracles of affordable cost and

efficiently answer shortest distance queries even for billion-node

graphs.

Details

TypeTechReport
NumberMSR-TR-2013-13
> Publications > Toward a Distance Oracle for Billion-Node Graphs