Zichao Qi, Yanghua Xiao, Bin Shao, and Haixun Wang
January 2013
The emergence of real life graphs with billions of nodes poses
significant challenges for managing and querying these graphs. One
of the fundamental queries submitted to graphs is the shortest distance
query. Online BFS (breadth-first search) and offline
pre-computing pairwise shortest distances are prohibitive in time or
space complexity for billion-node graphs. In this paper, we study
the feasibility of building distance oracles for billion-node
graphs. A distance oracle provides approximate answers to shortest
distance queries by using a pre-computed data structure for the
graph. Sketch-based distance oracles are good candidates because
they assign each vertex a sketch of bounded size, which means they
have linear space complexity. However, state-of-the-art sketch-based
distance oracles lack efficiency or accuracy when dealing with big
graphs. In this paper, we address the scalability and accuracy
issues by focusing on optimizing the three key factors that affect
the performance of distance oracles: landmark selection,
distributed BFS, and answer generation. We conduct
extensive experiments on both real networks and synthetic networks
to show that we can build distance oracles of affordable cost and
efficiently answer shortest distance queries even for billion-node
graphs.
| Type | TechReport |
| Number | MSR-TR-2013-13 |