RASP: Large-Scale Graph Traversal with SSD Prefetching
Eiko Yoneki (University of Cambridge), Karthik Nilakant (University of Cambridge), Valentin Dalibard (University of Cambridge), Amitabha Roy (EPFL)
Mining large graphs has now become an important aspect of multiple diverse applications and a number of computer systems have been proposed to efficiently execute graph algorithms. Recent interest in this area has led to the construction of single machine graph computation systems that use solid state drives (SSDs) to store the graph. This approach reduces the cost and simplifies the implementation of graph algorithms, making computations on large graphs available to the average user. However, SSDs are slower than main memory, and making full use of their bandwidth is crucial for executing graph algorithms in a reasonable amount of time. We present RASP (the (R)un(A)head (S)SD(P)refetcher) for graph algorithms that parallelises requests to derive maximum throughput from SSDs. RASP combines a judicious distribution of graph state between main memory and SSDs with an innovative run-ahead algorithm to prefetch needed data in parallel. This is in contrast to existing approaches that depend on multi-threading the graph algorithms to saturate available bandwidth. Our experiments on graph algorithms using random access
show that RASP not only is capable of maximising the throughput from SSDs but is also able to almost hide the effect of I/O latency. The improvements in runtime for graph algorithms is up to 14 X when compared to a single threaded baseline. When compared to sophisticated multi-threaded implementations, RASP performs up to 80% faster without the program complexity and the programmer effort needed for multithreaded graph algorithms.