Infrastructure for real-time analytics on modern cluster networks
Aleksandar Dragojevic, Dushyanth Narayanan, Orion Hodson, Miguel Castro
Two hardware trends present a great opportunity for building efficient infrastructure for real-time analytics: (1) servers today have tens to hundreds of gigabytes of RAM making it possible to keep large data sets in RAM of a moderately-sized cluster of servers and (2) modern cluster networks, such as InfiniBand and RoCE, support access to memory of remote servers with just a few micro-seconds of latency. By keeping the whole data set in main memory of a cluster and using direct remote memory access (RDMA) primitives of the network it is possible to support tens to hundreds of millions of point lookups in a terabyte-scale key-value store on a cluster of just several tens of servers.
We present Farm, a platform for building infrastructure for supporting in-memory applications that efficiently use modern cluster networks. Farm exposes the memory of all servers in a cluster as a single, fault-tolerant shared memory; it supports atomic reads and writes of small user-defined objects, efficient short transactions for easier synchronization, and exposes the location of objects to support locality-aware optimizations. We describe how to use Farm to build efficient and scalable B-trees and hashtables, which can be used as key-value stores or building blocks for more complex systems.