Building Systems that Query on Compressed Data

Web services today want to support sophisticated queries under stringent interactivity (latency and/or throughput) constraints. Many recent studies have argued that in-memory query execution is one of the keys to achieving query interactivity. However, as web services scale to larger data sizes, executing queries in memory becomes increasingly challenging. As a result, existing systems fall short of supporting sophisticated interactive queries at scale.

In this talk, I will present Succinct, a distributed data store that supports functionality comparable to state-of-the-art NoSQL stores and yet enables query interactivity for data sizes an order of magnitude larger than what is possible today (or, alternatively, queries up to two orders of magnitude faster at scale). Succinct achieves this by executing a wide range of queries (e.g., search, range, and even regular expression queries) directly on compressed data: storing the input data in compressed form provides scale, while avoiding data scans and data decompression provides interactivity. I will also discuss how Succinct’s approach of executing queries on compressed data offers a new “lens” for exploring several classical systems problems from real-world production clusters (e.g., failure recovery, load spikes during transient failures, and skewed workloads), and leads to previously unachievable operating points in the system design space. Succinct is already being adopted in production clusters of several large-scale web services.

Date:
Speakers: Rachit Agarwal
Affiliation: UC Berkeley