Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda, and Zheng Zhang
For many big data analytics workloads, approximate results suffice. This raises the question of whether and how the underlying system architecture can take advantage of such relaxations, thereby lifting constraints inherent in today’s architectures. This position paper explores one possible direction. Impression Store is a distributed storage system built around the abstraction of big data vectors. It aggregates updates internally and answers queries for the top-K high-value entries. With proper extensions, Impression Store supports various aggregations, top-K queries, and outlier and major mode detection. While restricted in scope, such queries represent a substantial and important portion of many production workloads. In return, the system has unparalleled scalability: any node in the system can process any query, whether a read or an update. The key technique we leverage is compressive sensing, which substantially reduces the amount of active memory state, I/O, and traffic volume needed to achieve such scalability.
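The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian measurement matrix, the orthogonal matching pursuit recovery, and the names `make_measurement_matrix`, `SketchNode`, `merge`, and `recover_top_k` are all assumptions chosen for illustration. The sketch shows why linear compressive measurements let any node absorb updates independently, and how the largest entries of a sparse vector might be recovered from the merged measurements.

```python
import numpy as np


def make_measurement_matrix(m, n, seed=0):
    """Random Gaussian sensing matrix; every node builds it from the same seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)


class SketchNode:
    """Hypothetical node that stores only the m-dimensional sketch y = Phi @ x."""

    def __init__(self, phi):
        self.phi = phi
        self.y = np.zeros(phi.shape[0])

    def update(self, index, delta):
        # Adding delta to x[index] adds delta * Phi[:, index] to the sketch.
        self.y += delta * self.phi[:, index]


def merge(nodes):
    # Measurements are linear, so sketches of partial updates simply add up.
    return sum(node.y for node in nodes)


def recover_top_k(phi, y, k):
    """Orthogonal matching pursuit (an assumed recovery method): pick k columns greedily."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(phi.T @ residual)
        if support:
            scores[support] = 0.0  # do not pick the same index twice
        support.append(int(np.argmax(scores)))
        # Re-fit all chosen coefficients jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    return sorted(zip(support, coef), key=lambda pair: -abs(pair[1]))


# Usage: two nodes receive disjoint updates to a 2000-entry vector,
# yet each stores only 200 numbers.
phi = make_measurement_matrix(m=200, n=2000)
node_a, node_b = SketchNode(phi), SketchNode(phi)
node_a.update(7, 30.0)
node_b.update(7, 20.0)
node_a.update(42, 30.0)
node_b.update(1234, 20.0)
for index, value in recover_top_k(phi, merge([node_a, node_b]), k=3):
    print(index, round(float(value), 3))
```

In this sketch the per-node state is the sketch length `m` rather than the vector length `n`, which is the kind of reduction in memory and traffic the abstract attributes to compressive sensing. Recovery is only reliable when the underlying vector is sparse or dominated by a few large entries.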
|Published in|HotCloud 2014|
|Publisher|USENIX – Advanced Computing Systems Association|