- MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud. The computation core of many data-intensive applications can best be expressed as matrix computations. The MadLINQ project addresses two important research problems: the need for a highly scalable, efficient, and fault-tolerant matrix computation system that is also easy to program, and the seamless integration of such specialized execution engines into a general-purpose data-parallel computing system.
- MoonBox. Efficient tools are indispensable in the battle against software bugs. In this project, we aim to improve debugging productivity by targeting the different phases of an interactive, iterative debugging session.
- PASS: Program Analysis for SCOPE Scripts. The PASS project is an ongoing collaboration with the Cosmos team that applies program analysis techniques to improve the correctness and performance of SCOPE scripts, following an interdisciplinary research direction that spans programming languages, systems, and databases.
- Temporal graph storage and analysis of social data. The explosion of user-generated data from online social networks has spurred analyses that extract deep insights from that data. Because the data items are richly connected (e.g., by social relations, time, location, and topics), it is natural to study them in the form of a graph. Moreover, such a graph evolves over time as trending topics and social activities constantly change. We are building systems that enable storage and analysis of time-evolving graphs.
- TimeStream: Large-Scale Real-Time Stream Processing in the Cloud. TimeStream is a distributed system designed specifically for low-latency continuous processing of big streaming data on a large cluster of commodity machines. The unique characteristics of this emerging application domain have led to a design significantly different from that of popular MapReduce-style batch data processing systems. In particular, we advocate a powerful new abstraction called resilient substitution that caters to the specific needs of this new computation model.
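The MadLINQ summary does not say how a matrix computation is broken up for a cluster. One common approach, shown here only as an illustrative sketch and not as MadLINQ's actual API or execution model, is to split matrices into square tiles so that each tile product becomes an independent task, with the partial products then summed per output tile. The function names (`split_tiles`, `tile_matmul`, `tile_add`, `tiled_matmul`) are hypothetical, and everything runs in a single process rather than across machines.

```python
from itertools import product


def split_tiles(m, t):
    """Split a square matrix (list of lists) into t x t tiles keyed by (row_block, col_block)."""
    n = len(m)
    assert n % t == 0, "this sketch assumes the tile size divides the matrix size"
    return {
        (bi, bj): [row[bj * t:(bj + 1) * t] for row in m[bi * t:(bi + 1) * t]]
        for bi in range(n // t)
        for bj in range(n // t)
    }


def tile_matmul(a, b):
    """Multiply two small dense tiles (naive triple loop)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def tile_add(a, b):
    """Element-wise sum of two tiles of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tiled_matmul(A, B, t):
    """Compute A @ B by decomposing both square matrices into t x t tiles."""
    n = len(A)
    nb = n // t
    ta, tb = split_tiles(A, t), split_tiles(B, t)
    # Each (i, j, k) tile product is independent of the others; a distributed
    # engine could schedule these as separate tasks (hypothetical here).
    partials = {
        (i, j, k): tile_matmul(ta[(i, k)], tb[(k, j)])
        for i, j, k in product(range(nb), repeat=3)
    }
    # Reduce the partial products over k to form each output tile.
    C = [[0] * n for _ in range(n)]
    for i, j in product(range(nb), repeat=2):
        acc = partials[(i, j, 0)]
        for k in range(1, nb):
            acc = tile_add(acc, partials[(i, j, k)])
        # Copy the finished tile back into the full result matrix.
        for r in range(t):
            C[i * t + r][j * t:(j + 1) * t] = acc[r]
    return C
```

For example, `tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1)` returns `[[19, 22], [43, 50]]`. Separating the independent tile products from the per-tile reduction is what makes this decomposition amenable to parallel scheduling and to recomputing only lost tiles after a failure.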
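The temporal graph project description does not specify a storage layout. A minimal sketch, assuming each directed edge is stored with the time intervals during which it exists and that updates arrive in non-decreasing timestamp order, shows how a past snapshot of an evolving social graph could be queried. The `TemporalGraph` class and its methods are hypothetical and are not taken from any of the systems described above.

```python
from collections import defaultdict


class TemporalGraph:
    """Hypothetical in-memory store: each directed edge keeps a list of
    [start, end) validity intervals, with end=None while the edge is live.
    Assumes add/remove calls arrive in non-decreasing time order."""

    def __init__(self):
        self._intervals = defaultdict(list)  # (u, v) -> [[start, end], ...]

    def add_edge(self, u, v, t):
        spans = self._intervals[(u, v)]
        if spans and spans[-1][1] is None:
            return  # edge is already active; nothing to record
        spans.append([t, None])

    def remove_edge(self, u, v, t):
        spans = self._intervals.get((u, v))
        if spans and spans[-1][1] is None:
            spans[-1][1] = t  # close the currently open interval

    def snapshot(self, t):
        """Return the set of edges that were active at time t."""
        return {
            edge
            for edge, spans in self._intervals.items()
            if any(start <= t and (end is None or t < end) for start, end in spans)
        }

    def neighbors(self, u, t):
        """Sorted out-neighbors of u as of time t."""
        return sorted(v for (a, v) in self.snapshot(t) if a == u)
```

For instance, after `add_edge("alice", "bob", 1)`, `add_edge("alice", "carol", 3)`, and `remove_edge("alice", "bob", 5)`, `neighbors("alice", 4)` is `["bob", "carol"]` while `neighbors("alice", 5)` is `["carol"]`. This sketch scans every edge per query, which is linear in the graph's history; a system built for large social graphs would need a time index, which is beyond this example.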