Systems Research
Projects
  • MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud
    The computational core of many data-intensive applications is best expressed as matrix computation. The MadLINQ project addresses two important research problems: the need for a highly scalable, efficient, and fault-tolerant matrix computation system that is also easy to program, and the seamless integration of such specialized execution engines into a general-purpose data-parallel computing system.
  • MoonBox
    Efficient tools are indispensable in the battle against software bugs. In this project, we aim to improve debugging productivity by targeting the different phases of an interactive and iterative debugging session.
  • PASS: Program Analysis for SCOPE Scripts
    The PASS project is a continuing collaboration with the Cosmos team that aims to improve the correctness and performance of SCOPE scripts using program analysis techniques. It follows an interdisciplinary research direction spanning programming languages, systems, and databases.
  • Temporal graph storage and analysis of social data
    The explosion of user-generated data from online social networks stimulates analysis that extracts deep insights from the data. As the data items exhibit rich connections (e.g., they can be connected by social relation, time, location, and topic), it is natural to study them in the form of a graph. Moreover, such a graph evolves over time, because trending topics and social activities are constantly changing. We are building systems that enable the storage and analysis of time-evolving graphs.
  • TimeStream: Large-Scale Real-Time Stream Processing in the Cloud
    Grape is a distributed system designed specifically for "real-time" continuous processing of "big" streaming data on a large cluster of commodity machines. The unique characteristics of this emerging application domain have led to a design significantly different from that of the now "traditional" MapReduce-style batch data processing. In particular, we advocate a powerful new abstraction called resilient substitution that caters to the specific needs of this new computation model.