Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing
This technical report introduces Trill, a fast incremental query engine for big data analytics. Trill (which stands for a trillion events per day) is based on a temporal data and query model that enables it to handle a wide range of analytics in diverse settings: real-time streaming queries, offline temporal queries, relational queries, and interactive (progressive) queries with early results over datasets pre-loaded into main memory or streamed from storage. What sets Trill apart and makes it practical as a unified analytics engine is its high performance across the board: for streaming data, Trill’s throughput is 2 to 4 orders of magnitude higher than that of comparable streaming engines. Further, for offline relational (non-temporal) queries, Trill’s performance is comparable to that of a commercial columnar database query processor. Trill is a high-level language tool that supports arbitrary (programming language) data types and libraries, unlike traditional DBMSs, which restrict the type system and use their own native memory. Trill also supports a no-scheduler mode in which work occurs on the thread feeding data to it; this mode is ideal for embedding within scale-out frameworks such as Orleans and REEF. Further, Trill handles strings (common in big data analytics) efficiently, and supports fast stream serialization at speeds 10x higher than those of current solutions such as Avro. This technical report describes Trill’s novel physical data model and system architecture, which together enable it to achieve these levels of performance. Extensive experiments demonstrate the performance of Trill in various settings, and its utility for fast data analytics across the spectrum.