Physically Independent Stream Merging

MSR-TR-2011-82 |

Several desired capabilities in a data stream management system (DSMS), such as query-plan switching and high availability, can be considerably simplified using a facility to merge equivalent data streams. One can logically view a data stream as a temporal table of events, each associated with a lifetime (time interval) over which the event contributes to output. In many applications, the “same” logical stream may present itself physically in multiple physical forms, for example, due to disorder arising during transmission or from combining multiple sources; and modifications or deletions of earlier events. Merging such streams correctly is challenging when the streams may differ physically in timing, order, and composition. This paper introduces a new stream operator called Logical Merge (LMerge) that takes multiple logically consistent streams as input and outputs a single stream that is compatible with all the inputs. LMerge can handle the dynamic attachment and detachment of input streams. We present a range of algorithms for LMerge that can exploit compile-time stream properties for efficiency. Experiments with StreamInsight, a commercial DSMS, show that LMerge is sometimes orders-of-magnitude more efficient than enforcing determinism on input streams, and that there is benefit to using specialized algorithms when stream variability is limited. We also show that LMerge and its extensions can provide performance benefits in several real-world applications.