CEDR: Complex Event Detection and Response
Mission Statement:
CEDR strives to be the world's first general-purpose event/stream processing system. More specifically,
we bring together the worlds of event processing and streaming.
Event detection systems (e.g. pub/sub)
typically assume extremely high data rates, are frequently distributed, and have either simple
filters (stateless pub/sub) or pattern-oriented query languages (stateful). Streaming systems, while still
assuming very high data rates, are not typically distributed, and tend to focus on windowed aggregation (stateful),
joins (stateful), and database interoperability. While both of these approaches have their merits,
each suffers from the lack of what the other offers.
In addition, existing systems, when faced with stream
imperfections caused by out-of-order delivery and system overload, are hardcoded to make specific
tradeoffs in terms of throughput, latency, memory, and correctness. Rather than choose a particular set of
tradeoffs, we intend to support the full range of options within CEDR.
Microsoft Research Contributors:
Jonathan Goldstein
Roger Barga
External Contributors (Most recent first):
Mingsheng Hong (Intern, Summer 2007, Summer 2006, Cornell)
Mirek Riedewald (Visiting Researcher, Fall 2006, Cornell)
Mohamed Ali (Intern, Summer 2006, Purdue, Joining SQL Server in Summer 2007)
Hillary Caituiro-Monge (Intern, Summer 2005, UC Santa Barbara)
Approach:
Such a general-purpose system is made possible
by the formal temporal foundation of CEDR, which gives us a clean and powerful platform on which
to carefully design system semantics. Unlike other systems, which reason about events from the
standpoint of when they arrive at an event processing system, CEDR uses two notions of time. The
first notion of time is called valid time, and corresponds to when the event was valid
from an application's point of view. The other notion of time, called CEDR time, corresponds to the
time at which a CEDR system becomes aware of an event for processing. The query semantics of CEDR
depend only on valid time. CEDR time is used only to discuss the manner in which CEDR operators
respond to out-of-order delivery.
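
The two notions of time can be illustrated with a minimal Python sketch. This is not CEDR's actual data model or API: the `Event` class, its field names, the `valid_at` function, and the choice to represent valid time as a half-open interval are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record carrying both notions of time."""
    payload: str
    valid_start: int  # valid time: when the event became valid for the application
    valid_end: int    # valid time: when it stopped being valid (exclusive, assumed)
    cedr_time: int    # CEDR time: when the system became aware of the event


def valid_at(events, t):
    """Return the payloads of events valid at time t, consulting valid time only."""
    return sorted(e.payload for e in events if e.valid_start <= t < e.valid_end)


# The same two events, delivered in two different arrival orders.
in_order = [Event("a", 1, 5, cedr_time=10), Event("b", 3, 8, cedr_time=11)]
out_of_order = [Event("b", 3, 8, cedr_time=10), Event("a", 1, 5, cedr_time=11)]

# The query never reads cedr_time, so both deliveries give the same answer.
assert valid_at(in_order, 4) == valid_at(out_of_order, 4) == ["a", "b"]
```

Because `valid_at` never reads `cedr_time`, its answer is the same for any arrival order, which is the sense in which query semantics depend only on valid time.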
Note that this is analogous to the relational model and transactions. Relational operators,
like join, are described independently of transactions and their commit times. This provides a
clean separation in relational systems between operator semantics and transactions. By using two
notions of time, we have similarly separated the temporal information which is semantically interesting
from a query's point of view from the temporal information which is an artifact of the event delivery
system. It is also this separation of temporal concerns which allows us to respond in a
uniquely flexible manner to stream imperfections caused by out-of-order event arrival and system
overload. This bitemporal approach also allows us to express retractions, which are critically
important for achieving our goals.
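
As a rough illustration of what a retraction might look like, the Python sketch below treats a retraction as a later message that shortens the valid-time interval of an earlier event. This interpretation, and the names `Insert`, `Retract`, and `apply_messages`, are assumptions for illustration and are not taken from the CEDR implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Hypothetical message announcing an event and its valid-time interval."""
    event_id: str
    valid_start: int
    valid_end: int  # exclusive (assumed)


@dataclass(frozen=True)
class Retract:
    """Hypothetical message revising an earlier event's valid end time downward."""
    event_id: str
    new_valid_end: int


def apply_messages(messages):
    """Apply messages in arrival (CEDR time) order; return the surviving valid intervals."""
    state = {}
    for m in messages:
        if isinstance(m, Insert):
            state[m.event_id] = (m.valid_start, m.valid_end)
        else:
            start, end = state[m.event_id]
            new_end = min(end, m.new_valid_end)  # a retraction can only shorten validity
            if new_end <= start:
                del state[m.event_id]  # empty interval: the event is fully retracted
            else:
                state[m.event_id] = (start, new_end)
    return state


messages = [
    Insert("e1", 1, 10),
    Insert("e2", 2, 6),
    Retract("e1", 4),  # e1 turns out to be valid only until time 4
    Retract("e2", 2),  # e2 turns out never to have been valid
]
result = apply_messages(messages)
assert result == {"e1": (1, 4)}
```

In this sketch the order of the messages plays the role of CEDR time, while the intervals they carry are valid time, so a correction that arrives late can still revise what was valid earlier.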
Papers:
Phase II system - above mission statement & approach:
The Phase II CEDR vision paper (CIDR 2007) (pdf)
Phase I system - rich, stateful event processing with NFAs:
Handling Clock Skew (DEBS 2006) (pdf)
Phase I query language (EDBT 2006 workshop) (pdf)
Presentations (Phase II):
The CIDR talk (ppt)