Relational Languages and Data Models for Continuous Queries on Sequences and Data Streams

  • Yan-Nei Law ,
  • Haixun Wang ,
  • Carlo Zaniolo

ACM Transactions on Database Systems (TODS) |

Most data stream management systems are based on extensions of the relational data model and query languages, but rigorous analyses of the problems and limitations of this approach, and how to overcome them, are still wanting. In this paper, we elucidate the interaction between stream-oriented extensions of the relational model and continuous query language constructs, and show that the resulting expressive power problems are even more serious for data streams than for databases. In particular, we study the loss of expressive power caused by the loss of blocking query operators, and characterize nonblocking queries as monotonic functions on the database. Thus, we introduce the notion of NB-completeness to assure that a query language is as suitable for continuous queries as it is for traditional database queries. We show that neither RA nor SQL are NB-complete on unordered sets of tuples, and the problem is even more serious when the data model is extended to support order—a sine-qua-non in data stream applications. The new limitations of SQL, compounded with well-known problems in applications such as sequence queries and data mining, motivate our proposal of extending the language with user-defined aggregates (UDAs). These can be natively coded in SQL, according to simple syntactic rules that set nonblocking aggregates apart from blocking ones. We first prove that SQL with UDAs is Turing complete. We then prove that SQL with monotonic UDAs and union operators can express all monotonic set functions computable by a Turing machine (NB-completeness) and finally extend this result to queries on sequences ordered by their timestamps. The proposed approach supports data stream models that are more sophisticated than appendonly relations, along with data mining queries, and other complex applications.