The Revolution in Database Architecture

MSR-TR-2004-31 |

Extended abstract of keynote talk at ACM SIGMOD 2004 Paris, France - June 2004

Publication | Publication

Database system architectures are undergoing revolutionary changes. Most importantly, algorithms and data are being unified by integrating programming languages with the database system. This gives an extensible object-relational system where non-procedural relational operators manipulate object sets. Coupled with this, each DBMS is now a web service. This has huge impli-cations for how we structure applications. DBMSs are now object containers. Queues are the first objects to be added. These queues are the basis for transaction processing and workflow applica-tions. Future workflow systems are likely to be built on this core. Data cubes and online analytic processing are now baked into most DBMSs. Beyond that, DBMSs have a framework for data mining and machine learning algorithms. Decision trees, Bayes nets, clustering, and time series analysis are built in; new algo-rithms can be added. There is a rebirth of column stores for sparse tables and to optimize bandwidth. Text, temporal, and spatial data access methods, along with their probabilistic reasoning have been added to database systems. Allowing approximate and prob-abilistic answers is essential for many applications. Many believe that XML and xQuery will be the main data structure and access pattern. Database systems must accommodate that perspective. External data increasingly arrives as streams to be compared to historical data; so stream-processing operators are being added to the DBMS. Publish-subscribe systems invert the data-query ra-tios; incoming data is compared against millions of queries rather than queries searching millions of records. Meanwhile, disk and memory capacities are growing much faster than their bandwidth and latency, so the database systems increasingly use huge main memories and sequential disk access. These changes mandate a much more dynamic query optimization strategy – one that adapts to current conditions and selectivities rather than having a static plan. Intelligence is moving to the periphery of the network. Each disk and each sensor will be a competent database machine. Relational algebra is a convenient way to program these systems. Database systems are now expected to be self-managing, self-healing, and always-up. We researchers and developers have our work cut out for us in delivering all these features.