*
Quick Links|Home|Worldwide
Microsoft*
Search for


QP Recycler: Reuse dont recompute

People

Paul Larson

Jonathan Goldstein

The Idea

The stream of queries seen by a database system often exhibits significant redundancy, that is, many queries contain similar selections, joins, and aggregations. Consequently, the system may be wasting valuable resources by recomputing the same result multiple times. To speed up query processing, it may be better to save some results for later reuse.

This is a very general idea that is already partly exploited in database systems. Variants of this idea differ mainly in the lifetime of the saved result, as summarized below. 

  1. Lifetime: single query. This variant focuses on finding and exploiting similar subexpressions within a single query. 
  2. Lifetime: set of queries. This variant generalizes the problem to a set of queries. It is usually referred to as multi-query optimization. 
  3. Lifetime: undetermined. Results can be cached - either locally on the server or on clients - for an undetermined length of time. This variant has been studied under the heading of semantic caching.
  4. Lifetime: forever. Traditional indexes and materialized views are precomputed, saved results that are, in principle, kept forever. This means that they must be updated when the underlying data changes.
Goal

The overall goal of the QP Recycler project is to speed up the processing of complex SQL queries by recognizing and exploiting commonality within and amongst queries. The primary focus is on multi-query optimization and semantic caching.

All variants of the problem require the ability to quickly recognize commonality amongst SQL expressions. For instance, to make use of materialized views or cached query results during optimization, we must be able to quickly generate equivalent query expressions using a combination of logical expressions that correspond to saved results. The resulting expressions represent alternative ways of computing part of or the entire query, which the query optimizer can then evaluate in the normal way. If a query is computed from multiple saved results created at different times, we must also guarantee consistency (i.e. that the query result corresponds to a valid state of the database). Making use of saved results, however, is just one piece of a bigger puzzle. Equally important is determining which results to save and whether, and how, to maintain them when the underlying base tables are updated. 

Initial Focus

In the area of semantic caching, we are first concentrating on making efficient use of saved results. We are developing a fast algorithm to assemble equivalent expressions from other expressions that use saved results. It deals with expressions composed of selects, projects, joins and group-by. It is intended to handle large collections of saved results (tens of thousands of expressions as opposed to tens). In addition, our algorithm handles a larger class of expressions and considers computing expressions from multiple saved results (as opposed to only one). The algorithm is made possible by both new heuristics for performing the search, and taking advantage of schema knowledge, specifically, foreign and primary key information.  In the area of multi-query optimization, our work is focused on three issues: (a) efficiently finding computations that can be shared (i.e. equivalent or subsumed expressions), (b) determining the optimal materialization strategy for these sharable expressions, (c) integrating multi-query optimization algorithms with traditional optimizer technology.


©2008 Microsoft Corporation. All rights reserved. Terms of Use |Trademarks |Privacy Statement