|
QP Recycler: Reuse dont recompute
Overview
People
The Idea
The stream of queries seen by a database system often exhibits significant redundancy, that is, many queries contain similar selections, joins, and aggregations. Consequently, the system may be wasting valuable resources by recomputing the same result multiple times. To speed up query processing, it may be better to save some results for later reuse. This is a very general idea that is already partly exploited in database systems. Variants of this idea differ mainly in the lifetime of the saved result, as summarized below.
Goal
The overall goal of the QP Recycler project is to speed up the processing of complex SQL queries by recognizing and exploiting commonality within and amongst queries. The primary focus is on multi-query optimization and semantic caching. All variants of the problem require the ability to quickly recognize commonality amongst SQL expressions. For instance, to make use of materialized views or cached query results during optimization, we must be able to quickly generate equivalent query expressions using a combination of logical expressions that correspond to saved results. The resulting expressions represent alternative ways of computing part of or the entire query, which the query optimizer can then evaluate in the normal way. If a query is computed from multiple saved results created at different times, we must also guarantee consistency (i.e. that the query result corresponds to a valid state of the database). Making use of saved results, however, is just one piece of a bigger puzzle. Equally important is determining which results to save and whether, and how, to maintain them when the underlying base tables are updated. Initial Focus
In the area of semantic caching, we are first concentrating on making efficient use of saved results. We are developing a fast algorithm to assemble equivalent expressions from other expressions that use saved results. It deals with expressions composed of selects, projects, joins and group-by. It is intended to handle large collections of saved results (tens of thousands of expressions as opposed to tens). In addition, our algorithm handles a larger class of expressions and considers computing expressions from multiple saved results (as opposed to only one). The algorithm is made possible by both new heuristics for performing the search, and taking advantage of schema knowledge, specifically, foreign and primary key information. In the area of multi-query optimization, our work is focused on three issues: (a) efficiently finding computations that can be shared (i.e. equivalent or subsumed expressions), (b) determining the optimal materialization strategy for these sharable expressions, (c) integrating multi-query optimization algorithms with traditional optimizer technology. |