Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das, and Vivek Narasayya
The choice of database layout, i.e., how database objects such as tables and indexes are assigned to disk drives can significantly impact the I/O performance of the system. Today, DBAs typically rely on fully striping objects across all available disk drives as the basic mechanism for optimizing I/O performance. While full striping maximizes I/O parallelism, when query execution involves co-access of two or more large objects, e.g., a merge join of two tables, the above strategy may be suboptimal due to the increased number of random I/O accesses on each disk drive. In this paper, we propose a framework for automating the choice of database layout for a given database that also takes into account the effects of co-accessed objects in the workload faced by the system. We formulate the above as an optimization problem and present an efficient solution to the problem that judiciously takes into account the trade-off between I/O parallelism and random I/O accesses. Our experiments on Microsoft SQL Server show the superior I/O performance of our techniques compared to the traditional approach of fully striping each database object across all disk drives.
|Published in||19th International Conference on Data Engineering|
|Publisher||IEEE Computer Society|
Copyright © 2007 IEEE. Reprinted from IEEE Computer Society. This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to email@example.com. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.