Database management systems provide functionality that is central to developing business applications. Therefore, database management systems are increasingly being used as an important component in applications. Yet, the problem of tuning database management systems for achieving required performance is significant, and results in high total cost of ownership (TCO). The goal of our research in the AutoAdmin project is to make database systems self-tuning and self-administering. We achieve this by enabling databases to track the usage of their systems and to gracefully adapt to application requirements. Thus, instead of applications having to track and tune databases, databases actively auto-tunes itself to be responsive to application needs. Our research has led to novel self-tuning components being included in Microsoft SQL Server.
Physical Database Design
One of the primary areas of focus in the AutoAdmin project has been the challenge of automating physical database design. Physical database design is the problem of identifying the appropriate physical design structures (e.g. indexes) for a given database and workload. Judicious choice of the physical database design is important since a well-chosen physical design can often improve workload performance by orders of magnitude. It is also a very challenging task that typically requires skilled DBAs who can reason about the trade-offs of creating or dropping physical design structures on the workload. Our research on this problem led to the development of the Index Tuning Wizard that shipped as part of Microsoft SQL Server 7.0 in 1998. This work was done in close collaboration with the Microsoft SQL Server product group. The Index Tuning Wizard was the first tool of its kind in any commercial DBMS. Other prominent DBMS vendors followed suit subsequently with similar physical design tuning tools. The architecture of this tool and the underlying algorithms are available in the VLDB 1997 paper "An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server". This paper won the VLDB 10-Year Best Paper Award in 2007. One of the key aspects of the architecture is an extension to the query optimizer to support a “what-if” interface: given a query and a set of hypothetical physical design structures (e.g. indexes that do not currently exist), return the plan that the optimizer would have picked if that set of physical design structures were actually materialized. This interface, described in a SIGMOD 1998 paper “AutoAdmin ‘What-If’ Index Analysis Utility”, allows physical design tools to consider a large space of alternative physical designs without actually materializing the physical design structures. Similar “what-if” interfaces were subsequently adopted by other commercial DBMSs as well.
In Microsoft SQL Server 2000, the Index Tuning Wizard was extended to also recommend materialized views. The VLDB 2000 paper "Automated Selection of Materialized Views and Indexes for SQL Databases" describes some of the key technical ideas for dealing with the large space of alternatives that arises with the inclusion of materialized views. In the Microsoft SQL Server 2005 release, the tool was changed from being a wizard to a full blown application called the Database Engine Tuning Advisor (DTA). DTA has been used effectively by many real-world customers of Microsoft SQL Server. DTA’s recommendations have resulted in order of magnitude speedups for complex workloads on very large databases, and typically result in comparable performance to hand-tuned physical designs chosen by DBAs. Further information about DTA’s quality of its recommendations and scalability for large and complex real-world workloads can be found in the VLDB 2004 paper "Database Tuning Advisor for Microsoft SQL Server 2005". We have also published details of DTA’s algorithms, including techniques for handling physical design structures such as partitioning, in the SIGMOD 2004 paper "Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design". Finally, in many real-world scenarios, physical design tuning needs to capture rich constraints, e.g., no query in the workload should degrade by more than 10% with respect to the current configuration. Our work on constrained physical design tuning won the Best Paper Award at VLDB 2008.
We have also explored other modes of physical design tuning. We studied how to enable a low overhead physical design “alerter” that could help DBAs answer the question: “Is it worthwhile to tune a database now”? Another interesting problem arises when there is no DBA available (e.g. an embedded database or a small business). In such scenarios, a low touch continuous index tuning approach may become important. We have explored solutions for such scenarios as well in the two papers: “To Tune or not to Tune? A Lightweight Physical Design Alerter” in VLDB 2006 and “An Online Approach to Physical Design Tuning” in ICDE 2007.
Our initial work on physical database design was awarded the VLDB 10-year Best Paper Award - in this context, we wrote an overview article on the progress made in physical design tuning as well as other facets of self-tuning database systems.
In addition to physical design structures, DTA also recommends appropriate statistics (e.g. histograms on columns of a base table) for the workload. Statistics are crucial for workload performance since the quality of the plan generated by the query optimizer depends heavily on the available statistics. Our work on techniques for determining appropriate statistics for a workload won the Best Paper Award in ICDE 2001. We have also worked on the problem of exploiting statistics on views during query optimization (SIGMOD 2002, ICDE 2003). Using statistics on views can potentially improve the plan quality significantly, since the view can be defined on the result of a query expression, thereby eliminating errors due to assumptions that the optimizer otherwise would need to make.
Monitoring and Execution Feedback
An important requirement for a self-tuning database system is the ability to efficiently monitor the database and identify potential performance problems. As part of the AutoAdmin project we are investigating some of these issues. We developed a SQL Continuous Monitoring engine (SQLCM) in which DBAs can declaratively specify monitoring tasks that they wish to perform and have the engine execute these tasks efficiently at low overhead and with a small memory footprint.
A functionality that is of great interest to DBAs is estimating the progress of long-running SQL queries which are common in decision support systems. Accurate progress estimates can allow DBAs to make judicious decisions on whether to allow a long-running query that is consuming significant resources to complete or to kill it. We defined the model of progress for a SQL query, developed efficient progress estimators and analytically quantified the inherent hardness of the problem.
The performance of a query can heavily depend on the quality of the execution plan chosen by the query optimizer. Often, many key parameters used by the optimizer cannot be reliably estimated up-front when the plan is chosen. Thus, we have studied the problem of how feedback from query execution can be used to improve the choice of execution plan of the query optimizer. First, we proposed the concept of Self-Tuning Histograms (SIGMOD 1999, SIGMOD 2001) where the statistics (single and multi-dimensional histograms) are built using execution feedback, i.e. cardinalities observed during query execution. There has subsequently been much follow-on research work in the database community on this topic.
Second, we showed that there are many simple cases where existing mechanisms for obtaining execution feedback are limiting, i.e. they do not help improve the execution plan, no matter how many times the query is executed. We therefore studied how to augment execution feedback mechanisms to overcome some of these limitations while keeping the monitoring overheads at query execution time low. For example, using an alternate plan whose cost is not much higher than the current plan, but whose execution will result in obtaining accurate cardinalities for important expressions can yield significant improvements in overall performance.
We have also recently worked on other problems in the context of the AutoAdmin project:
- We developed PHints, a flexible query hinting mechanism, that allows DBAs and application developers fine-grained control in generating an execution plan for a query.
- In many database applications, performance problems can often be traced to poor application design, e.g. dynamic SQL generated by application code. We have also investigated techniques for statically analyzing and dynamically profiling database application code to help identify and potentially fix performance problems.
We are studying novel aspects of the physical database design problem:
- Data compression can be important in data warehousing to improve I/O performance and reduce storage costs. We are studying how the physical database design problem is affected by the availability of compression methods in databases such as null suppression, dictionary compression and run-length encoding (RLE).
- Index fragmentation can be a major contributor to I/O performance degradation in databases. We are studying the problem of automatic index defragmentation to help identify and overcome the impact on index fragmentation on workload I/O performance.
A broad area that we are focusing on is investigating how machine learning techniques can be applied effectively to challenging problems in database monitoring and performance tuning, thereby leveraging historical information from query execution more deeply.
If you have questions about this project, please contact the Autoadmin research team (firstname.lastname@example.org).
- Jiexing Li, Arnd Christian König, Vivek Narasayya, and Surajit Chaudhuri, Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques, in 38th International Conference on Very Large Databases, Very Large Data Bases Endowment Inc., 28 August 2012
- Nicolas Bruno, Surajit Chaudhuri, Arnd Christian König, Vivek Narasayya, Ravi Ramamurthy, and Manoj Syamala, AutoAdmin Project at Microsoft Research: Lessons Learned, in Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, IEEE Computer Society, December 2011
- surajit chaudhuri, Hongrae Lee, and vivek narasayya, Variance Aware Optimization of Parameterized Queries, in ACM SIGMOD, Association for Computing Machinery, Inc., 7 June 2010
- Stratos Idreos, Raghav Kaushik, Vivek Narasayya, and Ravi Ramamurthy, Estimating the Compression Fraction of an Index Using Sampling, in International Conference on Data Engineering (ICDE), IEEE, 1 March 2010
- Vivek Narasayya and Manoj Syamala, Workload Driven Index Defragmentation, in International Conference on Data Engineering (ICDE), IEEE, 1 March 2010
- Arjun Dasgupta, Vivek Narasayya, and Manoj Syamala, A Static Analysis Framework for Database Applications, in International Conference on Data Engineering (ICDE), IEEE, April 2009
- Nicolas Bruno, Teaching an Old Elephant New Tricks, in Conference on Innovative Data Systems Research (CIDR), Very Large Data Bases Endowment Inc., 2009
- Nicolas Bruno, Surajit Chaudhuri, and Ravishankar Ramamurthy, Power Hints for Query Optimization, in Proceedings of the International Conference on Data Engineering (ICDE), 2009
- Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy, A Pay-As-You-Go Framework for Query Execution Feedback, in VLDB, Very Large Data Bases Endowment Inc., August 2008
- Nicolas Bruno and Surajit Chaudhuri, Constrained Physical Design Tuning, in Proceedings of the International Conference on Very Large Databases (VLDB), 2008
- Surajit Chaudhuri, Vivek Narasayya, and Manoj Syamala, Bridging the Application and DBMS Profiling Divide for Database Application Developers, in VLDB, Very Large Data Bases Endowment Inc., September 2007
- Surajit Chaudhuri and Vivek Narasayya, Self-Tuning Database Systems: A Decade of Progress., in VLDB, Very Large Data Bases Endowment Inc., September 2007
- Nicolas Bruno and Surajit Chaudhuri, An Online Approach to Physical Design Tuning, in Proceedings of the International Conference on Data Engineering (ICDE), 2007
- Sanjay Agrawal, Eric Chu, and Vivek Narasayya, Automating Physical Database Design: Workload as a Sequence, in SIGMOD, Association for Computing Machinery, Inc., June 2006
- Sanjay Agrawal, Nicolas Bruno, Surajit Chaudhuri, and Vivek Narasayya, AutoAdmin: Self-Tuning Database Systems Technology, in Data Engineering Bulletin, IEEE Computer Society, 2006
- Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas, Generating Queries with Cardinality Constraints for DBMS Testing, in Transactions on Knowledge and Data Engineering, IEEE Computer Society, 2006
- Arnd Christian König and Shubha U. Nabar, Scalable Exploration of Physical Database Design, in 22nd International Conference on Data Engineering , IEEE Computer Society, 2006
- Zhimin Chen and Vivek Narasayya, Efficient Computation of Multiple Group-By Queries, in SIGMOD, Association for Computing Machinery, Inc., June 2005
- Nicolas Bruno and Surajit Chaudhuri, Automatic Physical Database Tuning: A Relaxation-based Approach, in Proceedings of the ACM International Conference on Management of Data (SIGMOD), Association for Computing Machinery, Inc., 2005
- Nicolas Bruno and Surajit Chaudhuri, Flexible Database Generators, Very Large Data Bases Endowment Inc., 2005
- Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollar, Arun Marathe, Vivek Narasayya, and Manoj Syamala, Database Tuning Advisor for Microsoft SQL Server 2005, in VLDB, Very Large Data Bases Endowment Inc., August 2004
- Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy, Estimating Progress of Execution for SQL Queries, in SIGMOD, Association for Computing Machinery, Inc., June 2004
- Sanjay Agrawal, Vivek Narasayya, and Beverly Yang, Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design, in SIGMOD, Association for Computing Machinery, Inc., June 2004
- Surajit Chaudhuri, Arnd Christian König, and Vivek Narasayya, SQLCM: A Contiuous Monitoring Framework for Relational Database Engines, in 20th International Conference on Data Engineering, Institute of Electrical and Electronics Engineers, Inc., March 2004
- Nicolas Bruno and Surajit Chaudhuri, Conditional Selectivity for Statistics on Query Expressions, in SIGMOD, Association for Computing Machinery, Inc., 2004
- Surajit Chaudhuri, Prasanna Ganesan, and Vivek Narasayya, Primitives for Workload Summarization and Implications for SQL , in VLDB, Very Large Data Bases Endowment Inc., September 2003
- Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das, and Vivek Narasayya, Automating Layout of Relational Databases, in 19th International Conference on Data Engineering, IEEE Computer Society, 2003
- Surajit Chaudhuri, Vivek Narasayya, and Sunita Sarawagi, Efficient Evaluation of Queries with Mining Predicates, in 18th International Conference on Data Engineering, IEEE Computer Society, March 2002
- Nicolas Bruno, Surajit Chaudhuri, and Luis Gravano, STHoles: A Multidimensional Workload-Aware Histogram, no. MSR-TR-2001-36, March 2001
- Surajit Chaudhuri, Vivek Narasayya, and Sanjay Agrawal, Automated Selection of Materialized Views and Indexes for SQL Databases., in VLDB, Very Large Data Bases Endowment Inc., September 2000
- Surajit Chaudhuri and Vivek Narasayya, Automating Statistics Management for Query Optimizers, in 16th International Conference on Data Engineering, IEEE Computer Society, March 2000
- Surajit Chaudhuri and Vivek Narasayya, Index Merging, in 15th International Conference on Data Engineering, IEEE Computer Society, April 1999
- Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya, Random Sampling for Histogram Construction: How Much is Enough?, in SIGMOD, Association for Computing Machinery, Inc., June 1998
- Surajit Chaudhuri and Vivek Narasayya, AutoAdmin "What-If" Index Analysis Utility, in SIGMOD, Association for Computing Machinery, Inc., June 1998
- Surajit Chaudhuri, Usama Fayyad, and Goetz Graefe, On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases, American Association for Artificial Intelligence , January 1998
- Surajit Chaudhuri, Overview of Query Optimization in Relational Systems, Association for Computing Machinery, Inc., January 1998
- Surajit Chaudhuri and Vivek Narasayya, An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server, in VLDB, Very Large Data Bases Endowment Inc., August 1997
- Surajit Chaudhuri and Kyuseok Shim, Optimization of Queries with User-defined Predicates, no. MSR-TR-97-03, February 1997
- Surajit Chaudhuri and Umeshwar Dayal, An Overview of Data Warehousing and OLAP Technology, Association for Computing Machinery, Inc., January 1997