INTRODUCTION TO QUERY OPTIMIZATION
This very remarkable man Commends a most practical plan: You can do what you want If you don’t think you can’t, So don’t think you can’t if you can. —Charles Inge
Consider a simple selection query asking for all reservations made by sailor Joe. As we saw in the previous chapter, there are many ways to evaluate even this simple query, each of which is superior in certain situations, and the DBMS must consider these alternatives and choose the one with the least estimated cost. Queries that consist of several operations have many more evaluation options, and ﬁnding a good plan represents a signiﬁcant challenge. A more detailed view of the query optimization and execution layer in the DBMS architecture presented in Section 1.8 is shown in Figure 13.1. Queries are parsed and then presented to a query optimizer, which is responsible for identifying an eﬃcient execution plan for evaluating the query. The optimizer generates alternative plans and chooses the plan with the least estimated cost. To estimate the cost of a plan, the optimizer uses information in the system catalogs. This chapter presents an overview of query optimization, some relevant background information, and a case study that illustrates and motivates query optimization. We discuss relational query optimizers in detail in Chapter 14. Section 13.1 lays the foundation for our discussion. It introduces query evaluation plans, which are composed of relational operators; considers alternative techniques for passing results between relational operators in a plan; and describes an iterator interface that makes it easy to combine code for individual relational operators into an executable plan. In Section 13.2, we describe the system catalogs for a relational DBMS. The catalogs contain the information needed by the optimizer to choose between alternate plans for a given query. Since the costs of alternative plans for a given query can vary by orders of magnitude, the choice of query evaluation plan can have a dramatic impact on execution time. We illustrate the diﬀerences in cost between alternative plans through a detailed motivating example in Section 13.3. 359
Query Query Parser Parsed query
Plan Generator Plan Cost Estimator
Evaluation plan Query Plan Evaluator
Query Parsing, Optimization, and Execution
We will consider a number of example queries using the following schema: Sailors(sid: integer, sname: string, rating: integer, age: real) Reserves(sid: integer, bid: integer, day: dates, rname: string) As in Chapter 12, we will assume that each tuple of Reserves is 40 bytes long, that a page can hold 100 Reserves tuples, and that we have 1,000 pages of such tuples. Similarly, we will assume that each tuple of Sailors is 50 bytes long, that a page can hold 80 Sailors tuples, and that we have 500 pages of such tuples.
13.1 OVERVIEW OF RELATIONAL QUERY OPTIMIZATION
The goal of a query optimizer is to ﬁnd a good evaluation plan for a given query. The space of plans considered by a typical relational query optimizer can be understood by recognizing that a query is essentially treated as a σ − π − × algebra expression, with the remaining operations (if any, in a given query) carried out on the result of the σ − π − × expression. Optimizing such a relational algebra expression involves two basic steps: Enumerating alternative plans for evaluating the expression; typically, an optimizer considers a subset of all possible plans because the number of possible plans is very large. Estimating the cost of each enumerated plan, and choosing the plan with the least estimated cost.
Introduction to Query Optimization
Commercial optimizers: Current RDBMS optimizers are complex pieces of software with many closely guarded details and typically represent 40 to 50 manyears of development eﬀort!
In this section we lay the foundation for...