Programmer's Life: Optimizer Diagnostics and Tuning

< Day Day Up >

Optimizer Diagnostics and Tuning

MySQL offers an array of diagnostic information from the optimizer, as well as commands that you can use to help tune and guide this vital engine component. Most database administrators and developers will want to rush right in and start using the EXPLAIN command. However, EXPLAIN does not exist in a vacuum: There are complex interrelationships with other optimizer management commands, such as ANALYZE TABLE and OPTIMIZE TABLE. The following sections look at these first and then turn the attention to EXPLAIN.

The `ANALYZE TABLE` Command

The optimizer relies on accurate, up-to-date information to make its decisions. Statistics about indexes, including their keys, distribution, and structure, are critical factors in determining the most optimal results. For the InnoDB, MyISAM, and BDB engines, MySQL offers the ANALYZE TABLE command to calculate and store this information:



mysql> ANALYZE TABLE customer_master;

+--------------------------+---------+----------+----------+

| Table                    | Op      | Msg_type | Msg_text |

+--------------------------+---------+----------+----------+

| high_hat.customer_master | analyze | status   | OK       |

+--------------------------+---------+----------+----------+

1 row in set (0.47 sec)

It's a good idea to run this command periodically, especially for those tables subject to frequent updates that involve indexed columns.

To help you in choosing when to run the command, note that the engine places a read lock on the table for the duration of the study. However, if there are no changes to the underlying table, MySQL simply skips the analysis.

The `OPTIMIZE TABLE` Command

The storage advantages of variable-length rows were discussed earlier in the book. At that time, fragmentation was also cited as a potential performance risk. Fortunately, MySQL's OPTIMIZE TABLE command can defragment these tables, thereby restoring them to a more efficient structure while sorting any necessary indexes at the same time:



mysql> OPTIMIZE TABLE customer_master;

+--------------------------+----------+----------+----------+

| Table                    | Op       | Msg_type | Msg_text |

+--------------------------+----------+----------+----------+

| high_hat.customer_master | optimize | status   | OK       |

+--------------------------+----------+----------+----------+

1 row in set (3 min 51.78 sec)

Note that like its cousin ANALYZE TABLE, this command locks a table for the duration of processing. However, because it often restructures the table, OPTIMIZE TABLE generally takes much longer to complete, so be careful when selecting your launch time.

Disk management best practices are reviewed in more detail later in the book, especially in Chapter 13, "Improving Disk Speed." For now, take a look at a specific example of how OPTIMIZE TABLE can affect the disk usage and index structure of a table. For this example, we created a sample table as follows:



CREATE TABLE optimize_example 

(

    col1 INT AUTO_INCREMENT PRIMARY KEY,

    col2 VARCHAR(30),

    col3 VARCHAR(100),

    col4 VARCHAR(255)

) ENGINE = INNODB;

We then loaded one million rows of random data into this table, followed by creating an index on col2:



CREATE INDEX opt_ix1 ON optimize_example(col2);

After building the index, MySQL reported this table's size at 215.8MB, and the indexes consumed a further 37.6MB.

Next, to simulate significant changes to the quantity of data in the table and index, we deleted every fifth row:



DELETE FROM optimize_example WHERE mod(col1,5) = 0;

After these large-scale deletions, MySQL continues to report this table's size at 215.8MB and the indexes at 37.6MB.

We then loaded 250,000 more rows of random data. The reported size is now 218.8MB of data and 35.6MB of index.

At this point, we ran the OPTIMIZE TABLE command to defragment the table and re-sort its indexes:



mysql> OPTIMIZE TABLE optimize_example;

+---------------------------+----------+----------+----------+

| Table                     | Op       | Msg_type | Msg_text |

+---------------------------+----------+----------+----------+

| high_hat.optimize_example | optimize | status   | OK       |

+---------------------------+----------+----------+----------+

1 row in set (4 min 56.69 sec)

After the command ran, the sample table had been compressed to 175.7MB of data and 33.6MB of index: a significant size reduction and defragmentation.

The `EXPLAIN` Command

Database designers and developers use MySQL's EXPLAIN in one of two ways:

To get a list of columns for the table. This is the same as running the DESCRIBE command, as shown in Figure 6.1.
Figure 6.1. Viewing EXPLAIN results for a table.

[View full size image]
To understand how MySQL processes a query against one or more tables. This usage is much more interesting from an optimization perspective. Beginning with version 5.0.1, MySQL now offers a status variable to help you compare the costs of different query plans. Use the last_query_cost indicator when you are trying to decide between two or more approaches to write the same query.

The first question about EXPLAIN is simple: When should you use it? Fortunately, the answer is simple as well. You should use this command whenever you are designing queries against a large and/or complex database, because you may elect to use the results to alter your indexing strategy or even override the optimizer's query plan.

Next, how do you launch EXPLAIN? Whether you're using the command-line interface or the MySQL Query Browser, simply prefix your SQL SELECT statement with EXPLAIN. The optimizer then returns a detailed report on the suggested query plan. Let's review these reports in more detail.

Before beginning, use the following tables and related indexes throughout the balance of this chapter to help illustrate our examples:



CREATE TABLE customer_master 

(

    customer_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,

    ff_number CHAR(10),

    last_name VARCHAR(50) NOT NULL,

    first_name VARCHAR(50) NOT NULL,

    home_phone VARCHAR(20),

    mobile_phone VARCHAR(20),

    fax VARCHAR(20),

    email VARCHAR(40),

    home_airport_code CHAR(3),

    date_of_birth DATE,

    sex ENUM ('M','F'),

    date_joined_program DATE,

    date_last_flew DATETIME

) ENGINE = INNODB;

CREATE INDEX cm_ix1 ON customer_master(home_phone);



CREATE TABLE customer_address 

(

    customer_id INT UNSIGNED NOT NULL,

    customer_address_seq SMALLINT(2) UNSIGNED NOT NULL,

    address1 VARCHAR(50),

    address2 VARCHAR(50),

    address3 VARCHAR(50),

    city VARCHAR(50),

    state VARCHAR(50),

    country VARCHAR(70),

    postal_code VARCHAR(20),

    PRIMARY KEY (customer_id, customer_address_seq)

) ENGINE = INNODB;

CREATE INDEX ca_ix1 ON customer_address(postal_code);

CREATE INDEX ca_ix2 ON customer_address(country);



CREATE TABLE transactions 

(

    transaction_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,

    transaction_date DATETIME NOT NULL,

    customer_id INT NOT NULL,

    amount DECIMAL (5,2) NOT NULL,

    transaction_type ENUM ('Purchase','Credit')

) ENGINE = MYISAM; 

CREATE INDEX tr_ix2 on transactions(customer_id);



CREATE TABLE airports 

(

    airport_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,

    airport_code CHAR(3) NOT NULL,

    airport_name VARCHAR(40),

    vip_club SMALLINT(1),

    club_upgrade_date DATE

) ENGINE = INNODB;

CREATE UNIQUE INDEX air_ix1 ON airports(airport_code);



CREATE TABLE flights 

(

    flight_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,

    flight_number SMALLINT UNSIGNED NOT NULL,

    flight_date DATE NOT NULL,

    flight_departure_city CHAR(3),

    flight_arrival_city CHAR(3)

) ENGINE = INNODB;



CREATE TABLE customer_flights 

(

    customer_flight_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,

    customer_id INT UNSIGNED NOT NULL,

    flight_id SMALLINT UNSIGNED NOT NULL,

    FOREIGN KEY (customer_id) REFERENCES customer_master(customer_id),

    FOREIGN KEY (flight_id) REFERENCES flights(flight_id)

) ENGINE = INNODB;

Figure 6.2 shows a sample query as well as its related EXPLAIN output.

Figure 6.2. Sample query and its `EXPLAIN` output.

[View full size image]

How can you interpret these results? This section first looks at a field list of the returned information, and then provides a detailed explanation of each value, along with examples.

Table 6.1 lists the columns returned by the EXPLAIN command, along with each column's purpose.

Table 6.1. `EXPLAIN` Result Columns
Column	Purpose
id	The step sequence number for this statement.
select_type	The kind of `SELECT` statement to be processed.
table	The table to be processed for this statement.
type	The type of join to be performed.
possible_keys	The available potential keys if the query is able to use an index.
key	The key that the optimizer chose from all the potential keys.
key_len	The length of the selected key.
ref	The values or columns that will be fed to the selected key.
rows	The optimizer's best estimate of the number of rows to be searched in this table for this query.
extra	Additional useful information from the optimizer. If this column states `Impossible WHERE noticed after reading const tables`, this means that your query criteria returned no rows; the optimizer is unable to provide any diagnostic information in this case.

The following sections look at each of the columns in more detail.

`id`

Remember that MySQL processes tables in the order in which they are listed in the EXPLAIN output. As you evaluate the proposed query plan and ponder making adjustments, keep an eye on the sequence number for each table.

`select_type`

Depending on query structure, SELECT statements are classified as one of the following:

SIMPLE�

This is a basic SELECT statement, with no subqueries or UNION:
```
SELECT * FROM customer_master WHERE last_name = "SIERRA";
```

PRIMARY�

This is the highest level SELECT statement. In the following two table�join examples, the customer_master table has this select_type:



SELECT cm.last_name, cm.first_name 

FROM customer_master cm

WHERE cm.customer_id IN 

(

         SELECT ca.customer_id 

         FROM customer_address ca

         WHERE ca.country = "Canada"

);

UNION�

If the optimizer reports a UNION, this is any SELECT below the top level. In the following example, the second SELECT is identified as having a select_type of UNION:



SELECT cm.customer_id FROM CUSTOMER_MASTER cm 

WHERE cm.date_joined_program > "2002-01-15" AND cm.home_airport_code = 'SEA'

UNION

SELECT cm1.customer_id FROM CUSTOMER_MASTER cm1 WHERE cm1.sex = 'M';

DEPENDENT UNION�

This is similar to the preceding UNION example, except it relies on the results of an outer subquery. In the following example, the second customer_master reference (cm1) is classified as DEPENDENT UNION:



SELECT ca.customer_id FROM customer_address ca 

WHERE ca.country = "United States" AND ca.customer_id IN 

(

        SELECT cm.customer_id FROM CUSTOMER_MASTER cm 

        WHERE cm.date_joined_program > "2002-01-15" AND cm.home_airport_code = 

        'SEA'

        UNION

        SELECT cm1.customer_id FROM CUSTOMER_MASTER cm1 WHERE cm1.sex = 'M'

);

SUBQUERY�

If the query includes a subquery, the first SELECT in the subquery is identified as SUBQUERY. In the following case, the customer_flights table is chosen for that categorization:



SELECT f.flight_id, f.flight_number, f.flight_date

FROM flights f

WHERE f.flight_id = 

(

        SELECT cf.flight_id 

        FROM customer_flights cf 

        WHERE cf.customer_flight_id = 2291443

);

DEPENDENT SUBQUERY�

If the subquery relies on information from an outer subquery, the first SELECT statement inside the subquery is identified as a DEPENDENT SUBQUERY. In this example, as shown earlier, the subquery from the customer_address table is cited as a DEPENDENT SUBQUERY:
```
SELECT cm.last_name, cm.first_name 

FROM customer_master cm

WHERE cm.customer_id IN 

(

        SELECT ca.customer_id 

        FROM customer_address ca

        WHERE ca.country = "Canada"

);
```
UNION RESULT�

When the optimizer needs to create a temporary intermediate table to hold the results of a UNION, it reports UNION RESULT:
```
SELECT * FROM customer_master99 WHERE first_name LIKE 'Bill%' 

UNION 

SELECT * FROM customer_master99 WHERE email LIKE '%@mysql.com';
```
DERIVED�

If a query involves a derived table (including a view), MySQL assigns the DERIVED select_type (see Figure 6.3).
Figure 6.3. EXPLAIN output showing a DERIVED select_type.

[View full size image]

`table`

MySQL cites the table name for each step of the query plan. Derived tables are identified as such; tables to hold the results of UNIONs are prefixed with union.

`type`

As you saw earlier, type refers to the kind of join that MySQL performs to retrieve your information. This is crucial to estimating performance because a bad join strategy can really bog down your application.

The following list looks at each of the potential values for this column, along with the scenarios that trigger the result. Because the goal is optimization, the list begins with the most sluggish values, and then works its way toward the fastest. As you experiment with your queries, always strive to achieve values toward the end of this list:

ALL�

This value means that the optimizer is forced to perform a table scan and read all rows in a table to find your results. This often happens when you are missing an index or are querying on substrings that can't be indexed:
```
SELECT mobile_phone

FROM customer_master

WHERE mobile_phone LIKE "%1212%";
```
For small tables, this is not usually a problem but suppose that in this case the customer_master table has 75 million rows. ALL means that every one of these rows will be looked at to find matching mobile phone numbers. The problem becomes much more dramatic when the query has joins among numerous tables and also involves multiple table scans.
index�

Although similar to a table scan, these types of queries mean that MySQL is able to read the index to retrieve the data, rather than the underlying table. In the preceding example, suppose that you placed an index on the mobile_phone column and then ran the query again:
```
SELECT mobile_phone

FROM customer_master

WHERE mobile_phone LIKE "%1212%";
```
The primary savings from this type of search is that the index is generally smaller than the data; otherwise, it's still a very inefficient query because all entries need to be read. Furthermore, if you request other columns than those in the index, MySQL is forced to read the table itself.
range�

Range queries make use of indexes to find data within the range specified by the query:
```
SELECT ca.*

FROM customer_address ca

WHERE ca.postal_code BETWEEN "11561" AND "11710";
```
The efficiency of these types of queries is highly dependent on the breadth of the search, as well as the cardinality (degree of uniqueness) of the index.
index_subquery�

This kind of query takes advantage of an index to speed results on a subquery:
```
SELECT cm.last_name, cm.first_name 

FROM customer_master cm

WHERE cm.customer_id IN 

(

        SELECT ca.customer_id 

        FROM customer_address ca

        WHERE ca.postal_code = "TVC15-3CPU"

);
```
In the preceding example, MySQL performs a table scan on customer_master, and for each located row uses the index on the postal_code field in the customer_address to complete the join.
unique_subquery�

Just like the previous example of index_subquery, this kind of query utilizes an index to quickly find the right information in a subquery. However, it's even faster because it is able to use a unique index or primary key to get to the data.
```
SELECT ca.country

FROM customer_address ca

WHERE ca.customer_id in 

(

        SELECT cm.customer_id

        FROM customer_master cm

        WHERE cm.ff_number LIKE "%AAA-%"

);
```
In fact, this kind of search doesn't even read the subqueried table itself, but instead returns results directly from the index�the customer_master.customer_id field in this case.
index_merge�

Prior to version 5.0, MySQL could only employ one index when processing a table. On the other hand, many queries contain criteria that can be rapidly located via two or more indexes. Versions 5.0 and later can now potentially make use of multiple indexes on the same table, greatly reducing query time. The EXPLAIN command shows a type of index_merge when MySQL is able to leverage two or more indexes on a single table.
This concept is discussed more a little later in this chapter.
ref/ref_or_null�

These types of searches use a nonunique index to find one to potentially many rows from a table:
```
SELECT ca.*

FROM customer_address ca

WHERE ca.postal_code = "11561";
```
If ref_or_null is returned, it means that the query is also looking for null values from an index column:
```
SELECT ca.*

FROM customer_address ca

WHERE ca.postal_code = "11561"

OR ca.postal_code IS NULL;
```
eq_ref�

These types of searches first take data from an initial query and then use an index to find subsequent information. However, eq_ref searches are able to employ unique indexes or primary keys, making these joins much faster:
```
SELECT cm.last_name, cm.first_name, ca.city

FROM customer_master cm, customer_address ca

WHERE ca.city = "Honolulu"

AND ca.customer_id = cm.customer_id;
```
In this example, MySQL first interrogates the customer_address table (via a highly inefficient table scan) to find records from Honolulu. It then takes these results and searches for records in the customer_master table using the primary key, customer_id. Fortunately, this second search goes very fast: One and only one row will have a particular customer_id.
const�

The query will have a type of const (that is, constant value) if the optimizer is able to fully use a unique index or primary key to satisfy your search:
```
SELECT address1, address2, address3

FROM customer_address ca

WHERE ca.customer_id = 3484 

AND ca.customer_address_seq = 1;
```
This is the fastest possible query; MySQL only needs to read one row to find your result. Note that if you're reading a system table with only one row, the optimizer classifies this as system instead of const.

`possible_keys` and `key`

MySQL often has multiple indexes at its disposal when trying to decide on a query plan. Viewing possible_keys lets you see the full list of potential indexes that the optimizer is evaluating, whereas the key value tells you what MySQL ended up choosing, as shown in Figure 6.4.

Figure 6.4. `EXPLAIN` output for a query that uses the primary key.

[View full size image]

In this case, MySQL had a choice between the primary key (customer_id) and the ca_ix2 index (country). The optimizer selected the primary key.

If you're curious to learn more about how the optimizer chooses a plan, try experimenting with queries. For example, observe what happens when you change one word in the query (from AND to OR), as shown in Figure 6.5.

Figure 6.5. Query plan radically changed by substitution of `OR` for `AND`.

[View full size image]

What happened? Why can't MySQL use our well-thought-out indexes? In this case, the search is written with an OR clause that refers to a single table. This forces the engine into a sequential table scan (ALL), or at least it did until version 5.0. MySQL version 5.0 features much more robust optimizer algorithms that can speed these kinds of queries. Figure 6.6 shows the same query after version 5.0.

Figure 6.6. Version 5.0 now uses the `index_merge` algorithm to avoid a table scan by taking advantage of multiple indexes.

[View full size image]

With version 5.0, MySQL can now make use of multiple indexes on a single table to speed results. These new algorithms are explored a little later in this chapter. You'll also examine how to override the optimizer's analysis. Finally, Chapter 7, "Indexing Strategies," spends much more time probing indexing strategies.

`key_len`

After MySQL has selected a key, this field shows the length of the chosen option. The example shown in Figure 6.7 uses the integer-based primary key for the customer_address table, which is four bytes in length. The second example searches on an indexed string field.



SELECT *

FROM customer_address ca

WHERE customer_id = 190432 OR country = "Sweden";



SELECT *

FROM customer_address ca

WHERE country = "Singapore";

Figure 6.7. Different query plans based on the type of index key chosen.

[View full size image]

MySQL reports the length of the field if it's a single field. For a multifield index, MySQL reports the full length of all fields (working from left to right) that will be used.

For example, suppose you create a multifield index (ca_ix3) on the customer_address city and state columns and then query against those values, as shown in Figure 6.8.

Figure 6.8. Query that uses a new multifield index.

[View full size image]

As you can see, MySQL was able to use the multifield index to satisfy the query; the key length is the sum of the two fields (50 + 50), plus one overhead byte per field.

`ref`

This field tracks the values or columns that will be used to look for the row(s), as shown in Figure 6.9.

Figure 6.9. Query plan showing candidate columns for searching.

[View full size image]

In this example, MySQL is able to filter the customer_address table on the newly created index, and then use the customer_id field as the lookup field for the second part of the query: matching the customer_address rows with their counterparts in customer_master.

It might be somewhat difficult to analyze and understand the results in this field because there will be times that MySQL reports NULL on a search when you might expect a const instead, as shown in Figure 6.10.

Figure 6.10. Two similar queries with different types of `ref` results.

[View full size image]

The first query performs a range search, which results in NULL in the ref column:



SELECT *

FROM customer_address ca

WHERE city in ("Phoenix", "Yuma")

The second search performs a ref search, so you see const in the ref column:



SELECT *

FROM customer_address ca

WHERE city = "Phoenix" OR "Yuma"

`rows`

MySQL uses its internal table and index statistics to provide this number, which is its best estimate of the quantity of records that should be read to produce the results. Pay attention to very large numbers in this column: It usually indicates a table scan or highly duplicate index. As you saw earlier, this can become an even bigger problem when many tables in the query require the reading of large numbers of rows.

To keep MySQL's statistics up to date, it's also important to run ANALYZE TABLE or OPTIMIZE TABLE for those tables that change frequently.

`Extra`

The EXPLAIN command offers additional diagnostic information on top of the helpful data you've seen so far. These messages are placed in the exTRa column, and can include one or more of the following:

Impossible WHERE noticed after reading const tables�

As mentioned earlier, this message means that you have supplied search criteria that returns no rows. For example, you will see this message if you test a query that searches on a nonexistent value from an indexed column. If this happens, try rewriting your query with more inclusive criteria so that the EXPLAIN command can properly analyze your search.
Using where�

This is likely to be the most commonly encountered message in this section. You'll usually see it when you specify a WHERE clause to filter the number of potential rows.
Distinct�

The optimizer reports this condition when it determines that the first row that matches the query criteria will suffice, and there is no need to check additional rows.
Not exists�

There are certain situations when joining two or more tables in which MySQL detects the impossibility of finding more than one row in a table other than the leftmost table. This usually happens because additional rows would violate a primary key or NOT NULL constraint. When this condition is triggered, MySQL reports Not exists in the extra column.
Range checked for each record�

In certain situations, such as range queries combined with nonselective indexes, the optimizer reports this condition. It then attempts to resolve the query via the most useful key.
Using filesort�

The optimizer uses a filesort if your query conditions require MySQL to first identify the requested rows and then sort them without the benefit of an index:
```
SELECT * 

FROM customer_address ca

ORDER BY ca.state;
```
Recall that the customer_address table does not have an index in which state is the leftmost (or only) component. That means that the existing multipart index on city, state is of no use in processing this query, thereby necessitating the filesort.
Fortunately, from version 4.1 onward, MySQL uses a much more sophisticated caching mechanism that reduces the amount of disk processing necessary when a filesort is mandated.
Using index�

You'll see this message for those times that MySQL can return your results by simply reading the index without having to read the table itself. For example, for the following two query plans shown in Figure 6.11, only the first is displayed as using index.
```
SELECT flight_id

FROM flights

WHERE flight_id = 8321;



SELECT flight_id, flight_date

FROM flights

WHERE flight_id = 8321;
```
Figure 6.11. Using index is displayed when all requested information is found within the index.

[View full size image]

The second query forces MySQL to read the row itself because the flight_date column is not indexed.
Using temporary�

You'll receive this message if your query requires MySQL to store intermediate results in a temporary storage table before returning the final answer to you. In many cases, you will see this combined with the filesort message.
Using index for group-by�

In certain cases, MySQL might be able to consult an index to facilitate a GROUP BY request, rather than retrieving data from the underlying table.

As you saw earlier in the discussion on the type portion of EXPLAIN's output, MySQL's optimizer features enhanced algorithms beginning with version 5.0. These new capabilities let MySQL exploit two or more indexes on a single table to speed queries.

The following list looks at how the extra portion of the EXPLAIN command reports on these new algorithms.

Using intersect�

MySQL is able to use the index merge intersect algorithm when you create a query that employs ranges on a primary key and also includes a search on a value that is covered by a secondary index in the same table and related to the first part of the search with an AND.
If these conditions are met, MySQL works in parallel to process the table using more than one index, and then merges the products back into a complete resultset.
Look at some sample queries to see the difference between MySQL version 5.0 and earlier. Before beginning, assume that you add the following index to the customer_master table described earlier in the chapter:
```
CREATE INDEX cm_ix6 ON customer_master(home_airport_code);
```
Now, you can evaluate a query on the customer_master table that attempts to find a range of rows on the primary key, and also provides a value for the indexed home_airport_code column. In pre-5.0 MySQL, note the optimizer's report on this query, as shown in Figure 6.12.
Figure 6.12. Pre-5.0 query plan: Only one index per table can be processed.

[View full size image]

The optimizer correctly detected the relevant indexes, and picked one to retrieve data. In version 5.0 and beyond, the optimizer's report is different, as shown in Figure 6.13.
Figure 6.13. Post-5.0 query plan: Index merge algorithm can take advantage of multiple indexes per table.

[View full size image]

The query and database remain the same, yet the query plan has changed. Why? MySQL is now able to use the index_merge algorithm to speed through the table by using the primary key index as well as the index on home_airport_code. At the end, it merges the two resultsets to give you a final answer.
Note that if the range is very large, even the 5.0 optimizer might elect to avoid using this algorithm because the index on the range column(s) isn't selective enough. Instead, MySQL simply drops back to a more traditional ref search, as shown in Figure 6.14.
Figure 6.14. A too-large range may force a ref search instead of using the index merge algorithm.

[View full size image]
Using union�

Just as with the using intersect result, using union is the outcome when you specify a range search on a primary key and include a search on a column that is also indexed. The chief difference between these two results is that using intersect combines the two search criteria with an AND, whereas using union creates a union between the two resultsets by employing OR. Figure 6.15 shows what you receive from EXPLAIN prior to version 5.0.
Figure 6.15. Expensive table scan forced prior to version 5.0.

[View full size image]

This is a very expensive table scan; the OR sees to that. Observe the results for 5.0, however, in Figure 6.16.
Figure 6.16. Much more efficient multiindex query plan in version 5.0 and beyond.

[View full size image]

This should run much faster: MySQL is able to use two indexes rather than slogging through the entire data set with a table scan.
Using sort_union�

As you would expect from the name, the chief difference between a report of this option and using union is that using sort union indicates that MySQL first has to sort the individual resultsets (using their rowids) and then merge them together before giving you an answer.
Suppose that you place an index on the customer_master's date_joined_program column:
```
CREATE INDEX cm_ix5 ON customer_master(date_joined_program);
```
Now, suppose you create two range queries on the same table and combine them with an OR. Figure 6.17 shows the report prior to version 5.0.
Figure 6.17. Expensive table scan forced prior to version 5.0.

[View full size image]

Again, this is a very costly table scan as MySQL works through the entire table to locate your results. It's a different story with version 5.0, as shown in Figure 6.18.
Figure 6.18. Much more efficient multiindex query plan in version 5.0 and beyond.

[View full size image]

If the optimizer determines that neither index is selective enough, however, even version 5.0 falls back to a slow table scan, unless you specify FORCE INDEX with your query. This option is investigated later in the chapter.

As you experiment with your queries, see if you can construct your indexes and searches to take advantage of these new 5.0 features.

Helping the Optimizer

As you've seen, MySQL's optimizer conducts a sophisticated analysis of the facts before coming up with a query plan. However, it is not infallible, and there might be times when it could use your help. This section investigates several ways to provide assistance to the optimizer.

Remember that it's always a good idea to take a scientific approach to your query tuning, and track the impact your suggestions have on query plans and actual response. Also, remember that this section only looks at ways to directly affect the optimizer's behavior (including query plan choice). In particular, this section examines SELECT STRAIGHT_JOIN and USE INDEX, IGNORE INDEX, and FORCE INDEX.

You'll look at ways to tune your SQL statements and engine behavior in later chapters. These optimizer-affecting SQL statement options include the following:

SELECT HIGH_PRIORITY
SELECT SQL_BIG_RESULT
SELECT SQL_SMALL_RESULT
SELECT SQL_BUFFER_RESULT
SELECT SQL_CACHE
SELECT SQL_NO_CACHE
LIMIT
SQL_CALC_FOUND_ROWS

`SELECT STRAIGHT_JOIN`

For a multitable join, you might elect to change the order in which the optimizer joins your tables. By specifying STRAIGHT_JOIN in your SQL, MySQL is forced to join the tables in the order in which they're specified in your query. Note that this is an extension by MySQL to standard SQL.

You can look at a few examples to see when this might help (or even hurt!) performance. For simplicity, let's not consider the impact of query caching, memory, or other engine configuration.

First, suppose you want to perform a very simple join between the customer_master and customer_address tables:



SELECT cm.last_name, cm.first_name, ca.postal_code, ca.country

FROM customer_master cm, customer_address ca

WHERE cm.customer_id = ca.customer_id;

For this join, MySQL first performs a table scan through the customer_address table. It reads each row, and then performs an eq_ref join to the customer_master table, using its primary key customer_id field.

Assuming that the customer_master table has four million rows, the total number of rows read for this query is more than eight million: four million rows read from the customer_address table via a table scan, and one customer_master row read via the primary key per returned row.

Suppose, however, that the customer_master table is only one fourth as large as customer_address, and only contains one million rows. How could this be? Perhaps each customer has multiple addresses (home, office, billing, warehouse, and so on). In any case, what would happen if you forced the optimizer to first look at the customer_master table and then join to customer_address?



SELECT STRAIGHT_JOIN cm.last_name, cm.first_name, ca.postal_code, ca.country

FROM customer_master cm, customer_address ca

WHERE cm.customer_id = ca.customer_id;

In this case, MySQL churns through the one million row customer_table, picking up rows one-by-one and then performing a rapid lookup into the customer_address table, using the indexed customer_id field. The number of rows read for this query is approximately five million: one million table-scanned rows from customer_master and an average of four customer_address rows to match the indexed join.

For this example, it might be worth it to force MySQL first to read through the smaller table and then do an indexed lookup. Unfortunately, it's easy to imagine a scenario in which you damage performance by incorrectly forcing a straight join, so be careful when experimenting.

Let's look at an experiment gone wrong: a badly designed optimizer override. Suppose that you want to retrieve a list of all customers, along with the name of their home airport:



SELECT cm.last_name, cm.first_name, ai.airport_name

FROM airports ai, customer_master cm

WHERE cm.home_airport_code = ai.airport_code;

Note that the airport_code field is uniquely indexed in the airports table (but not in the customer_master table), and this table has 1,000 rows. For this query, MySQL correctly performs a table scan on the customer_master table, reading its four million rows one-by-one and then performing a single read into the airports table for each customer_master record.

What happens if you change the query to force MySQL to first read the airports table?



SELECT STRAIGHT_JOIN cm.last_name, cm.first_name, ai.airport_name

FROM airports ai, customer_master2 cm

WHERE cm.home_airport_code = ai.airport_code;

MySQL dutifully reads a row from the airports table, and then runs a table scan to read every one of the four million row customer_master table. After the table scan finishes, MySQL reads the next row from airports, and starts the table scan again to look for a new match.

This is hundreds of times slower than if you had just left it alone. Of course, this query would run much more efficiently if the airport_code field was indexed in customer_master, but this isn't the case in the example.

So far, you've looked at simple, two-table queries. Imagine the amount of time needed to study the potential impact (good or bad) of forcing straight joins on a five, six, or more table join. How would you know which table should go first, second, and so on? The number of permutations that you would need to test could become enormous. This is why, in most cases, it's a good idea to just let the optimizer do its job, unless you really do know better.

Forcing Index Selection

It's quite likely that many of your queries will access tables with multiple indexes. Generally, MySQL's optimizer uses its internal statistics to identify and choose the most efficient index. However, there might be situations in which you need to override this behavior. This section reviews several options that you can use in your SELECT statements to control how the optimizer picks an index. Recall that we're only looking at how to affect the optimizer's choice of index; Chapter 7 delves much more deeply into best practices for indexing.

To keep this as simple as possible, this example only looks at a single-table query; MySQL uses this index selection to find rows in a particular table, and only then processes any subsequent joins. To save you the effort of flipping pages, the customer_master table is redisplayed here, along with some additional indexes:



CREATE TABLE customer_master 

(

    customer_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,

    ff_number CHAR(10),

    last_name VARCHAR(50) NOT NULL,

    first_name VARCHAR(50) NOT NULL,

    home_phone VARCHAR(20),

    mobile_phone VARCHAR(20),

    fax VARCHAR(20),

    email VARCHAR(40),

    home_airport_code CHAR(3),

    date_of_birth DATE,

    sex ENUM ('M','F'),

    date_joined_program DATE,

    date_last_flew DATETIME

) ENGINE = INNODB;

CREATE INDEX cm_ix1 ON customer_master(home_phone);

CREATE INDEX cm_ix2 ON customer_master(ff_number);

CREATE INDEX cm_ix3 ON customer_master(last_name, first_name);

CREATE INDEX cm_ix4 ON customer_master(sex);

CREATE INDEX cm_ix5 ON customer_master(date_joined_program);

CREATE INDEX cm_ix6 ON customer_master(home_airport_code);

`USE INDEX`/`FORCE INDEX`

It's quite possible that MySQL will need to choose from a candidate pool of more than one index to satisfy your query. If, after reading the EXPLAIN output, you determine that MySQL is picking the wrong index, you can override the index selection by adding either USE INDEX or FORCE INDEX to the query. Take a look at some examples.

First, suppose that you want to issue a search to find all customers who have a home phone number in the city of Boston, and whose home airport is Boston Logan, as shown in Figure 6.19.

Figure 6.19. Optimizer correctly chooses between two candidate indexes to process a query.

[View full size image]

What does this tell us? MySQL correctly identified two candidate indexes to speed the search: cm_ix1 (on home_phone) and cm_ix6 (on home_airport_code). In the end, it chose the index on home_phone (because it is much more selective than the home_airport_code field).

However, what if you wanted to force MySQL to use the index on home_airport_code? You would specify the USE INDEX option in your SQL statement, as shown in Figure 6.20.

Figure 6.20. User intervention to force selection of a particular index to process a query.

[View full size image]

Observe that MySQL no longer even reports the availability of the cm_ix1 key; it only specifies the key you requested. What happens if you mistype the key (see Figure 6.21)?

Figure 6.21. A typo produces an expensive table scan.

[View full size image]

This is a costly mistake: MySQL does not warn us that this index is irrelevant for this query, and instead performs a table scan because of our typing error. The moral here is to pay attention when you override the optimizer!

Where does FORCE INDEX fit in? FORCE INDEX (first enabled in version 4.0.9) is very similar to USE INDEX; the main difference between the two options is that FORCE INDEX demands that MySQL use the index (if possible) in lieu of a more expensive table scan, whereas USE INDEX still allows the optimizer to choose a table scan.

`IGNORE INDEX`

Sometimes, tables are overindexed, and might confuse the optimizer into choosing an inefficient query plan. Fortunately, MySQL lets you "hide" certain indexes from the optimizer.

Suppose you put the query shown in Figure 6.22 to MySQL.

Figure 6.22. Optimizer chooses a very nonselective index.

[View full size image]

MySQL has correctly identified two candidates indexes�the primary key (customer_id) and cm_ix4 (sex)�and has chosen the latter. Unfortunately, this is not a very selective index, and probably shouldn't even exist. Chapter 7 discusses indexing in much more detail, but for now, you can use the IGNORE KEY option to tell the optimizer to avoid this key, as shown in Figure 6.23.

Figure 6.23. Removing an index from consideration by the optimizer.

[View full size image]

It's important to understand that overriding the optimizer will be a rare event as long as you keep your data and index statistics up to date (via ANALYZE TABLE and/or OPTIMIZE TABLE).

Optimizer Processing Options

Recall that it's the job of the optimizer to select the best plan among all possible alternatives. For relatively simple queries, the number of different choices might not be too large. But what happens if you construct very complex queries, involving more than a dozen large tables and very intricate search criteria? In these kinds of situations, it's possible that MySQL might spend more time picking a query plan from the millions of potential permutations than actually executing the query!

Fortunately, versions 5.01 and beyond offer two system variables that give the database developer some control over how much time the optimizer spends calculating its query plan.

`optimizer_prune_level`

The optimizer_prune_level variable, set to 1 (enabled) by default, tells the optimizer to avoid certain query plans if it determines that the number of rows processed per table will be excessive.

If you are confident that this avoidance is hurting performance and that MySQL should examine more query plans, set the variable to zero (disabled). Of course, this kind of experiment is a much better candidate for your performance testing platform, rather than your production system.

`optimizer_search_depth`

As you saw earlier, queries that touch numerous tables can cause the optimizer to spend more time calculating the query plan than actually running the query. Setting the optimizer_search_depth system variable lets you control how exhaustive these computations will be.

If you make this number large (close to the actual number of tables in the search), MySQL checks many more query plans, which could drastically degrade performance. On the other hand, a small number causes the optimizer to come up with a plan much faster, at the risk of skipping a better plan. Finally, setting this value to zero lets the optimizer make its own decision, and is probably the safest course of action in most cases.

< Day Day Up >

Programmer's Life

Wednesday, October 28, 2009

Optimizer Diagnostics and Tuning

Optimizer Diagnostics and Tuning

The ANALYZE TABLE Command

The OPTIMIZE TABLE Command

The EXPLAIN Command

Figure 6.1. Viewing EXPLAIN results for a table.

Figure 6.2. Sample query and its EXPLAIN output.

Table 6.1. EXPLAIN Result Columns

id

select_type

Figure 6.3. EXPLAIN output showing a DERIVED select_type.

table

type

possible_keys and key

Figure 6.4. EXPLAIN output for a query that uses the primary key.

Figure 6.5. Query plan radically changed by substitution of OR for AND.

Figure 6.6. Version 5.0 now uses the index_merge algorithm to avoid a table scan by taking advantage of multiple indexes.

key_len

Figure 6.7. Different query plans based on the type of index key chosen.

Figure 6.8. Query that uses a new multifield index.

ref

Figure 6.9. Query plan showing candidate columns for searching.

Figure 6.10. Two similar queries with different types of ref results.

rows

Extra

Figure 6.11. Using index is displayed when all requested information is found within the index.

Figure 6.12. Pre-5.0 query plan: Only one index per table can be processed.

Figure 6.13. Post-5.0 query plan: Index merge algorithm can take advantage of multiple indexes per table.

Figure 6.14. A too-large range may force a ref search instead of using the index merge algorithm.

Figure 6.15. Expensive table scan forced prior to version 5.0.

Figure 6.16. Much more efficient multiindex query plan in version 5.0 and beyond.

Figure 6.17. Expensive table scan forced prior to version 5.0.

Figure 6.18. Much more efficient multiindex query plan in version 5.0 and beyond.

Helping the Optimizer

SELECT STRAIGHT_JOIN

Forcing Index Selection

USE INDEX/FORCE INDEX

Figure 6.19. Optimizer correctly chooses between two candidate indexes to process a query.

Figure 6.20. User intervention to force selection of a particular index to process a query.

Figure 6.21. A typo produces an expensive table scan.

IGNORE INDEX

Figure 6.22. Optimizer chooses a very nonselective index.

Figure 6.23. Removing an index from consideration by the optimizer.

Optimizer Processing Options

optimizer_prune_level

optimizer_search_depth

No comments:

Blog Archive

About Me

Link

The `ANALYZE TABLE` Command

The `OPTIMIZE TABLE` Command

The `EXPLAIN` Command

Figure 6.1. Viewing `EXPLAIN` results for a table.

Figure 6.2. Sample query and its `EXPLAIN` output.

Table 6.1. `EXPLAIN` Result Columns

`id`

`select_type`

Figure 6.3. `EXPLAIN` output showing a `DERIVED select_type`.

`table`

`type`

`possible_keys` and `key`

Figure 6.4. `EXPLAIN` output for a query that uses the primary key.

Figure 6.5. Query plan radically changed by substitution of `OR` for `AND`.

Figure 6.6. Version 5.0 now uses the `index_merge` algorithm to avoid a table scan by taking advantage of multiple indexes.

`key_len`

`ref`

Figure 6.10. Two similar queries with different types of `ref` results.

`rows`

`Extra`

Figure 6.11. `Using index` is displayed when all requested information is found within the index.

Figure 6.14. A too-large range may force a `ref` search instead of using the `index merge` algorithm.

`SELECT STRAIGHT_JOIN`

`USE INDEX`/`FORCE INDEX`

`IGNORE INDEX`

`optimizer_prune_level`

`optimizer_search_depth`