15.1 Know When to Use Specific Constructs
Depending on the circumstances, certain SQL constructs are
preferable to others. For example, use of the EXISTS predicate is
often preferable to DISTINCT. The next sections discuss the usage of
such constructs.
15.1.1 EXISTS Is Preferable to DISTINCT
The
DISTINCT keyword
used in a SELECT clause eliminates duplicate rows in the result set.
To eliminate those duplicates, Oracle performs a sort, and that sort
requires time and disk space. Therefore, avoid using DISTINCT if you
can tolerate having duplicate rows returned by a query. If you
can't tolerate the duplicate rows, or your
application can't handle them, use EXISTS in place
of DISTINCT.
For example, assume you are trying to find the names of customers who
have orders. Your query has to be based on two tables:
customer and cust_order. Using
DISTINCT, your query would be written as follows:
SELECT DISTINCT c.cust_nbr, c.name
FROM customer c JOIN cust_order o
ON c.cust_nbr = o.cust_nbr;
The corresponding execution plan for this query is as follows. Note
the SORT operation, which is a result of DISTINCT being used.
Query Plan
-----------------------------------------
SELECT STATEMENT Cost = 3056
SORT UNIQUE
MERGE JOIN
INDEX FULL SCAN IND_ORD_CUST_NBR
SORT JOIN
TABLE ACCESS FULL CUSTOMER
To use EXISTS, the query needs to be rewritten as follows:
SELECT c.cust_nbr, c.name
FROM customer c
WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);
Here is the execution plan for the EXISTS version of the queries:
Query Plan
---------------------------------------
SELECT STATEMENT Cost = 320
FILTER
TABLE ACCESS FULL CUSTOMER
INDEX RANGE SCAN IND_ORD_CUST_NBR
Notice that the second query eliminates the overhead of the sort
operation, and therefore runs faster.
15.1.2 WHERE Versus HAVING
We discussed the GROUP BY and HAVING clauses in
Chapter 4. Sometimes, when writing a GROUP BY
query, you have a condition that you can specify in either the WHERE
or HAVING clause. In situations where you have a choice,
you'll always get better performance if you specify
the condition in the WHERE clause. The reason is that
it's less expensive to eliminate rows before they
are summarized than it is to eliminate results after summarization.
Let's look at an example illustrating the advantage
of WHERE over HAVING. Here's a query with the HAVING
clause that reports the number of orders in the year 2000:
SELECT year, COUNT(*)
FROM orders
GROUP BY year
HAVING year = 2001;
YEAR COUNT(*)
---------- ----------
2001 1440
The execution plan for this query is as follows:
Query Plan
-------------------------------------------
SELECT STATEMENT Cost = 6
FILTER
SORT GROUP BY
INDEX FAST FULL SCAN ORDERS_PK
Now, look at that same query, but with the year restriction in the
WHERE clause:
SELECT year, COUNT(*)
FROM orders
WHERE year = 2001
GROUP BY year;
YEAR COUNT(*)
--------- ----------
2001 1440
The execution plan for this version of the query is:
Query Plan
-------------------------------------
SELECT STATEMENT Cost = 2
SORT GROUP BY NOSORT
INDEX FAST FULL SCAN ORDERS_PK
With the HAVING clause, the query performs the group operation first,
and then filters the groups for the condition specified. The WHERE
clause
version of the query filters the rows before
performing the group operation. The result of filtering with the
WHERE clause is that there are fewer rows to summarize, and
consequently, the query performs better.
However, you should note that not all types of filtering can be
achieved using the WHERE clause. Sometimes, you may need to summarize
the data first and then filter the summarized data based on the
summarized values. In such situations, you have to filter using the
HAVING clause, because only the HAVING clause can
"see" summarized values. Moreover,
there are situations when you may need to use the WHERE clause and
the HAVING clause together in a query to filter the results the way
you want. For details, see Chapter 4.
15.1.3 UNION Versus UNION ALL
We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two
SELECT statements. UNION combines the results of two SELECT
statements, and then returns only distinct rows from the combination;
duplicates are eliminated. It is, therefore, obvious that to remove
the duplicates, UNION performs one extra step than UNION ALL. This
extra step is a sort, which is costly in terms of performance.
Therefore, whenever your application can handle duplicates or you are
certain that no duplicates will result, consider using UNION ALL
instead of UNION.
Let's look an example to understand this issue
better. The following query uses UNION to return a list of orders
where the sale price exceeds $50.00 or where the customer is located
in region 5:
SELECT order_nbr, cust_nbr
FROM cust_order
WHERE sale_price > 50
UNION
SELECT order_nbr, cust_nbr
FROM cust_order
WHERE cust_nbr IN
(SELECT cust_nbr FROM customer WHERE region_id = 5);
ORDER_NBR CUST_NBR
---------- ----------
1000 1
1001 1
1002 5
1003 4
1004 4
1005 8
1006 1
1007 5
1008 5
1009 1
1011 1
1012 1
1015 5
1017 4
1019 4
1021 8
1023 1
1025 5
1027 5
1029 1
20 rows selected.
The execution plan for this UNION query is:
Query Plan
---------------------------------------------------------------------------
SELECT STATEMENT Cost = 8
SORT UNIQUE
UNION-ALL
TABLE ACCESS FULL CUST_ORDER
HASH JOIN
TABLE ACCESS FULL CUSTOMER
TABLE ACCESS FULL CUST_ORDER
The following query uses UNION ALL instead of UNION to get the same
information:
SELECT order_nbr, cust_nbr
FROM cust_order
WHERE sale_price > 50
UNION ALL
SELECT order_nbr, cust_nbr
FROM cust_order
WHERE cust_nbr IN
(SELECT cust_nbr FROM customer WHERE region_id = 5);
ORDER_NBR CUST_NBR
---------- ----------
1001 1
1003 4
1005 8
1009 1
1012 1
1017 4
1021 8
1029 1
1001 1
1000 1
1002 5
1003 4
1004 4
1006 1
1007 5
1008 5
1009 1
1012 1
1011 1
1015 5
1017 4
1019 4
1023 1
1025 5
1027 5
1029 1
26 rows selected.
Note the duplicate rows in the output. However, note also that UNION
ALL performs better than UNION, as you can see from the following
execution plan:
Query Plan
---------------------------------------------------------------------------
SELECT STATEMENT Cost = 4
UNION-ALL
TABLE ACCESS FULL CUST_ORDER
HASH JOIN
TABLE ACCESS FULL CUSTOMER
TABLE ACCESS FULL CUST_ORDER
You can see that the extra operation (SORT UNIQUE) in the UNION makes
it run slower than UNION ALL.
15.1.4 LEFT Versus RIGHT OUTER JOIN
As you have seen in Chapter 3, outer joins can
be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways
of looking at the same operation. Mixing LEFT and
RIGHT
outer joins in the same application can cause confusion, as you and
other programmers must constantly shift your point-of-view from one
approach to the other. Use both LEFT and RIGHT outer joins in the
same query, and you'll find
your confusion greatly magnified. For example:
SELECT e.lname, j.function, d.name
FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id
RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id;
LNAME FUNCTION NAME
-------------------- ------------------------------ ------------
MILLER CLERK ACCOUNTING
CLARK MANAGER ACCOUNTING
KING PRESIDENT ACCOUNTING
SMITH CLERK RESEARCH
FORD ANALYST RESEARCH
JONES MANAGER RESEARCH
SCOTT ANALYST RESEARCH
JAMES CLERK SALES
BLAKE MANAGER SALES
MARTIN SALESPERSON SALES
TURNER SALESPERSON SALES
ALLEN SALESPERSON SALES
OPERATIONS
Such confusion is unnecessary. Since both LEFT and RIGHT outer joins
represent the same operation, but from differing points of view, you
can simply pick one point of view and use it consistently. For
example, many programmers write all outer joins as either FULL or
LEFT, ignoring RIGHT.
The preceding query uses a LEFT and then a RIGHT outer join to do the
following:
Connect an outer join from employee to
job, with employee as the
required table Connect another outer join from department to the
results from Step 1, with department as the
required table
Using parentheses to explicitly state the above order of operations,
you can rewrite the query using all LEFT outer joins, as follows:
SELECT e.lname, j.function, d.name
FROM department d LEFT OUTER JOIN
(job j LEFT OUTER JOIN employee e
ON e.job_id = j.job_id)
ON e.dept_id = d.dept_id;
LNAME FUNCTION NAME
-------------------- ------------------------------ -------------
MILLER CLERK ACCOUNTING
CLARK MANAGER ACCOUNTING
KING PRESIDENT ACCOUNTING
SMITH CLERK RESEARCH
FORD ANALYST RESEARCH
JONES MANAGER RESEARCH
SCOTT ANALYST RESEARCH
JAMES CLERK SALES
BLAKE MANAGER SALES
MARTIN SALESPERSON SALES
TURNER SALESPERSON SALES
ALLEN SALESPERSON SALES
OPERATIONS
The tradeoff here is between using parentheses and mixing RIGHT and
LEFT outer joins. This second version of the query still joins
employee to job, and then joins
department to that result. The operations are
exactly the same as in the previous version. This time, the
parentheses make the order of operations clearer, and we personally
find the
second version of the query a bit
easier to understand.
|
No comments:
Post a Comment