Wednesday, January 20, 2010

15.1 Know When to Use Specific Constructs











 < Day Day Up > 







15.1 Know When to Use Specific Constructs





Depending on the circumstances, certain SQL constructs are

preferable to others. For example, use of the EXISTS predicate is

often preferable to DISTINCT. The next sections discuss the usage of

such constructs.







15.1.1 EXISTS Is Preferable to DISTINCT





The



DISTINCT keyword

used in a SELECT clause eliminates duplicate rows in the result set.

To eliminate those duplicates, Oracle performs a sort, and that sort

requires time and disk space. Therefore, avoid using DISTINCT if you

can tolerate having duplicate rows returned by a query. If you

can't tolerate the duplicate rows, or your

application can't handle them, use EXISTS in place

of DISTINCT.





For example, assume you are trying to find the names of customers who

have orders. Your query has to be based on two tables:

customer and cust_order. Using

DISTINCT, your query would be written as follows:





SELECT DISTINCT c.cust_nbr, c.name

FROM customer c JOIN cust_order o

ON c.cust_nbr = o.cust_nbr;







The corresponding execution plan for this query is as follows. Note

the SORT operation, which is a result of DISTINCT being used.





Query Plan

-----------------------------------------

SELECT STATEMENT Cost = 3056

SORT UNIQUE

MERGE JOIN

INDEX FULL SCAN IND_ORD_CUST_NBR

SORT JOIN

TABLE ACCESS FULL CUSTOMER







To use EXISTS, the query needs to be rewritten as follows:





SELECT c.cust_nbr, c.name

FROM customer c

WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);







Here is the execution plan for the EXISTS version of the queries:





Query Plan

---------------------------------------

SELECT STATEMENT Cost = 320

FILTER

TABLE ACCESS FULL CUSTOMER

INDEX RANGE SCAN IND_ORD_CUST_NBR







Notice that the second query eliminates the overhead of the sort

operation, and therefore runs faster.









15.1.2 WHERE Versus HAVING





We discussed the GROUP BY and HAVING clauses in

Chapter 4. Sometimes, when writing a GROUP BY

query, you have a condition that you can specify in either the WHERE

or HAVING clause. In situations where you have a choice,

you'll always get better performance if you specify

the condition in the WHERE clause. The reason is that

it's less expensive to eliminate rows before they

are summarized than it is to eliminate results after summarization.





Let's look at an example illustrating the advantage

of WHERE over HAVING. Here's a query with the HAVING

clause that reports the number of orders in the year 2000:





SELECT year, COUNT(*)

FROM orders

GROUP BY year

HAVING year = 2001;



YEAR COUNT(*)

---------- ----------

2001 1440







The execution plan for this query is as follows:





Query Plan

-------------------------------------------

SELECT STATEMENT Cost = 6

FILTER

SORT GROUP BY

INDEX FAST FULL SCAN ORDERS_PK







Now, look at that same query, but with the year restriction in the

WHERE clause:





SELECT year, COUNT(*)

FROM orders

WHERE year = 2001

GROUP BY year;



YEAR COUNT(*)

--------- ----------

2001 1440







The execution plan for this version of the query is:





Query Plan

-------------------------------------

SELECT STATEMENT Cost = 2

SORT GROUP BY NOSORT

INDEX FAST FULL SCAN ORDERS_PK







With the HAVING clause, the query performs the group operation first,

and then filters the groups for the condition specified. The WHERE

clause

version of the query filters the rows before

performing the group operation. The result of filtering with the

WHERE clause is that there are fewer rows to summarize, and

consequently, the query performs better.





However, you should note that not all types of filtering can be

achieved using the WHERE clause. Sometimes, you may need to summarize

the data first and then filter the summarized data based on the

summarized values. In such situations, you have to filter using the

HAVING clause, because only the HAVING clause can

"see" summarized values. Moreover,

there are situations when you may need to use the WHERE clause and

the HAVING clause together in a query to filter the results the way

you want. For details, see Chapter 4.









15.1.3 UNION Versus UNION ALL





We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two

SELECT statements. UNION combines the results of two SELECT

statements, and then returns only distinct rows from the combination;

duplicates are eliminated. It is, therefore, obvious that to remove

the duplicates, UNION performs one extra step than UNION ALL. This

extra step is a sort, which is costly in terms of performance.

Therefore, whenever your application can handle duplicates or you are

certain that no duplicates will result, consider using UNION ALL

instead of UNION.





Let's look an example to understand this issue

better. The following query uses UNION to return a list of orders

where the sale price exceeds $50.00 or where the customer is located

in region 5:





SELECT order_nbr, cust_nbr 

FROM cust_order

WHERE sale_price > 50

UNION

SELECT order_nbr, cust_nbr

FROM cust_order

WHERE cust_nbr IN

(SELECT cust_nbr FROM customer WHERE region_id = 5);



ORDER_NBR CUST_NBR

---------- ----------

1000 1

1001 1

1002 5

1003 4

1004 4

1005 8

1006 1

1007 5

1008 5

1009 1

1011 1

1012 1

1015 5

1017 4

1019 4

1021 8

1023 1

1025 5

1027 5

1029 1



20 rows selected.







The execution plan for this UNION query is:





Query Plan

---------------------------------------------------------------------------

SELECT STATEMENT Cost = 8

SORT UNIQUE

UNION-ALL

TABLE ACCESS FULL CUST_ORDER

HASH JOIN

TABLE ACCESS FULL CUSTOMER

TABLE ACCESS FULL CUST_ORDER







The following query uses UNION ALL instead of UNION to get the same

information:





SELECT order_nbr, cust_nbr 

FROM cust_order

WHERE sale_price > 50

UNION ALL

SELECT order_nbr, cust_nbr

FROM cust_order

WHERE cust_nbr IN

(SELECT cust_nbr FROM customer WHERE region_id = 5);



ORDER_NBR CUST_NBR

---------- ----------

1001 1

1003 4

1005 8

1009 1

1012 1

1017 4

1021 8

1029 1

1001 1

1000 1

1002 5

1003 4

1004 4

1006 1

1007 5

1008 5

1009 1

1012 1

1011 1

1015 5

1017 4

1019 4

1023 1

1025 5

1027 5

1029 1



26 rows selected.







Note the duplicate rows in the output. However, note also that UNION

ALL performs better than UNION, as you can see from the following

execution plan:





Query Plan

---------------------------------------------------------------------------

SELECT STATEMENT Cost = 4

UNION-ALL

TABLE ACCESS FULL CUST_ORDER

HASH JOIN

TABLE ACCESS FULL CUSTOMER

TABLE ACCESS FULL CUST_ORDER







You can see that the extra operation (SORT UNIQUE) in the UNION makes

it run slower than UNION ALL.









15.1.4 LEFT Versus RIGHT OUTER JOIN





As you have seen in Chapter 3, outer joins can

be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways

of looking at the same operation. Mixing LEFT and









RIGHT

outer joins in the same application can cause confusion, as you and

other programmers must constantly shift your point-of-view from one

approach to the other. Use both LEFT and RIGHT outer joins in the

same query, and you'll find

your confusion greatly magnified. For example:





SELECT e.lname, j.function, d.name

FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id

RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id;



LNAME FUNCTION NAME

-------------------- ------------------------------ ------------

MILLER CLERK ACCOUNTING

CLARK MANAGER ACCOUNTING

KING PRESIDENT ACCOUNTING

SMITH CLERK RESEARCH

FORD ANALYST RESEARCH

JONES MANAGER RESEARCH

SCOTT ANALYST RESEARCH

JAMES CLERK SALES

BLAKE MANAGER SALES

MARTIN SALESPERSON SALES

TURNER SALESPERSON SALES

ALLEN SALESPERSON SALES

OPERATIONS







Such confusion is unnecessary. Since both LEFT and RIGHT outer joins

represent the same operation, but from differing points of view, you

can simply pick one point of view and use it consistently. For

example, many programmers write all outer joins as either FULL or

LEFT, ignoring RIGHT.





The preceding query uses a LEFT and then a RIGHT outer join to do the

following:





  1. Connect an outer join from employee to

    job, with employee as the

    required table

  2. Connect another outer join from department to the

    results from Step 1, with department as the

    required table



Using parentheses to explicitly state the above order of operations,

you can rewrite the query using all LEFT outer joins, as follows:





SELECT e.lname, j.function, d.name

FROM department d LEFT OUTER JOIN

(job j LEFT OUTER JOIN employee e

ON e.job_id = j.job_id)

ON e.dept_id = d.dept_id;



LNAME FUNCTION NAME

-------------------- ------------------------------ -------------

MILLER CLERK ACCOUNTING

CLARK MANAGER ACCOUNTING

KING PRESIDENT ACCOUNTING

SMITH CLERK RESEARCH

FORD ANALYST RESEARCH

JONES MANAGER RESEARCH

SCOTT ANALYST RESEARCH

JAMES CLERK SALES

BLAKE MANAGER SALES

MARTIN SALESPERSON SALES

TURNER SALESPERSON SALES

ALLEN SALESPERSON SALES

OPERATIONS







The tradeoff here is between using parentheses and mixing RIGHT and

LEFT outer joins. This second version of the query still joins

employee to job, and then joins

department to that result. The operations are

exactly the same as in the previous version. This time, the

parentheses make the order of operations clearer, and we personally

find the

second version of the query a bit

easier to understand.



















     < Day Day Up > 



    No comments: