Programmer's Life: 15.1 Know When to Use Specific Constructs

< Day Day Up >

15.1 Know When to Use Specific Constructs

Depending on the circumstances, certain SQL constructs are

preferable to others. For example, use of the EXISTS predicate is

often preferable to DISTINCT. The next sections discuss the usage of

such constructs.

15.1.1 EXISTS Is Preferable to DISTINCT

The

DISTINCT keyword

used in a SELECT clause eliminates duplicate rows in the result set.

To eliminate those duplicates, Oracle performs a sort, and that sort

requires time and disk space. Therefore, avoid using DISTINCT if you

can tolerate having duplicate rows returned by a query. If you

can't tolerate the duplicate rows, or your

application can't handle them, use EXISTS in place

of DISTINCT.

For example, assume you are trying to find the names of customers who

have orders. Your query has to be based on two tables:

customer and cust_order. Using

DISTINCT, your query would be written as follows:

SELECT DISTINCT c.cust_nbr, c.name

FROM customer c JOIN cust_order o

ON c.cust_nbr = o.cust_nbr;

The corresponding execution plan for this query is as follows. Note

the SORT operation, which is a result of DISTINCT being used.

Query Plan

-----------------------------------------

SELECT STATEMENT   Cost = 3056

  SORT UNIQUE

    MERGE JOIN

      INDEX FULL SCAN IND_ORD_CUST_NBR

      SORT JOIN

        TABLE ACCESS FULL CUSTOMER

To use EXISTS, the query needs to be rewritten as follows:

SELECT c.cust_nbr, c.name

FROM customer c

WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);

Here is the execution plan for the EXISTS version of the queries:

Query Plan

---------------------------------------

SELECT STATEMENT   Cost = 320

  FILTER

    TABLE ACCESS FULL CUSTOMER

    INDEX RANGE SCAN IND_ORD_CUST_NBR

Notice that the second query eliminates the overhead of the sort

operation, and therefore runs faster.

15.1.2 WHERE Versus HAVING

We discussed the GROUP BY and HAVING clauses in

Chapter 4. Sometimes, when writing a GROUP BY

query, you have a condition that you can specify in either the WHERE

or HAVING clause. In situations where you have a choice,

you'll always get better performance if you specify

the condition in the WHERE clause. The reason is that

it's less expensive to eliminate rows before they

are summarized than it is to eliminate results after summarization.

Let's look at an example illustrating the advantage

of WHERE over HAVING. Here's a query with the HAVING

clause that reports the number of orders in the year 2000:

SELECT year, COUNT(*)

FROM orders

GROUP BY year

HAVING year = 2001;



      YEAR   COUNT(*)

---------- ----------

      2001       1440

The execution plan for this query is as follows:

Query Plan

-------------------------------------------

SELECT STATEMENT   Cost = 6

  FILTER

    SORT GROUP BY

      INDEX FAST FULL SCAN ORDERS_PK

Now, look at that same query, but with the year restriction in the

WHERE clause:

SELECT year, COUNT(*)

FROM orders

WHERE year = 2001

GROUP BY year;



     YEAR   COUNT(*)

--------- ----------

     2001       1440

The execution plan for this version of the query is:

Query Plan

-------------------------------------

SELECT STATEMENT   Cost = 2

  SORT GROUP BY NOSORT

    INDEX FAST FULL SCAN ORDERS_PK

With the HAVING clause, the query performs the group operation first,

and then filters the groups for the condition specified. The WHERE

clause

version of the query filters the rows before

performing the group operation. The result of filtering with the

WHERE clause is that there are fewer rows to summarize, and

consequently, the query performs better.

However, you should note that not all types of filtering can be

achieved using the WHERE clause. Sometimes, you may need to summarize

the data first and then filter the summarized data based on the

summarized values. In such situations, you have to filter using the

HAVING clause, because only the HAVING clause can

"see" summarized values. Moreover,

there are situations when you may need to use the WHERE clause and

the HAVING clause together in a query to filter the results the way

you want. For details, see Chapter 4.

15.1.3 UNION Versus UNION ALL

We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two

SELECT statements. UNION combines the results of two SELECT

statements, and then returns only distinct rows from the combination;

duplicates are eliminated. It is, therefore, obvious that to remove

the duplicates, UNION performs one extra step than UNION ALL. This

extra step is a sort, which is costly in terms of performance.

Therefore, whenever your application can handle duplicates or you are

certain that no duplicates will result, consider using UNION ALL

instead of UNION.

Let's look an example to understand this issue

better. The following query uses UNION to return a list of orders

where the sale price exceeds $50.00 or where the customer is located

in region 5:

SELECT order_nbr, cust_nbr 

FROM cust_order 

WHERE sale_price > 50

UNION

SELECT order_nbr, cust_nbr 

FROM cust_order

WHERE cust_nbr IN 

(SELECT cust_nbr FROM customer WHERE region_id = 5);



 ORDER_NBR   CUST_NBR

---------- ----------

      1000          1

      1001          1

      1002          5

      1003          4

      1004          4

      1005          8

      1006          1

      1007          5

      1008          5

      1009          1

      1011          1

      1012          1

      1015          5

      1017          4

      1019          4

      1021          8

      1023          1

      1025          5

      1027          5

      1029          1



20 rows selected.

The execution plan for this UNION query is:

Query Plan

---------------------------------------------------------------------------

SELECT STATEMENT   Cost = 8

  SORT UNIQUE

    UNION-ALL

      TABLE ACCESS FULL CUST_ORDER

      HASH JOIN

        TABLE ACCESS FULL CUSTOMER

        TABLE ACCESS FULL CUST_ORDER

The following query uses UNION ALL instead of UNION to get the same

information:

SELECT order_nbr, cust_nbr 

FROM cust_order 

WHERE sale_price > 50

UNION ALL

SELECT order_nbr, cust_nbr 

FROM cust_order

WHERE cust_nbr IN 

(SELECT cust_nbr FROM customer WHERE region_id = 5);



 ORDER_NBR   CUST_NBR

---------- ----------

      1001          1

      1003          4

      1005          8

      1009          1

      1012          1

      1017          4

      1021          8

      1029          1

      1001          1

      1000          1

      1002          5

      1003          4

      1004          4

      1006          1

      1007          5

      1008          5

      1009          1

      1012          1

      1011          1

      1015          5

      1017          4

      1019          4

      1023          1

      1025          5

      1027          5

      1029          1



26 rows selected.

Note the duplicate rows in the output. However, note also that UNION

ALL performs better than UNION, as you can see from the following

execution plan:

Query Plan

---------------------------------------------------------------------------

SELECT STATEMENT   Cost = 4

  UNION-ALL

    TABLE ACCESS FULL CUST_ORDER

    HASH JOIN

      TABLE ACCESS FULL CUSTOMER

      TABLE ACCESS FULL CUST_ORDER

You can see that the extra operation (SORT UNIQUE) in the UNION makes

it run slower than UNION ALL.

15.1.4 LEFT Versus RIGHT OUTER JOIN

As you have seen in Chapter 3, outer joins can

be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways

of looking at the same operation. Mixing LEFT and

RIGHT

outer joins in the same application can cause confusion, as you and

other programmers must constantly shift your point-of-view from one

approach to the other. Use both LEFT and RIGHT outer joins in the

same query, and you'll find

your confusion greatly magnified. For example:

SELECT e.lname, j.function, d.name

FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id

           RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id;



LNAME                FUNCTION                       NAME

-------------------- ------------------------------ ------------

MILLER               CLERK                          ACCOUNTING

CLARK                MANAGER                        ACCOUNTING

KING                 PRESIDENT                      ACCOUNTING

SMITH                CLERK                          RESEARCH

FORD                 ANALYST                        RESEARCH

JONES                MANAGER                        RESEARCH

SCOTT                ANALYST                        RESEARCH

JAMES                CLERK                          SALES

BLAKE                MANAGER                        SALES

MARTIN               SALESPERSON                    SALES

TURNER               SALESPERSON                    SALES

ALLEN                SALESPERSON                    SALES

                                                    OPERATIONS

Such confusion is unnecessary. Since both LEFT and RIGHT outer joins

represent the same operation, but from differing points of view, you

can simply pick one point of view and use it consistently. For

example, many programmers write all outer joins as either FULL or

LEFT, ignoring RIGHT.

The preceding query uses a LEFT and then a RIGHT outer join to do the

following:

Connect an outer join from employee to

job, with employee as the

required table
Connect another outer join from department to the

results from Step 1, with department as the

required table

Using parentheses to explicitly state the above order of operations,

you can rewrite the query using all LEFT outer joins, as follows:

SELECT e.lname, j.function, d.name

FROM department d LEFT OUTER JOIN

        (job j LEFT OUTER JOIN employee e

         ON e.job_id = j.job_id)

     ON e.dept_id = d.dept_id;



LNAME                FUNCTION                       NAME

-------------------- ------------------------------ -------------

MILLER               CLERK                          ACCOUNTING

CLARK                MANAGER                        ACCOUNTING

KING                 PRESIDENT                      ACCOUNTING

SMITH                CLERK                          RESEARCH

FORD                 ANALYST                        RESEARCH

JONES                MANAGER                        RESEARCH

SCOTT                ANALYST                        RESEARCH

JAMES                CLERK                          SALES

BLAKE                MANAGER                        SALES

MARTIN               SALESPERSON                    SALES

TURNER               SALESPERSON                    SALES

ALLEN                SALESPERSON                    SALES

                                                    OPERATIONS

The tradeoff here is between using parentheses and mixing RIGHT and

LEFT outer joins. This second version of the query still joins

employee to job, and then joins

department to that result. The operations are

exactly the same as in the previous version. This time, the

parentheses make the order of operations clearer, and we personally

find the

second version of the query a bit

easier to understand.

< Day Day Up >

Programmer's Life

Wednesday, January 20, 2010

15.1 Know When to Use Specific Constructs

15.1 Know When to Use Specific Constructs

15.1.1 EXISTS Is Preferable to DISTINCT

15.1.2 WHERE Versus HAVING

15.1.3 UNION Versus UNION ALL

15.1.4 LEFT Versus RIGHT OUTER JOIN

No comments:

Blog Archive

About Me

Link