8.4 Complex Hierarchy Operations
Using Oracle's hierarchical SQL extensions, you can
perform complex, hierarchical queries much more easily than you would
be able to do using standard, ANSI SQL.
8.4.1 Finding the Number of Levels
Previously we showed how the LEVEL pseudocolumn
generates a level number for each record when we use the START WITH .
. . CONNECT BY clause. You can use the following query to determine
the number of levels in the hierarchy by finding the maximum level
number returned by the LEVEL pseudocolumn:
SELECT MAX(LEVEL)
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id;
MAX(LEVEL)
----------
4
To determine the number of employees at each level, group the results
by LEVEL and count the number of employees in each distinct group.
For example:
SELECT LEVEL, COUNT(emp_id)
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id
GROUP BY LEVEL;
LEVEL COUNT(EMP_ID)
--------- -------------
1 1
2 3
3 8
4 2
8.4.2 Listing Records in Hierarchical Order
One of the very common programming challenges SQL
programmers face is to list records in a hierarchy in their proper
hierarchical order. For example, you might wish to list employees
with their subordinates underneath them, as in the following query:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || lname "Employee",
emp_id, manager_emp_id
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id;
LEVEL Employee EMP_ID MANAGER_EMP_ID
--------- ------------ --------- --------------
1 KING 7839
2 JONES 7566 7839
3 SCOTT 7788 7566
4 ADAMS 7876 7788
3 FORD 7902 7566
4 SMITH 7369 7902
2 BLAKE 7698 7839
3 ALLEN 7499 7698
3 WARD 7521 7698
3 MARTIN 7654 7698
3 TURNER 7844 7698
3 JAMES 7900 7698
2 CLARK 7782 7839
3 MILLER 7934 7782
14 rows selected.
Notice that by using the expression LPAD('
',2*(LEVEL - 1)), we are able to align
employee names in a manner that corresponds to their level. As the
level number increases, the number of spaces returned by the
expression increases, and the employee name is further indented.
The previous query lists all the employees in the
employee table. If you want to filter out certain
employees based on some condition, then you can use
a WHERE clause in your hierarchical
query. Here is an example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || lname "Employee",
emp_id, manager_emp_id, salary
FROM employee
WHERE salary > 2000
START WITH manager_emp_id IS NULL
CONNECT BY manager_emp_id = PRIOR emp_id;
LEVEL Employee EMP_ID MANAGER_EMP_ID SALARY
--------- ------------ --------- -------------- ---------
1 KING 7839 5000
3 SCOTT 7788 7566 3000
3 FORD 7902 7566 3000
2 BLAKE 7698 7839 2850
2 CLARK 7782 7839 2450
This query lists records with salary > 2000.
The WHERE clause restricts the rows returned by the query without
affecting other rows in the hierarchy. In our example, the WHERE
condition filtered JONES out of the result, but the employees below
JONES in the hierarchy (SCOTT and FORD) are not filtered out, and are
still indented as they were when JONES was present. The WHERE clause
must come before the START WITH . . . CONNECT BY clause in a
hierarchical query; otherwise, you'll get a syntax
error.
|
Though the WHERE clause comes before the START WITH . . . CONNECT BY
construct, the filtering happens after the complete hierarchy tree is
built.
|
|
As discussed earlier, the START WITH clause is optional�i.e.,
you can have a CONNECT BY without a START WITH. When the START WITH
clause is missing, effectively the query doesn't
specify where to start building the hierarchy. In that situation,
each row of the table is considered a root, and a hierarchy is built
for each row. For example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || lname "Employee",
emp_id, manager_emp_id, salary
FROM employee
CONNECT BY manager_emp_id = PRIOR emp_id;
LEVEL Employee EMP_ID MANAGER_EMP_ID SALARY
----- -------------------- ---------- -------------- ----------
1 SCOTT 7788 7566 3000
2 ADAMS 7876 7788 1100
1 FORD 7902 7566 3000
2 SMITH 7369 7902 800
1 ALLEN 7499 7698 1600
1 WARD 7521 7698 1250
1 JAMES 7900 7698 950
1 TURNER 7844 7698 1500
1 MARTIN 7654 7698 1250
1 MILLER 7934 7782 1300
1 ADAMS 7876 7788 1100
1 JONES 7566 7839 2000
2 SCOTT 7788 7566 3000
3 ADAMS 7876 7788 1100
2 FORD 7902 7566 3000
3 SMITH 7369 7902 800
1 CLARK 7782 7839 2450
2 MILLER 7934 7782 1300
1 BLAKE 7698 7839 2850
2 ALLEN 7499 7698 1600
2 WARD 7521 7698 1250
2 JAMES 7900 7698 950
2 TURNER 7844 7698 1500
2 MARTIN 7654 7698 1250
1 SMITH 7369 7902 800
1 KING 7839 5000
2 JONES 7566 7839 2000
3 SCOTT 7788 7566 3000
4 ADAMS 7876 7788 1100
3 FORD 7902 7566 3000
4 SMITH 7369 7902 800
2 CLARK 7782 7839 2450
3 MILLER 7934 7782 1300
2 BLAKE 7698 7839 2850
3 ALLEN 7499 7698 1600
3 WARD 7521 7698 1250
3 JAMES 7900 7698 950
3 TURNER 7844 7698 1500
3 MARTIN 7654 7698 1250
39 rows selected.
This example returns the hierarchy tree for each row in the table. In
the organization tree under KING, SCOTT is at level 3; however, in
the organization tree under JONES, SCOTT is at level 2, and under the
organization tree headed by himself, SCOTT is at level 1.
8.4.3 Checking for Ascendancy
Another common
operation on hierarchical data is to
check for ascendancy. In an organization chart, you may ask whether
one employee has authority over another. For example:
"Does JONES have any authority over
BLAKE?" To find out, you need to search for BLAKE in
the subtree headed by JONES. If you find BLAKE in the subtree, then
you know that BLAKE either directly or indirectly reports to JONES.
If you don't find BLAKE in the subtree, then you
know that JONES doesn't have any authority over
BLAKE. The following query searches for BLAKE in the subtree headed
by JONES:
SELECT *
FROM employee
WHERE lname = 'BLAKE'
START WITH lname = 'JONES'
CONNECT BY manager_emp_id = PRIOR emp_id;
no rows selected
The START WITH . . . CONNECT BY clause in this example generates the
subtree headed by JONES, and the WHERE clause filters this subtree to
find BLAKE. As you can see, no rows are returned. This means that
BLAKE was not found in JONES's subtree, so you know
that JONES has no authority over BLAKE. Let's take a
look at another example that produces positive results. This time
we'll check to see whether JONES has any authority
over SMITH:
SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date
FROM employee
WHERE lname = 'SMITH'
START WITH lname = 'JONES'
CONNECT BY manager_emp_id = PRIOR emp_id;
EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE
---------- ---------- ---------- -------------- ---------- ---------
7369 SMITH 20 7902 800 17-DEC-80
This time, SMITH was found in the list of employees in
JONES's subtree, so you know that at some level
JONES has management authority over SMITH.
8.4.4 Deleting a Subtree
Let's assume
that
the organization we are dealing with splits, and JONES and all his
subordinates form a new company. Therefore, we don't
need to maintain JONES and his subordinates in our
employee table. Furthermore, we need to delete the
entire subtree headed by JONES, as shown in Figure 8-1, from our
table. We can do this by using a subquery as in the following
example:
DELETE FROM employee
WHERE emp_id IN
(SELECT emp_id FROM employee
START WITH lname = 'JONES'
CONNECT BY manager_emp_id = PRIOR emp_id);
5 rows deleted.
In this example, the subquery generates the subtree headed by JONES,
and returns the emp_ids of the employees in that
subtree, including JONES's. The outer query then
deletes the records with these emp_id values from
the employee table.
8.4.5 Listing Multiple Root Nodes
An interesting
variation on the problem of
listing the root node of a hierarchy is to find and list the root
nodes from several hierarchies that are all stored in the same table.
For example, you might consider department managers to represent root
nodes, and you might further wish to list all department managers
found in the employee table.
There are no constraints on the employees belonging to any
department. However, you can assume that if A reports to B and B
reports to C, and A and C belong to the same department, then B also
belongs to the same department. If an employee's
manager belongs to another department, then that employee is the
uppermost employee, or manager, of his department.
Therefore, to find the uppermost employee in each department, you
need to search the tree for those employees whose managers belong to
a different department than their own. You can do that using the
following query:
SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY manager_emp_id = PRIOR emp_id
AND dept_id != PRIOR dept_id;
EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE
------ -------- -------- -------------- ------ ---------
7839 KING 10 5000 17-NOV-81
7566 JONES 20 7839 2000 02-APR-81
7698 BLAKE 30 7839 2850 01-MAY-80
In this example, the extra condition (dept_id != PRIOR
dept_id) added to the CONNECT BY clause restricts the
output to only those employees whose managers belong to a different
department than their own.
8.4.6 Listing the Top Few Levels of a Hierarchy
Another common task in
dealing with hierarchical data is listing the top few levels of a
hierarchy tree. For example, you may want to list top management
employees in an organization. Let's assume that the
top two levels in our organization chart constitute top management.
You can then use the LEVEL pseudocolumn to identify those employees,
as in the following example:
SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date
FROM employee
WHERE LEVEL <= 2
START WITH manager_emp_id IS NULL
CONNECT BY manager_emp_id = PRIOR emp_id;
EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE
------ --------- ---------- -------------- ------ ---------
7839 KING 10 5000 17-NOV-81
7566 JONES 20 7839 2000 02-APR-81
7698 BLAKE 30 7839 2850 01-MAY-80
7782 CLARK 10 7839 2450 09-JUN-81
In this example, the LEVEL <= 2 condition in
the WHERE clause restricts the results to only those employees in the
top two levels of the organization chart.
8.4.7 Aggregating a Hierarchy
Another challenging
requirement is to aggregate a
hierarchy. For example, you may want to sum the salaries of all
employees reporting to a specific employee. Or, you may want to
consider each employee as a root, and for each employee print out the
sum of the salaries of all subordinate employees.
The first problem is relatively simple. Earlier we described how to
select a subtree headed by an employee. You can easily sum the
salaries of all employees in such a subtree. For example:
SELECT SUM(salary)
FROM employee
START WITH lname = 'JONES'
CONNECT BY manager_emp_id = PRIOR emp_id;
SUM(SALARY)
-----------
9900
The START WITH lname = 'JONES' clause generates
the subtree headed by JONES, and the SUM(salary)
expression sums the salary of employees in this subtree.
The second problem, a seemingly simple extension of the first, is
relatively complex. You want to consider each employee as a root, and
for each employee you want to sum the salaries of all employees in
its subtree. In essence, you want to repeat the previous query for
each employee in the table. The following SQL uses an inline view to
achieve this:
SELECT t2.lname, t2.salary,
(SELECT SUM(t1.salary) FROM employee t1
START WITH t1.lname = t2.lname
CONNECT BY t1.manager_emp_id = PRIOR t1.emp_id) sum_salary
FROM employee t2;
LNAME SALARY SUM_SALARY
-------------------- ---------- ----------
SMITH 800 800
ALLEN 1600 1600
WARD 1250 1250
JONES 2000 9900
MARTIN 1250 1250
BLAKE 2850 9400
CLARK 2450 3750
SCOTT 3000 4100
KING 5000 28050
TURNER 1500 1500
ADAMS 1100 1100
JAMES 950 950
FORD 3000 3800
MILLER 1300 1300
14 rows selected.
In this example, the START WITH . . . CONNECT BY
clause in the inline view generates a subtree for each employee. The
inline view executes once for every row in the outer
employee table. For each row in the outer
employee table, the inline view generates a
subtree headed by this employee, and returns the sum of salaries for
all the employees in this subtree to the main query.
The result set provides two numbers for each employee. The first
number, salary, is the employee's
own salary. The second number, sum_salary, is the
sum of the salaries of all employees under him (including
himself/herself). Often programmers resort to PL/SQL to solve this
type of problem. However, this query, which combines the power of
hierarchical queries with that of inline views, solves the problem in
a much more concise and elegant way.
8.4.8 Ordering Hierarchical Data
Sorting the results from
a
hierarchical query is a more interesting problem than it first may
sound. A hierarchical query with a START WITH . . . CONNECT BY . . .
construct displays the results in an arbitrary order, as shown in the
following example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || lname "EMPLOYEE",
emp_id, manager_emp_id
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id;
LEVEL EMPLOYEE EMP_ID MANAGER_EMP_ID
---------- -------------------- ---------- --------------
1 KING 7839
2 JONES 7566 7839
3 SCOTT 7788 7566
4 ADAMS 7876 7788
3 FORD 7902 7566
4 SMITH 7369 7902
2 BLAKE 7698 7839
3 ALLEN 7499 7698
3 WARD 7521 7698
3 MARTIN 7654 7698
3 TURNER 7844 7698
3 JAMES 7900 7698
2 CLARK 7782 7839
3 MILLER 7934 7782
As always, you can use an ORDER BY clause to order the result rows in
the way you want. However, in the case of a hierarchical query, an
ORDER BY clause can destroy the hierarchical nature of the data
returned by the query. This is shown in the following example, which
orders the results by last name:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || lname "EMPLOYEE",
emp_id, manager_emp_id
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id
ORDER BY lname;
LEVEL EMPLOYEE EMP_ID MANAGER_EMP_ID
---------- -------------------- ---------- --------------
4 ADAMS 7876 7788
3 ALLEN 7499 7698
2 BLAKE 7698 7839
2 CLARK 7782 7839
3 FORD 7902 7566
3 JAMES 7900 7698
2 JONES 7566 7839
1 KING 7839
3 MARTIN 7654 7698
3 MILLER 7934 7782
3 SCOTT 7788 7566
4 SMITH 7369 7902
3 TURNER 7844 7698
3 WARD 7521 7698
As you can see from this output, it is impossible to identify the
hierarchical relationship between the rows. To resolve this problem,
you can use the SIBLINGS (in Oracle9i and later)
keyword in the ORDER BY clause, to order the hierarchical data while
at the same time preserving the hierarchy. Oracle does this by
sorting at each level while ensuring that child nodes remain
underneath their parents. For example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || lname "EMPLOYEE",
emp_id, manager_emp_id
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id
ORDER SIBLINGS BY lname;
LEVEL EMPLOYEE EMP_ID MANAGER_EMP_ID
---------- -------------------- ---------- --------------
1 KING 7839
2 BLAKE 7698 7839
3 ALLEN 7499 7698
3 JAMES 7900 7698
3 MARTIN 7654 7698
3 TURNER 7844 7698
3 WARD 7521 7698
2 CLARK 7782 7839
3 MILLER 7934 7782
2 JONES 7566 7839
3 FORD 7902 7566
4 SMITH 7369 7902
3 SCOTT 7788 7566
4 ADAMS 7876 7788
In this example's output, BLAKE, CLARK, and JONES
are siblings, and they are displayed in ascending order. So are
BLAKE's children: ALLEN, JAMES, MARTIN, TURNER, and
WARD.
8.4.9 Finding the Path to a Node
You can list the entire path of
a
given node starting from the root node
using the
SYS_CONNECT_BY_PATH
function (in Oracle9i and later). This function
takes two arguments: a column name and a character string. The
function then returns a list containing each value of the column from
the root node to the current node, separating values by the character
string you provide. For example:
SELECT SYS_CONNECT_BY_PATH(lname, '#')
FROM employee
START WITH manager_emp_id IS NULL
CONNECT BY PRIOR emp_id = manager_emp_id;
SYS_CONNECT_BY_PATH(LNAME,'#')
---------------------------------
#KING
#KING#JONES
#KING#JONES#SCOTT
#KING#JONES#SCOTT#ADAMS
#KING#JONES#FORD
#KING#JONES#FORD#SMITH
#KING#BLAKE
#KING#BLAKE#ALLEN
#KING#BLAKE#WARD
#KING#BLAKE#MARTIN
#KING#BLAKE#TURNER
#KING#BLAKE#JAMES
#KING#CLARK
#KING#CLARK#MILLER
The preceding query lists the full organizational path for each
employee starting at the top. For example,
#KING#JONES#FORD#SMITH shows the complete
reporting relation of SMITH in the organization.
To understand the usefulness of the SYS_CONNECT_BY_PATH function,
think of a trail in a park. The branches of such a trail are
illustrated in Figure 8-3.
The various points in the trail, and the distance between them is
stored in a table, trail:
CREATE TABLE trail (
start_point CHAR,
end_point CHAR,
distance NUMBER
);
INSERT INTO trail VALUES ('A','B', 3);
INSERT INTO trail VALUES ('A','C', 2.5);
INSERT INTO trail VALUES ('B','D', 2);
INSERT INTO trail VALUES ('B','E', 1.5);
INSERT INTO trail VALUES ('C','F', 2.5);
INSERT INTO trail VALUES ('C','G', 2.5);
INSERT INTO trail VALUES ('G','H', 3.5);
INSERT INTO trail VALUES ('E','J', 1.5);
COMMIT;
You need to find the total distance of each point in the trail from
the starting point "A." The
following query uses SYS_CONNECT_BY_PATH to print the distance of
each point concatenated with the distances of each of its ancestors
in the tree:
SELECT end_point,
SUBSTR(SYS_CONNECT_BY_PATH(distance,'+'),2) total_distance
FROM trail
START WITH start_point = 'A'
CONNECT BY start_point = PRIOR end_point;
E TOTAL_DISTANCE
- --------------------
B 3
D 3+2
E 3+1.5
J 3+1.5+1.5
C 2.5
F 2.5+2.5
G 2.5+2.5
H 2.5+2.5+3.5
The
SUBSTR
function takes out the first "+" in
the query's output. Now, each of the
total_distance expressions, one for each point,
can be evaluated to compute the total distance. One way to evaluate
such expressions is to write an eval function, as
shown in the following code:
CREATE OR REPLACE FUNCTION eval (exp IN VARCHAR2) RETURN NUMBER IS
result NUMBER;
BEGIN
EXECUTE IMMEDIATE 'SELECT ' || exp || ' FROM DUAL' INTO result;
RETURN result;
EXCEPTION
WHEN OTHERS THEN
RETURN NULL;
END;
/
The following example uses the eval function to
compute the total distance of each point in the trail from the
starting point A:
SELECT end_point,
eval(SUBSTR(SYS_CONNECT_BY_PATH(distance,'+'),2)) total_distance
FROM trail
START WITH start_point = 'A'
CONNECT BY start_point = PRIOR end_point;
E TOTAL_DISTANCE
- --------------
B 3
D 5
E 4.5
J 6
C 2.5
F 5
G 5
H 8.5
From this output, it is easy to figure out how far each point is in
the trail from the starting point
"A."
|
No comments:
Post a Comment