Wednesday, November 4, 2009

12.7 Finding Rows Containing Per-Group Minimum or Maximum Values


















12.7.1 Problem



You want to find which record within each group of rows in a table
contains the maximum or minimum value for a given column. For
example, you want to determine the most expensive painting in your
collection for each artist.





12.7.2 Solution



Create a temporary table to hold the per-group maximum or minimum,
then join the temporary table with the original one to pull out the
matching record for each group.





12.7.3 Discussion



Many questions involve finding the largest or smallest value in a
particular table column, but it's also common to want to know the
other values in the row that contains that value. For example, you
can use MAX(pop) to find the largest state population recorded in the
states table, but you might also want to know which state has that
population. As shown in Recipe 7.6, one way to solve this problem is
to use a SQL variable. The technique works like this:



mysql> SELECT @max := MAX(pop) FROM states;
mysql> SELECT * FROM states WHERE pop = @max;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+


Another way to answer the question is to use a join. First, select
the maximum population value into a temporary table:



mysql> CREATE TABLE tmp SELECT MAX(pop) AS maxpop FROM states;


Then join the temporary table to the original one to find the record
matching the selected population:



mysql> SELECT states.* FROM states, tmp WHERE states.pop = tmp.maxpop;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+
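
The two-step lookup above can also be collapsed into one statement with
a scalar subquery. This is only a sketch of an alternative, not the
technique the recipe itself uses: it assumes the states table shown
above and a MySQL version that supports subqueries (4.1 or later).

```sql
-- Sketch: assumes the states table from this recipe and a server
-- with subquery support (MySQL 4.1 or later).
-- The inner SELECT computes the single maximum population; the outer
-- SELECT returns every row whose pop equals that value.
SELECT *
FROM states
WHERE pop = (SELECT MAX(pop) FROM states);
```

With the data shown above, this should return the same California row.
No variable or extra table is needed, but the temporary-table form is
the one that extends to the per-group questions discussed later in
this section.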


By applying these techniques to the artist and
painting tables, you can answer questions like
"What is the most expensive painting in the
collection, and who painted it?" To use a SQL
variable, store the highest price in it, then use the variable to
identify the record containing the price so you can retrieve other
columns from it:



mysql> SELECT @max_price := MAX(price) FROM painting;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting
    -> WHERE painting.price = @max_price
    -> AND painting.a_id = artist.a_id;
+----------+---------------+-------+
| name     | title         | price |
+----------+---------------+-------+
| Da Vinci | The Mona Lisa |    87 |
+----------+---------------+-------+


The same thing can be done by creating a temporary table to hold the
maximum price, and then joining it with the other tables:



mysql> CREATE TABLE tmp SELECT MAX(price) AS max_price FROM painting;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+---------------+-------+
| name     | title         | price |
+----------+---------------+-------+
| Da Vinci | The Mona Lisa |    87 |
+----------+---------------+-------+
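
The examples in this section each create a table named tmp, so a
second CREATE TABLE tmp fails if the earlier table still exists. One
way to handle this, sketched here under the assumption that nothing
else in your database relies on a table called tmp, is to drop any
leftover table first and to create the new one with the TEMPORARY
keyword:

```sql
-- Sketch: assumes no other table named tmp needs to be preserved.
-- Remove any leftover tmp table before reusing the name.
DROP TABLE IF EXISTS tmp;

-- TEMPORARY makes the table visible only to the current connection;
-- the server drops it automatically when the connection closes.
CREATE TEMPORARY TABLE tmp
SELECT MAX(price) AS max_price FROM painting;

-- Same join as in the recipe, unchanged.
SELECT artist.name, painting.title, painting.price
FROM artist, painting, tmp
WHERE painting.price = tmp.max_price
AND painting.a_id = artist.a_id;

-- Drop it explicitly if later statements will create tmp again
-- within the same connection.
DROP TABLE tmp;
```

Because a TEMPORARY table is private to the connection that created
it, two clients can run this sequence at the same time without
interfering with each other's tmp table.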


On the face of it, using a temporary table and a join is just a more
complicated way of answering the question. Does this technique have
any practical value? Yes, it does, because it leads to a more general
technique for answering more difficult questions. The previous
queries show information only for the most expensive single painting
in the entire painting table. What if your
question is, "What is the most expensive painting
per artist?" You can't use a SQL
variable to answer that question, because the answer requires finding
one price per artist, and a variable can hold only a single value at
a time. But the technique of using a temporary table works well,
because the table can hold multiple values and a join can find
matches for them all at once. To answer the question, select each
artist ID and the corresponding maximum painting price into a
temporary table. The table will contain not just the maximum painting
price, but the maximum within each group, where
"group" is defined as
"paintings by a given artist." Then
use the artist IDs and prices stored in the tmp
table to match records in the painting table, and
join the result with artist to get the artist
names:



mysql> CREATE TABLE tmp
    -> SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.a_id = tmp.a_id
    -> AND painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+
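
On servers that support subqueries in the FROM clause (MySQL 4.1 and
later), the per-artist summary can be written inline as a derived
table instead of being stored first. The following is a sketch of that
variation rather than the recipe's own method; it assumes the same
artist and painting tables, and the alias tmp is reused only to keep
the join conditions recognizable.

```sql
-- Sketch: assumes the artist and painting tables from this recipe
-- and a server that supports derived tables (MySQL 4.1 or later).
SELECT artist.name, painting.title, painting.price
FROM painting
INNER JOIN (
    -- One row per artist: the artist ID and that artist's top price.
    SELECT a_id, MAX(price) AS max_price
    FROM painting
    GROUP BY a_id
) AS tmp
    ON painting.a_id = tmp.a_id
    AND painting.price = tmp.max_price
INNER JOIN artist
    ON painting.a_id = artist.a_id;
```

In both forms the match is made on price, so if an artist has two
paintings tied for the highest price, both rows are returned for that
artist.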


The same technique works for other kinds of values, such as temporal
values. Consider the driver_log table that lists
drivers and trips that they've taken:



mysql> SELECT name, trav_date, miles
    -> FROM driver_log
    -> ORDER BY name, trav_date;
+-------+------------+-------+
| name  | trav_date  | miles |
+-------+------------+-------+
| Ben   | 2001-11-29 |   131 |
| Ben   | 2001-11-30 |   152 |
| Ben   | 2001-12-02 |    79 |
| Henry | 2001-11-26 |   115 |
| Henry | 2001-11-27 |    96 |
| Henry | 2001-11-29 |   300 |
| Henry | 2001-11-30 |   203 |
| Henry | 2001-12-01 |   197 |
| Suzi  | 2001-11-29 |   391 |
| Suzi  | 2001-12-02 |   502 |
+-------+------------+-------+


One type of maximum-per-group problem for this table is,
"show the most recent trip for each
driver." It can be solved like this:



mysql> CREATE TABLE tmp
    -> SELECT name, MAX(trav_date) AS trav_date
    -> FROM driver_log GROUP BY name;
mysql> SELECT driver_log.name, driver_log.trav_date, driver_log.miles
    -> FROM driver_log, tmp
    -> WHERE driver_log.name = tmp.name
    -> AND driver_log.trav_date = tmp.trav_date
    -> ORDER BY driver_log.name;
+-------+------------+-------+
| name  | trav_date  | miles |
+-------+------------+-------+
| Ben   | 2001-12-02 |    79 |
| Henry | 2001-12-01 |   197 |
| Suzi  | 2001-12-02 |   502 |
+-------+------------+-------+
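
Per-group minimum questions follow the same pattern with MIN( ) in
place of MAX( ). As a sketch, the statements below look for the
earliest trip for each driver. The table name tmp_first is a
hypothetical name chosen here so that it does not collide with tmp,
and the driver_log table is assumed to be the one listed above.

```sql
-- Sketch: earliest trip per driver.
-- tmp_first is a hypothetical name, used to avoid clashing with tmp.
CREATE TEMPORARY TABLE tmp_first
SELECT name, MIN(trav_date) AS trav_date
FROM driver_log GROUP BY name;

-- Match each driver's rows against that driver's earliest date.
SELECT driver_log.name, driver_log.trav_date, driver_log.miles
FROM driver_log, tmp_first
WHERE driver_log.name = tmp_first.name
AND driver_log.trav_date = tmp_first.trav_date
ORDER BY driver_log.name;
```

With the driver_log rows listed above, this should pick out Ben's trip
on 2001-11-29 (131 miles), Henry's on 2001-11-26 (115 miles), and
Suzi's on 2001-11-29 (391 miles).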




12.7.4 See Also



The technique illustrated in this section shows how to answer
maximum-per-group questions by selecting summary information into a
temporary table and joining that table to the original one. This
technique has many applications. One such application is calculation
of team standings, where the standings for each group of teams are
determined by comparing each team in the group to the team with the
best record. Recipe 12.8 discusses how to do this.












