Tuesday, October 27, 2009

12.19 Updating One Table Based on Values in Another




I l@ve RuBoard










12.19 Updating One Table Based on Values in Another




12.19.1 Problem



You need to update existing records in
one table based on the contents of records in another table, but
MySQL doesn't yet allow join syntax in the
WHERE clause of UPDATE
statements. So you have no way to associate the two tables.





12.19.2 Solution



Create a new table that is populated from the result of a join
between the original table and the table containing the new
information. Then replace the original table with the new one. Or
write a program that selects information from the related table and
issues the queries necessary to update the original table. Or use
mysql to generate and execute the queries.





12.19.3 Discussion



Sometimes when updating records in one table, it's
necessary to refer to records in another table. Recall that the
states table used in several earlier recipes
contains rows that look like this:



mysql> SELECT * FROM states;
+----------------+--------+------------+----------+
| name | abbrev | statehood | pop |
+----------------+--------+------------+----------+
| Alaska | AK | 1959-01-03 | 550043 |
| Alabama | AL | 1819-12-14 | 4040587 |
| Arkansas | AR | 1836-06-15 | 2350725 |
| Arizona | AZ | 1912-02-14 | 3665228 |
...


Now suppose that you want to add some new columns to this table,
using information from another table, city, that
contains information about each state's capital city
and largest (most populous) city:



mysql> SELECT * FROM city;
+----------------+----------------+----------------+
| state | capital | largest |
+----------------+----------------+----------------+
| Alabama | Montgomery | Birmingham |
| Alaska | Juneau | Anchorage |
| Arizona | Phoenix | Phoenix |
| Arkansas | Little Rock | Little Rock |
...


It would be easy enough to add new columns named
capital and largest to the
states table structure using an
ALTER TABLE statement. But then
how would you modify the rows to fill in the new columns with the
appropriate values? The most convenient way to do this would be to
run an UPDATE query that uses join syntax in the
WHERE clause:



UPDATE states,city
SET states.capital = city.capital, states.largest = city.largest
WHERE states.name = city.state;


That doesn't work, because MySQL does not yet
support this syntax. Another solution would be to use a subselect in
the WHERE clause, but subselects are not scheduled
for inclusion until MySQL 4.1. What are the alternatives? Clearly,
you don't want to update each row by hand.
That's unacceptably tedious�and silly, too,
given that the new information is already stored in the
city table. The states and
city tables contain a common key (state names), so
let's use that information to relate the two tables
and perform the update. There are a few techniques you can use to
achieve the same result as a multiple-table update:




  • Create a new table that is like the original
    states table, but includes the additional columns
    to be added from the related table, city. Populate
    the new table using the result of a join between the
    states and city tables, then
    replace the original states table with the new
    one.


  • Write a program that uses the information from the
    city table to generate and execute
    UPDATE statements that update the
    states table one state at a time.


  • Use mysql to generate the
    UPDATE statements.





12.19.4 Performing a Related-Table Update Using Table Replacement



The table-replacement
approach works as follows. To extend the states
table with the capital and
largest columns from the city
table, create a tmp table that is like the
states table but adds capital
and largest columns:



CREATE TABLE tmp
(
name VARCHAR(30) NOT NULL, # state name
abbrev CHAR(2) NOT NULL, # 2-char abbreviation
statehood DATE, # date of entry into the Union
pop BIGINT, # population as of 4/1990
capital VARCHAR(30), # capital city
largest VARCHAR(30), # most populous city
PRIMARY KEY (abbrev)
);


Then populate tmp using the result of a
join between
states and city that matches up
rows in the two tables using state names:



INSERT INTO tmp (name, abbrev, statehood, pop, capital, largest)
SELECT
states.name, states.abbrev, states.statehood, states.pop,
city.capital, city.largest
FROM
states LEFT JOIN city ON states.name = city.state;


The query uses a LEFT JOIN for a reason.
Suppose the city table is incomplete and
doesn't contain a row for every state. In that case,
a regular join will fail to produce an output row for any states that
are missing from the city table, and the resulting
tmp table will be missing records for those
states, even though they are present in the states
table. Not good! The LEFT JOIN
ensures that the SELECT produces output for every
row in the states table, whether or not
it's matched by a city table row.
Any state that is missing in the city table would
end up with NULL values in the
tmp table for the capital and
largest columns, but that's
appropriate when you don't know the city
names�and generating an incomplete row certainly is preferable
to losing the row entirely.



The resulting tmp table is like the original one,
but has two new columns, capital and
largest. You can examine it to see this. After
verifying that you're satisfied with the
tmp table, use it to replace the original
states table:



DROP TABLE states;
ALTER TABLE tmp RENAME TO states;


If you want to make sure there is no time, however brief, during
which the states table is unavailable, perform the
replacement like this instead:



RENAME TABLE states TO states_old, tmp TO states;
DROP TABLE states_old;




12.19.5 Performing a Related-Table Update by Writing a Program



The
table-replacement technique is efficient because it lets the server
do all the work. On the other hand, it is most
appropriate when you're updating all or most of the
rows in the table. If you're updating just a few
rows, it may be less work to update the table "in
place" for just those rows that need it. Also, table
replacement requires more than twice the space of the original
states table while you're
carrying out the update procedure. If you have a huge table to
update, you may not want to use all that space.



A second technique for updating a table based on a related table is
to read the information from the related table and use it to generate
UPDATE statements. For example, to update
states with the information stored in the
city table, read the city names and use them to
create and issue a series of queries like this:



UPDATE states SET capital = 'Montgomery', largest = 'Birmingham'
WHERE name = 'Alabama';
UPDATE states SET capital = 'Juneau', largest = 'Anchorage'
WHERE name = 'Alaska';
UPDATE states SET capital = 'Phoenix', largest = 'Phoenix'
WHERE name = 'Arizona';
UPDATE states SET capital = 'Little Rock', largest = 'Little Rock'
WHERE name = 'Arkansas';
...


To carry out this procedure, first alter the
states table so that it includes the new
columns:[3]


[3] If you've already modified
states using the table-replacement procedure,
first restore the table to its original structure by dropping the
capital and largest
columns:



ALTER TABLE states ADD capital VARCHAR(30), ADD largest VARCHAR(30);


Next, write a program that reads the city table
and uses its contents to produce UPDATE statements
that modify the states table. Here is an example
script, update_cities.pl, that does so:



#! /usr/bin/perl -w
# update_cities.pl - update states table capital and largest city columns,
# using contents of city table. This assumes that the states table has
# been modified to include columns named capital and largest.

use strict;
use lib qw(/usr/local/apache/lib/perl);
use Cookbook;

my $dbh = Cookbook::connect ( );

my $sth = $dbh->prepare ("SELECT state, capital, largest FROM city");
$sth->execute ( );
while (my ($state, $capital, $largest) = $sth->fetchrow_array ( ))
{
$dbh->do ("UPDATE states SET capital = ?, largest = ? WHERE name = ?",
undef, $capital, $largest, $state);
}

$dbh->disconnect ( );

exit (0);


The script has all the table and column names built in to it, which
makes it very special purpose. You could generalize this procedure by
writing a function that accepts parameters indicating the table
names, the columns to use for matching records in the two tables, and
the columns to use for updating the rows. The
update_related.pl
script in the joins directory of the
recipes distribution shows one way to do this.





12.19.6 Performing a Related-Table Update Using mysql



If
your data values don't require any special handling
for internal quotes or other special characters, you can use
mysql to generate and process the
UPDATE statements. This is similar to the
technique shown in Recipe 12.18 for using
mysql to simulate a subselect.



Put the following statement in a file,
update_cities.sql:



SELECT CONCAT('UPDATE states SET capital = \'',capital,
'\', largest = \'',largest,'\' WHERE name = \'',state,'\';')
FROM city;


The query reads the rows of the city table and
uses them to generate statements that update
states. Execute the query and save the result in
tmp:



% mysql -N cookbook < update_cities.sql > tmp


tmp will contain statements that look like the
queries generated by the update_cities.pl
script. Assuming that you're added the
capital and largest columns to
the states table, you can execute these statements
as follows to update the table:



% mysql cookbook < tmp









    I l@ve RuBoard



    No comments: