Thursday, October 22, 2009

8.9 Eliminating Duplicates by Adding an Index




I l@ve RuBoard










8.9 Eliminating Duplicates by Adding an Index




8.9.1 Problem



A table has duplicates and you'd like to get rid of
them.





8.9.2 Solution



One way to do this is to create a unique index on the column or
columns containing duplicates.





8.9.3 Discussion



If MySQL discovers duplicate key values when you try to create a
PRIMARY KEY or a
UNIQUE index, it aborts the
ALTER TABLE operation. To
ignore the duplicates and proceed anyway, use
ALTER IGNORE
TABLE rather than ALTER
TABLE. The
IGNORE keyword tells MySQL to retain the first
row containing a duplicated key value and discard the others. This
is, in fact, a useful way to eliminate duplicates in a column or set
of columns: just create a unique-valued index and let MySQL throw
away the duplicates. (If you need to identify which key values are
duplicated, though, that's not a suitable technique.
See Recipe 14.4 for information on duplicate
identification.)



To see how IGNORE works to eliminate duplicates,
use mytbl, which now has no indexes if
you've issued the index-modification statements
shown earlier. First, insert some duplicate values into the table:



mysql> INSERT INTO mytbl (i,c) VALUES(1,'a'),(1,'a'),(1,NULL),(1,NULL),
-> (2,'a'),(2,'a'),(2,'b'),(2,'b');
mysql> SELECT * FROM mytbl;
+---+------+
| i | c |
+---+------+
| 1 | a |
| 1 | a |
| 1 | NULL |
| 1 | NULL |
| 2 | a |
| 2 | a |
| 2 | b |
| 2 | b |
+---+------+


Now suppose you want to create a unique-valued index comprising the
i and c columns. A
PRIMARY KEY cannot be used
here, because c contains NULL
values. You can create a UNIQUE index, but if you
try to do so without using IGNORE,
you'll get an error:



mysql> ALTER TABLE mytbl ADD UNIQUE (i,c);
ERROR 1062 at line 1: Duplicate entry '1-a' for key 1


Add IGNORE to the statement, then use
SELECT to have a look at the table contents to see
how the duplicates have been removed:



mysql> ALTER IGNORE TABLE mytbl ADD UNIQUE (i,c);
mysql> SELECT * FROM mytbl;
+---+------+
| i | c |
+---+------+
| 1 | NULL |
| 1 | NULL |
| 1 | a |
| 2 | a |
| 2 | b |
+---+------+


The output shows that duplicate records have been
eliminated�except those that contain NULL in
the key values. That's because
UNIQUE indexes allow multiple
NULL values. For another technique that removes
even duplicates that contain NULLs, see Recipe 14.7.










    I l@ve RuBoard



    No comments: