Thursday, November 12, 2009

Section B.5. Regular Expressions







B.5. Regular Expressions

When using the operators REGEXP,
RLIKE, and NOT REGEXP, you may need
special characters and parameters to be able to search for data based on
regular expressions. Table B-5 lists the
special characters, and Table B-6 shows
special constructs that may be used. In keeping with convention, patterns
to match are given within quotes. As an example of a regular expression
used with a SELECT statement, suppose that we want to
find the name of a particular student in a college's database, but we
can't quite remember his last name. All we remember is that it's something
like Smith, but it could be
Smithfield or maybe Smyth. We
could run an SQL statement like the following to get a list of
possibilities:

SELECT student_id, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE name_last REGEXP 'Smith.*|Smyth';


As an example using a pattern-matching construct, suppose that we
suspect there are a few student records in which the name columns contain
numeric characters. Suppose also that there are some student records in
which the social_security column
contains characters other than numbers or dashes. We could search for them
by executing an SQL statement like the following:

SELECT student_id, soc_sec, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE CONCAT(name_first, name_last) REGEXP '[[:digit:]]+'
OR soc_sec REGEXP '[[:alpha:]]+';


As an example of a construct using a character name, suppose that
the column containing Social Security tax identification numbers (i.e.,
soc_sec) shouldn't contain the usual hyphen separator
(i.e., 443-78-8391). We could enter an SQL statement
like the following to find records with hyphens in that column:

SELECT student_id, soc_sec, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE soc_sec REGEXP '[[.hyphen.]]+';


To find any rows that do not specifically meet the format for the
Social Security number (i.e., nnn-nn-nnnn), we
could use this longer but more specific regular expression:

SELECT student_id, soc_sec, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE soc_sec NOT REGEXP
'[[:digit:]]{3}[[.hyphen.]]{1}[[:digit:]]{2}[[.hyphen.]]{1}[[:digit:]]{4}';


Notice that this statement uses the curly braces after each
construct to specify the exact number of characters or digits
permitted.

Table B-5. Pattern-matching characters
CharacterUse
^Matches the beginning of the string.
$Matches the beginning of the string.
.Matches any character, space, or line
ending.
*Matches zero or more of the characters immediately
preceding.
+Matches one or more of the characters immediately
preceding.
?Matches zero or one of the characters immediately
preceding.
|An OR operator; matches the
characters before or after it (e.g.,
'Russell|Rusty').
(characters)*Matches zero or more occurrences of the sequence of
characters given in parentheses.
{number}Specifies the number of occurrences of the previous
pattern given.
{number,number}Specifies the minimum number of occurrences of the
previous pattern given, followed by the maximum number of
occurrences. If only the minimum number is omitted, 0 is assumed.
If just the maximum number is omitted, unlimited is
assumed.
[x-x]Specifies a range of characters in alphabetical order
(e.g., '[a-g]' for the first seven lowercase letters), or numbers
in numeric sequence (e.g., '[0-9]' for all
numbers).


Table B-6. Pattern-matching constructs
ConstructUse
[.character.]Matches the given character or character name (e.g.,
backslash, carriage return,
newline,
tab).
[=character=]Matches characters of the same class as the character
given.
[[:<:]]Matches the beginning of a word.
[[:>:]]Matches the end of a word.
[:alnum:]Matches alphanumeric characters.
[:alpha:]Matches alphabetical characters.
[:blank:]Matches a blank or whitespace
characters.
[:cntrl:]Matches control characters.
[:digit:]Matches digits.
[:lower:]Matches lowercase alphabetical
characters.
[:print:]Matches graphic and space characters.
[:punct:]Matches punctuation characters.
[:space:]Matches space, carriage return, newline, and tab
characters.
[:upper:]Matches uppercase alphabetical
characters.
[:xdigit:]Matches hexadecimal characters.









No comments: