Programmer's Life: Section B.5. Regular Expressions

B.5. Regular Expressions

When using the operators REGEXP,
RLIKE, and NOT REGEXP, you may need
special characters and parameters to be able to search for data based on
regular expressions. Table B-5 lists the
special characters, and Table B-6 shows
special constructs that may be used. In keeping with convention, patterns
to match are given within quotes. As an example of a regular expression
used with a SELECT statement, suppose that we want to
find the name of a particular student in a college's database, but we
can't quite remember his last name. All we remember is that it's something
like Smith, but it could be
Smithfield or maybe Smyth. We
could run an SQL statement like the following to get a list of
possibilities:

SELECT student_id, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE name_last REGEXP 'Smith.*|Smyth';

As an example using a pattern-matching construct, suppose that we
suspect there are a few student records in which the name columns contain
numeric characters. Suppose also that there are some student records in
which the social_security column
contains characters other than numbers or dashes. We could search for them
by executing an SQL statement like the following:

SELECT student_id, soc_sec, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE CONCAT(name_first, name_last) REGEXP '[[:digit:]]+'
OR soc_sec REGEXP '[[:alpha:]]+';

As an example of a construct using a character name, suppose that
the column containing Social Security tax identification numbers (i.e.,
soc_sec) shouldn't contain the usual hyphen separator
(i.e., 443-78-8391). We could enter an SQL statement
like the following to find records with hyphens in that column:

SELECT student_id, soc_sec, 
CONCAT(name_first, SPACE(1), name_last) AS Student
FROM students
WHERE soc_sec REGEXP '[[.hyphen.]]+';

To find any rows that do not specifically meet the format for the
Social Security number (i.e., nnn-nn-nnnn), we
could use this longer but more specific regular expression:

SELECT student_id, soc_sec, 
CONCAT(name_first, SPACE(1), name_last) AS Student 
FROM students 
WHERE soc_sec NOT REGEXP 
'[[:digit:]]{3}[[.hyphen.]]{1}[[:digit:]]{2}[[.hyphen.]]{1}[[:digit:]]{4}';

Notice that this statement uses the curly braces after each
construct to specify the exact number of characters or digits
permitted.

Table B-5. Pattern-matching characters
Character	Use
^	Matches the beginning of the string.
$	Matches the beginning of the string.
.	Matches any character, space, or line ending.
*	Matches zero or more of the characters immediately preceding.
+	Matches one or more of the characters immediately preceding.
?	Matches zero or one of the characters immediately preceding.
\|	An `OR` operator; matches the characters before or after it (e.g., `'Russell\|Rusty'`).
(`characters`)*	Matches zero or more occurrences of the sequence of characters given in parentheses.
{`number`}	Specifies the number of occurrences of the previous pattern given.
{`number`,`number`}	Specifies the minimum number of occurrences of the previous pattern given, followed by the maximum number of occurrences. If only the minimum number is omitted, 0 is assumed. If just the maximum number is omitted, unlimited is assumed.
[`x`-`x`]	Specifies a range of characters in alphabetical order (e.g., `'[a-g]'` for the first seven lowercase letters), or numbers in numeric sequence (e.g., `'[0-9]'` for all numbers).

Table B-6. Pattern-matching constructs
Construct	Use
[.`character`.]	Matches the given character or character name (e.g., `backslash`, `carriage return`, `newline`, `tab`).
[=`character`=]	Matches characters of the same class as the character given.
`[[:<:]]`	Matches the beginning of a word.
`[[:>:]]`	Matches the end of a word.
`[:alnum:]`	Matches alphanumeric characters.
`[:alpha:]`	Matches alphabetical characters.
`[:blank:]`	Matches a blank or whitespace characters.
`[:cntrl:]`	Matches control characters.
`[:digit:]`	Matches digits.
`[:lower:]`	Matches lowercase alphabetical characters.
`[:print:]`	Matches graphic and space characters.
`[:punct:]`	Matches punctuation characters.
`[:space:]`	Matches space, carriage return, newline, and tab characters.
`[:upper:]`	Matches uppercase alphabetical characters.
`[:xdigit:]`	Matches hexadecimal characters.

Programmer's Life

Thursday, November 12, 2009

Section B.5. Regular Expressions

B.5. Regular Expressions

Table B-5. Pattern-matching characters

Table B-6. Pattern-matching constructs

No comments:

Blog Archive

About Me

Link