17.2 Advanced Function Options
By now we've introduced you to
all
but one of Oracle's regular expression functions.
However, you've only seen them used in their
simplest form. Their full syntax is:
REGEXP_INSTR(source_string, expression
[, position [, occurrence
[, return_option
[, match_parameter]]]])
REGEXP_LIKE (source_string, expression
[, match_parameter])
REGEXP_REPLACE(source_string, expression
[, replace_string
[, position [, occurrence
[, match_parameter]]]])
REGEXP_SUBSTR(source_string, expression
[, position [, occurrence
[, match_parameter]]])
If you're familiar with Oracle's
string-manipulation functions, you'll have no
trouble discerning the purpose of the
REGEXP
functions:
REGEXP_INSTR
Returns the character position at which text can be found matching a
regular expression in a target string.
REGEXP_LIKE
Discerns whether a given string contains text matching an expression.
This is a Boolean function, returning TRUE or FALSE, or NULL.
REGEXP_REPLACE
Performs a regular expression search-and-replace operation, which
you'll learn about in Section 17.3.
REGEXP_SUBSTR
Extracts text matching a regular expression from a string.
The parameters to these functions are as follows:
- source_string
The string to be searched.
- expression
A regular expression describing the pattern of text that you seek.
- replace_string
A string generating the replacement text to be used in a
search-and-replace operation.
- position
The character position within
source_string at which to begin a search.
This defaults to 1.
- occurrence
The occurrence of the pattern you wish to locate. This defaults to 1,
giving you the first possible match.
- return_option
Valid only for REGEXP_INSTR, and determines whether the beginning or
ending character position is returned for text matching a pattern.
The default is 0, for the beginning. Use 1 to return the ending
position.
- match_parameter
A text string through which you may specify options to vary the
behavior of the regular expression matching engine.
The match_parameter deserves a bit of explanation.
It's a character string that you build using one or
more of the following letters: i,
c, n, m. One
use is to specify whether matching is case-sensitive. By default,
your NLS_SORT setting drives this aspect of regular expression
behavior. You can override that default on a call-by-call basis,
using i to get case-insensitive matching, and
c to get case-sensitive matching. Following is a
simple demonstration that works with our example data set. When the
i option is used, the match succeeds, even though
the case really does not match:
SELECT emp_id, text
FROM employee_comment
WHERE REGEXP_LIKE(text, 'CEDAR LAKE', 'c');
no rows selected
SELECT emp_id, text
FROM employee_comment
WHERE REGEXP_LIKE(text, 'CEDAR LAKE', 'i');
EMP_ID TEXT
---------- ---------------------------------------------
7499 1105 McConnell Court
Cedar Lake MI 48812
Home: (989) 387-4321
Cell: (237) 438-3333
7782 Academy Apartments, #138, Cedar Lake MI 48812
7788 #1 Water Tower Lane
Cedar Lake MI 48812
The n option is useful when working with data,
such as ours, that contains embedded newline (usually
chr(10)) characters. By default, the period
(.) does not match newlines, which is why the
following query brings back only the first line of comment text:
SELECT emp_id, REGEXP_SUBSTR(text, '.*') text
FROM employee_comment
WHERE emp_id = 7698;
EMP_ID TEXT
---------- ---------------
7698 Marion Blake
Using 'n' for the match parameter, you can have
the period match newlines, which in the case of this query, results
in the entire comment being returned:
SELECT emp_id, REGEXP_SUBSTR(text, '.*', 1, 1, 'n') text
FROM employee_comment
WHERE emp_id = 7698;
EMP_ID TEXT
---------- -----------------------
7698 Marion Blake
N1375 Lakeshore Drive
Vestaburg MI 48891
(989) 388-1234
This example also illustrates the important point that specifying a
match parameter forces you to also specify any preceding, optional
parameters that you would otherwise skip.
The final match option is m, which changes the
definition of line
with respect to the ^ and $
metacharacters. By default, line means the entire target
string, so an expression such as ^.*$,
together with the n option, will bring back all
characters in the target string:
SELECT emp_id, REGEXP_SUBSTR(text, '^.*$', 1, 1, 'n') text
FROM employee_comment
WHERE emp_id = 7788;
EMP_ID TEXT
---------- --------------------
7788 #1 Water Tower Lane
Cedar Lake MI 48812
The
n option is necessary here, because otherwise the
period would not match the newline. If the period did not match the
newline, the expression would fail to match at that point, and it
would be impossible to match all characters between the beginning and
end of the string.
However, using the m option causes the definition
of line to change from the entire target string,
to any line within that string, where lines are
delimited by newline characters. The following example removes
n, replacing it with m:
SELECT emp_id, REGEXP_SUBSTR(text, '^.*$', 1, 1, 'm') text
FROM employee_comment
WHERE emp_id = 7788;
EMP_ID TEXT
---------- --------------------
7788 #1 Water Tower Lane
You can specify multiple match parameters in any order. For example,
'in' means the same as 'ni'. If
you specify
conflicting options, such as
'ic', the last option, 'c' in
this case, is the one that takes precedence.
|
Try our last example in this section using 'mn' as
the match parameter string. You'll get back the
entire target string. Why? Because of the greediness rule. When
'mn' is used, the first line is a match, but the
entire string is a longer match.
|
|
|
No comments:
Post a Comment