Thursday, February 4, 2010

17.2 Advanced Function Options











 < Day Day Up > 







17.2 Advanced Function Options





By now we've introduced you to



all

but one of Oracle's regular expression functions.

However, you've only seen them used in their

simplest form. Their full syntax is:





REGEXP_INSTR(source_string, expression 

[, position [, occurrence

[, return_option

[, match_parameter]]]])



REGEXP_LIKE (source_string, expression

[, match_parameter])



REGEXP_REPLACE(source_string, expression

[, replace_string

[, position [, occurrence

[, match_parameter]]]])



REGEXP_SUBSTR(source_string, expression

[, position [, occurrence

[, match_parameter]]])







If you're familiar with Oracle's

string-manipulation functions, you'll have no

trouble discerning the purpose of the

REGEXP

functions:








REGEXP_INSTR





Returns the character position at which text can be found matching a

regular expression in a target string.








REGEXP_LIKE





Discerns whether a given string contains text matching an expression.

This is a Boolean function, returning TRUE or FALSE, or NULL.








REGEXP_REPLACE





Performs a regular expression search-and-replace operation, which

you'll learn about in Section 17.3.








REGEXP_SUBSTR





Extracts text matching a regular expression from a string.









The parameters to these functions are as follows:






source_string





The string to be searched.






expression





A regular expression describing the pattern of text that you seek.






replace_string





A string generating the replacement text to be used in a

search-and-replace operation.






position





The character position within

source_string at which to begin a search.

This defaults to 1.






occurrence





The occurrence of the pattern you wish to locate. This defaults to 1,

giving you the first possible match.






return_option





Valid only for REGEXP_INSTR, and determines whether the beginning or

ending character position is returned for text matching a pattern.

The default is 0, for the beginning. Use 1 to return the ending

position.






match_parameter





A text string through which you may specify options to vary the

behavior of the regular expression matching engine.









The match_parameter deserves a bit of explanation.

It's a character string that you build using one or

more of the following letters: i,

c, n, m. One

use is to specify whether matching is case-sensitive. By default,

your NLS_SORT setting drives this aspect of regular expression

behavior. You can override that default on a call-by-call basis,

using i to get case-insensitive matching, and

c to get case-sensitive matching. Following is a

simple demonstration that works with our example data set. When the

i option is used, the match succeeds, even though

the case really does not match:





SELECT emp_id, text

FROM employee_comment

WHERE REGEXP_LIKE(text, 'CEDAR LAKE', 'c');



no rows selected



SELECT emp_id, text

FROM employee_comment

WHERE REGEXP_LIKE(text, 'CEDAR LAKE', 'i');



EMP_ID TEXT

---------- ---------------------------------------------

7499 1105 McConnell Court

Cedar Lake MI 48812

Home: (989) 387-4321

Cell: (237) 438-3333

7782 Academy Apartments, #138, Cedar Lake MI 48812

7788 #1 Water Tower Lane

Cedar Lake MI 48812







The n option is useful when working with data,

such as ours, that contains embedded newline (usually

chr(10)) characters. By default, the period

(.) does not match newlines, which is why the

following query brings back only the first line of comment text:





SELECT emp_id, REGEXP_SUBSTR(text, '.*') text

FROM employee_comment

WHERE emp_id = 7698;



EMP_ID TEXT

---------- ---------------

7698 Marion Blake







Using 'n' for the match parameter, you can have

the period match newlines, which in the case of this query, results

in the entire comment being returned:





SELECT emp_id, REGEXP_SUBSTR(text, '.*', 1, 1, 'n') text

FROM employee_comment

WHERE emp_id = 7698;



EMP_ID TEXT

---------- -----------------------

7698 Marion Blake

N1375 Lakeshore Drive

Vestaburg MI 48891

(989) 388-1234







This example also illustrates the important point that specifying a

match parameter forces you to also specify any preceding, optional

parameters that you would otherwise skip.





The final match option is m, which changes the

definition of line



with respect to the ^ and $

metacharacters. By default, line means the entire target

string
, so an expression such as ^.*$,

together with the n option, will bring back all

characters in the target string:





SELECT emp_id, REGEXP_SUBSTR(text, '^.*$', 1, 1, 'n') text

FROM employee_comment

WHERE emp_id = 7788;



EMP_ID TEXT

---------- --------------------

7788 #1 Water Tower Lane

Cedar Lake MI 48812







The



n option is necessary here, because otherwise the

period would not match the newline. If the period did not match the

newline, the expression would fail to match at that point, and it

would be impossible to match all characters between the beginning and

end of the string.





However, using the m option causes the definition

of line to change from the entire target string,

to any line within that string, where lines are

delimited by newline characters. The following example removes

n, replacing it with m:





SELECT emp_id, REGEXP_SUBSTR(text, '^.*$', 1, 1, 'm') text

FROM employee_comment

WHERE emp_id = 7788;



EMP_ID TEXT

---------- --------------------

7788 #1 Water Tower Lane







You can specify multiple match parameters in any order. For example,

'in' means the same as 'ni'. If

you specify





conflicting options, such as

'ic', the last option, 'c' in

this case, is the one that takes precedence.







Try our last example in this section using 'mn' as

the match parameter string. You'll get back the

entire target string. Why? Because of the greediness rule. When

'mn' is used, the first line is a match, but the

entire string is a longer match.




















     < Day Day Up > 



    No comments: