René Nyffenegger's collection of things on the web | |
René Nyffenegger on Oracle - Most wanted - Feedback
- Follow @renenyffenegger
|
December 22, 2005: On making Oracle's regular expressions a bit easier to use | ||
I like programming Perl a lot. When I first used Perl some ten years ago, I immediatly fell in love with this language because it was
so easy to use regular expressions (I don't know much about today's trendy languages like Ruby or Python, but I guess, these
languages make it equally simple to use regular expressions).
Now, Oracle 10g comes with four
regular expressions supporting sql functions:
regexp_instr,
regexp_substr, regexp_replace
and regexp_like. Yet, using them is not nearly as simple as using Perl's regular expressions.
Specifically, I am missing an easy way to get the values matched by paranthesis. So, I had to write a package to do that for me.
Here's the package's specification:
create package re as type matched is table of varchar2(100); function match ( str in varchar2, pat in varchar2, match# in number, flags in varchar2 := null) return matched; end re; /
In order to use the package, the function
match must be called. str is the string on which the
regular expression pat is applied. match# specifies how many parantheses pairs are in pat .
Finally, flags corresponds to the parameters i, c, n, m and x.
When the function is called, it returns a
table of varchar2 containing match# varchar2s. Each varchar2 found
in the table corresponds to a paranthesis pair given in pat . If it couldn't match the string, it returns
null.
Here's the package's body:
create package body re as function match ( str in varchar2, pat in varchar2, match# in number, flags in varchar2 := null) return matched is ret matched; pos_match_begin number; pos_match_end number; str_used varchar2(4000); begin if match# > 8 then raise_application_error(-20000, 'at most 8 matches!'); end if; pos_match_begin := regexp_instr ( str, -- pat, 1, -- start position 1, -- occurance 0, -- return option flags ); if pos_match_begin > 0 then ret := matched(); ret.extend(match#); else return null; end if; pos_match_end := regexp_instr ( str, -- pat, 1, -- start position 1, -- occurance 1, -- return option flags ); str_used := substr(str, pos_match_begin, pos_match_end - pos_match_begin); for i in 1 .. match# loop ret(i) := regexp_replace(str_used, '(' || pat || ')', '\' || (i+1) , 1/*pos*/, 1, flags); end loop; return ret; end match; end re; /
I want everyone on my database to profit from this package, so I grant execute on it to
public and give a public synonym
for the package:
grant execute on re to public; create public synonym re for re; First example
The first example tries to find three numbers. The regular expression contains three paranthesis (one for each number I want to find).
Each paranthesis contains a
\d+ which is a short hand form meaning one or more digits. \D+ is a short hand
form meaning any letter except digits. I don't want to find non-digits, so these are not found within paranthesis.
declare matched re.matched; begin matched := re.match('foo 123 bar 456 baz 789 qax', '(\d+)' || '\D+' || '(\d+)' || '\D+' || '(\d+)' , 3); for i in 1 .. matched.count loop dbms_output.put_line(matched(i)); end loop; end; / 123 456 789 Second example
The second example tries to match dog or cat in a case insensitive manner. Therefore, I pass an
i (meaning case insensitive)
as forth parameter to match. The pipe symbol in the regular expression must be read as or. Hence, the function returns dog or cat, but
preserves its original case:
declare matched re.matched; regexp varchar2(100); begin regexp := '(dog|cat)'; matched := re.match('FroG DoG mOusE', regexp, 1, 'i'); dbms_output.put_line(matched(1)); matched := re.match('A cAtAstrophal caT', regexp, 1, 'i'); dbms_output.put_line(matched(1)); end; / DoG cAt Third example
The third example tries to determine if there is at least an upper case letter in
str . The [[:upper:]]+
matches one or more consecutive upper case letters. If there is none, match will return null.
declare procedure find_upperace(str in varchar2) is matched re.matched; begin matched := re.match(str, '([[:upper:]]+)', 1); if matched is null then dbms_output.put_line('no upper case letters in: ' || str); else dbms_output.put_line('Found upper case letters: ' || matched(1)); end if; end find_upperace; begin find_upperace('one two three'); find_upperace('fiveSIXseven' ); end; / no upper case letters in: one two three Found upper case letters: SIX Limitations
Unfortunately, at most eight paranthesis pairs can be passed to
match . This is because Oracle's regular expressions
can have at most 9 so called back references, and the package needs one pair for internal use.
More on OracleThis is an on Oracle article. The most current articles of this series can be found here.
|