Thursday, 9 November 2017

Regular expressions in Oracle SQL




Since Oracle10g new functions have been added to Oracle to allow the use of regular expressions in SQL and PL/SQL. These functions conform to the POSIX standard.
In Oracle10g the functions REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR and the condition REGEXP_LIKE have been added.
Then in Oracle11g the function REGEXP_COUNT has been added too. Let’s see how to use them.

REGEXP_INSTR works like the INSTR: it’s useful to find, into a given string, a sub-string that matches a given pattern. If the function doesn’t find any sub-string matching the pattern returns 0. 

The signature of the function is

1
REGEXP_INSTR (source_string, pattern, position, occurrence, return_option, match_parameter)

Where

  • source_string is the string in which you’re searching;
  • pattern is the regular expression;
  • position is the position starting from you want to search;
  • occurrence is the requested occurrence of the searching string;
  • return_option can evaluate to 0 if you want the position of the first character of the sub-string found, 1 if you want the position of the first character after it;
  • match_parameter is a flag that can evaluate to:

·  i (case-insensitive search),
·  c (case-sensitive search),
·  n (the wildcard ‘.’ finds the carriage return character too),
·  m (the string is multi-line) 

Only the first two are mandatory.

Let’s see an example: let’s find into the string ‘This is a test string to show how regexp work in Oracle db’ the second two-characters-long word:
1
2
3
4
5
6
7
8
WITH T AS (
SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
Select str, REGEXP_INSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )', 1, 2)+1 pos
FROM t;

STR                                                               POS
----------------------------------------------------------
This is a test string to show how regexp work in Oracle db         23

Let’s see the REGEXP_REPLACE function, using it you can replace a string (found using a regular expression) with another one:

1
REGEXP_REPLACE(source_string, pattern, replace_string, position, occurrence, match_parameter)


source_string is the string in which to find and replace;
pattern is the regular expression;
replace_string is the replacement string;
position is the position starting from you want to search;
occurrence is the requested occurrence of the searching string;
match_parameter works just like in the REGEXP_INSTR function.



So, if we want to replace, in the string used in the previous example, the second two-characters-long word with a X we can write:

1
2
3
4
5
6
7
8
WITH T AS (
SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
Select REGEXP_REPLACE(str,'(^|\ )[[:alpha:]]{2}($|\ )', ' X ',1, 2) newstr
FROM t;

NEWSTR
---------------------------------------------------------
This is a test string X show how regexp work in Oracle db

The third function to show is REGEXP_SUBSTR that allows us to extract a substring:

1
REGEXP_SUBSTR (source_string , pattern, position, occurrence, match_parameter)

source_string is the string in which to find;
pattern is the regular expression;
position is the position starting from you want to search;
occurrence is the requested occurrence of the searching string;
match_parameter works just like in the REGEXP_INSTR function.

So, if we want to extract the second two-characters-long word we do:

1
2
3
4
5
6
7
8
WITH T AS (
SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
Select str, trim(REGEXP_SUBSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )',1, 2)) substr
FROM t;

STR                                                        SUBSTR
---------------------------------------------------------- ------
This is a test string to show how regexp work in Oracle db to

REGEXP_COUNT allows us to count the occurrences in the source string that match the pattern:

1
REGEXP_COUNT (source_char , pattern, position, match_param) 

source_char is the string in which to find;
pattern is the regular expression;
position is the position starting from you want to search;
return_option works just like in the REGEXP_INSTR function.

So, if we want to count the two-characters-long words, we can do:

1
2
3
4
5
6
7
8
WITH T AS (
SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
Select str, REGEXP_COUNT(str,'(^|\ )[[:alpha:]]{2}($|\ )',1) count
FROM t;

STR                                                             COUNT
---------------------------------------------------------- ----------
This is a test string to show how regexp work in Oracle db          4

We can combine REGEXP_COUNT and REGEXP_SUBSTR (and a trick of the connect by clause) to extract in a single statement all the two-characters-long words:

1
2
3
4
5
6
7
8
9
10
11
WITH T AS (
SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
Select trim(REGEXP_SUBSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )',1, level)) substr
FROM t connect by level<=REGEXP_COUNT(str,'(^|\ )[[:alpha:]]{2}($|\ )',1);

SUBSTR
------
is
to
in
db

Once seen all the functions let’s see REGEXP_LIKE. It’s a condition, that is to say it returns a Boolean value and can be used in the WHERE or in the HAVING clause of a query statement:

1
REGEXP_LIKE (source_string, pattern, match_parameter)

source_string is the string in which to find;
pattern is the regular expression;
match_parameter works just like in the REGEXP_INSTR function.

REGEXP_LIKE(str,ptrn,mp) is equivalent to REGEXP_INSTR(str,ptrn,1,1,mp)>0

Let’s extract all the string that include at least one two-characters-long word:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
WITH T AS (
SELECT 'String without 2chars words' str from dual union
SELECT 'first string with a word made of two chars' from dual union
SELECT 'second string with a word made of two chars' from dual union
SELECT 'Another string without 2chars words' from dual
)
Select str
from t
where REGEXP_LIKE(str,'(^|\ )[[:alpha:]]{2}($|\ )');

STR
-------------------------------------------
first string with a word made of two chars
second string with a word made of two chars

For the equivalence stated above, we could write:

1
2
3
4
5
6
7
8
9
WITH T AS (
SELECT 'String without 2chars words' str from dual union
SELECT 'first string with a word made of two chars' from dual union
SELECT 'second string with a word made of two chars' from dual union
SELECT 'Another string without 2chars words' from dual
)
Select str
from t
where REGEXP_INSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )')>0;

That’s all about regular expressions in Oracle SQL & PL/SQL!

No comments: