FAQ
First of all I would like to say that finding detailed information about
Oracle's implementation of regular expressions is next to impossible. Either
that, or I am looking in all the wrong places. So, links to any complete
online documentation would be appreciated. I found plenty of 'basic
explanation' but there has got to be more.

Here is the specific problem I am trying to solve using REGEXP_SUBSTR. I
have a string that might look something like this:

'2B4a12345Z2lr7' . I want to extract the 12345. In words, find the first
letter 'a' and return all the digits up to the letter Z.

There can be any number of random non-'a' characters prior to the initial
'a' and likewise after the letter Z. In other words, in my example, '2B4'
just represents some random sequence of characters, as does '2lr7'.

I could easily use INSTR and SUBSTR to do this, but my goal is to solve the
problem using a single REGEXP_SUBSTR statement. Also, I have used the
pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but
I don't want the 'a' to be returned.

Thanks,
Mike

Search Discussions

  • Rich Jesse at Oct 23, 2009 at 6:53 pm
    Hey Mike,
    First of all I would like to say that finding detailed information about
    Oracle's implementation of regular expressions is next to impossible. Either
    that, or I am looking in all the wrong places. So, links to any complete
    online documentation would be appreciated. I found plenty of 'basic
    explanation' but there has got to be more.
    This site might help:

    http://www.regular-expressions.info/oracle.html

    ...although it's lacking in version-specific information about what Oracle's
    implemented. There's also a link at the bottom of the page for a book from
    Amazon.
    I could easily use INSTR and SUBSTR to do this, but my goal is to solve the
    problem using a single REGEXP_SUBSTR statement. Also, I have used the
    pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but
    I don't want the 'a' to be returned.
    I don't think that's possible with Oracle's regex. It's possible in Perl,
    but only in 5.10 and up from what I've found. This is as close as I could
    get:

    select regexp_substr('2B4a12345Z2lr7','[[:digit:]]+',

    regexp_instr('2B4a12345Z2lr7','a'))
    from dual;

    Yes, the REGEXP_INSTR is overkill as INSTR would work just as well and is
    most likely faster for this particular use.

    Here's the doc that describes a possible "true" answer in regex, although it
    does not appear to be possible in Oracle's regex.

    HTH!

    Rich
  • Michael Moore at Oct 23, 2009 at 7:12 pm
    Thanks Rich,
    Yes, I was looking for the equiv of Perl look-around functionality.Sorry to
    hear it's not there. Thanks for the reference.
    Mike

    On Fri, Oct 23, 2009 at 11:53 AM, Rich Jesse <
    rjoralist_at_society.servebeer.com> wrote:
    Hey Mike,
    First of all I would like to say that finding detailed information about
    Oracle's implementation of regular expressions is next to impossible. Either
    that, or I am looking in all the wrong places. So, links to any complete
    online documentation would be appreciated. I found plenty of 'basic
    explanation' but there has got to be more.
    This site might help:

    http://www.regular-expressions.info/oracle.html

    ...although it's lacking in version-specific information about what
    Oracle's
    implemented. There's also a link at the bottom of the page for a book from
    Amazon.
    I could easily use INSTR and SUBSTR to do this, but my goal is to solve the
    problem using a single REGEXP_SUBSTR statement. Also, I have used the
    pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but
    I don't want the 'a' to be returned.
    I don't think that's possible with Oracle's regex. It's possible in Perl,
    but only in 5.10 and up from what I've found. This is as close as I could
    get:

    select regexp_substr('2B4a12345Z2lr7','[[:digit:]]+',
    regexp_instr('2B4a12345Z2lr7','a'))
    from dual;

    Yes, the REGEXP_INSTR is overkill as INSTR would work just as well and is
    most likely faster for this particular use.

    Here's the doc that describes a possible "true" answer in regex, although
    it
    does not appear to be possible in Oracle's regex.

    HTH!
    Rich


    --
    http://www.freelists.org/webpage/oracle-l

    --
    http://www.freelists.org/webpage/oracle-l
  • Jared Still at Oct 23, 2009 at 7:10 pm
    Mike,

    The [:digit:] class actually need to be written as [[:digit:]].

    Here's an example using regexp_replace

    select '2B4a12345Z2lr7' source

    '2B4a12345Z2lr7'
    want from 'a' to 'Z', regexp_replace('2B4a12345Z2lr7','^.*a([[:digit:]]+)Z.*$','\1',1,1)
    target, regexp_replace('2B4a12345Z2lr7','^.*a([0123456789]+)Z.*$','\1',1,1)
    target
    from dual
    /

    12:09:39 ordevdb01.radisys.com - js001292_at_dv11 SQL> /

    SOURCE TARGE TARGE

    -------------- ----- -----
    2B4a12345Z2lr7 12345 12345

    1 row selected.

    Jared Still
    Certifiable Oracle DBA and Part Time Perl Evangelist
    Oracle Blog: http://jkstill.blogspot.com
    Home Page: http://jaredstill.com
    On Fri, Oct 23, 2009 at 11:01 AM, Michael Moore wrote:

    First of all I would like to say that finding detailed information about
    Oracle's implementation of regular expressions is next to impossible. Either
    that, or I am looking in all the wrong places. So, links to any complete
    online documentation would be appreciated. I found plenty of 'basic
    explanation' but there has got to be more.

    Here is the specific problem I am trying to solve using REGEXP_SUBSTR. I
    have a string that might look something like this:

    '2B4a12345Z2lr7' . I want to extract the 12345. In words, find the first
    letter 'a' and return all the digits up to the letter Z.

    There can be any number of random non-'a' characters prior to the initial
    'a' and likewise after the letter Z. In other words, in my example, '2B4'
    just represents some random sequence of characters, as does '2lr7'.

    I could easily use INSTR and SUBSTR to do this, but my goal is to solve the
    problem using a single REGEXP_SUBSTR statement. Also, I have used the
    pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but
    I don't want the 'a' to be returned.

    Thanks,
    Mike
    --
    http://www.freelists.org/webpage/oracle-l
  • Michael Moore at Oct 23, 2009 at 7:19 pm
    Hi Jared,
    Yeah, I figured that out about the double [[. Seems contrary to what the
    manual says, but there you have it. I didn't consider using regexp_replace.
    Cool. I'll play with it a bit.
    Thanks!
    Mike
    On Fri, Oct 23, 2009 at 12:10 PM, Jared Still wrote:

    Mike,

    The [:digit:] class actually need to be written as [[:digit:]].

    Here's an example using regexp_replace

    select '2B4a12345Z2lr7' source
    -- '2B4a12345Z2lr7'
    -- want from 'a' to 'Z'
    , regexp_replace('2B4a12345Z2lr7','^.*a([[:digit:]]+)Z.*$','\1',1,1)
    target
    , regexp_replace('2B4a12345Z2lr7','^.*a([0123456789]+)Z.*$','\1',1,1)
    target
    from dual
    /

    12:09:39 ordevdb01.radisys.com - js001292_at_dv11 SQL> /

    SOURCE TARGE TARGE
    -------------- ----- -----
    2B4a12345Z2lr7 12345 12345

    1 row selected.


    Jared Still
    Certifiable Oracle DBA and Part Time Perl Evangelist
    Oracle Blog: http://jkstill.blogspot.com
    Home Page: http://jaredstill.com



    On Fri, Oct 23, 2009 at 11:01 AM, Michael Moore wrote:

    First of all I would like to say that finding detailed information about
    Oracle's implementation of regular expressions is next to impossible. Either
    that, or I am looking in all the wrong places. So, links to any complete
    online documentation would be appreciated. I found plenty of 'basic
    explanation' but there has got to be more.

    Here is the specific problem I am trying to solve using REGEXP_SUBSTR. I
    have a string that might look something like this:

    '2B4a12345Z2lr7' . I want to extract the 12345. In words, find the first
    letter 'a' and return all the digits up to the letter Z.

    There can be any number of random non-'a' characters prior to the initial
    'a' and likewise after the letter Z. In other words, in my example, '2B4'
    just represents some random sequence of characters, as does '2lr7'.

    I could easily use INSTR and SUBSTR to do this, but my goal is to solve
    the problem using a single REGEXP_SUBSTR statement. Also, I have used the
    pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but
    I don't want the 'a' to be returned.

    Thanks,
    Mike
    --
    http://www.freelists.org/webpage/oracle-l
  • Jaromir D.B. Nemec at Oct 25, 2009 at 7:31 pm
    Hi Michael,

    small addendum to the Jared's solution - in case there can be more such digit sequences in the string

    select '2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7' source

    want from 'a' to 'Z', regexp_replace('2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7','^.*a([[:digit:]]+)Z.*$','\1',1,1) last_occurence, regexp_replace('2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7','^.*?a([[:digit:]]+)Z.*$','\1',1,1) first_occurence, regexp_replace('2B4a12345Z2la1234567ZraxxxZyaZZaa45aZZZ7','a([[:digit:]]+)Z|a|[^a]*','\1 ',1,0) all_occurences
    from dual;
    SOURCE LAST_OCCURENCE FIRST_OCCURENCE ALL_OCCURENCES

    ------------------------------------- -------------- --------------- ---------------------------
    2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7 6789 12345 12345 1234567

    1 rows selected

    See http://forums.oracle.com/forums/thread.jspa?threadID=668970 for further details of the all_occurences solution.

    Regards,

    Jaromir

    Original Message -----
    From: Jared Still
    To: michaeljmoore_at_gmail.com
    Cc: oracle-l@freelists.org
    Sent: Friday, October 23, 2009 8:10 PM
    Subject: Re: Need regexp help (seemingly simple problem)

    Mike,

    The [:digit:] class actually need to be written as [[:digit:]].

    Here's an example using regexp_replace

    select '2B4a12345Z2lr7' source

    '2B4a12345Z2lr7'
    want from 'a' to 'Z', regexp_replace('2B4a12345Z2lr7','^.*a([[:digit:]]+)Z.*$','\1',1,1) target, regexp_replace('2B4a12345Z2lr7','^.*a([0123456789]+)Z.*$','\1',1,1) target
    from dual
    /

    12:09:39 ordevdb01.radisys.com - js001292_at_dv11 SQL> /

    SOURCE TARGE TARGE

    ----- -----
    2B4a12345Z2lr7 12345 12345

    1 row selected.

    Jared Still
    Certifiable Oracle DBA and Part Time Perl Evangelist
    Oracle Blog: http://jkstill.blogspot.com
    Home Page: http://jaredstill.com

    On Fri, Oct 23, 2009 at 11:01 AM, Michael Moore wrote:

    First of all I would like to say that finding detailed information about Oracle's implementation of regular expressions is next to impossible. Either that, or I am looking in all the wrong places. So, links to any complete online documentation would be appreciated. I found plenty of 'basic explanation' but there has got to be more.

    Here is the specific problem I am trying to solve using REGEXP_SUBSTR. I have a string that might look something like this:

    '2B4a12345Z2lr7' . I want to extract the 12345. In words, find the first letter 'a' and return all the digits up to the letter Z.

    There can be any number of random non-'a' characters prior to the initial 'a' and likewise after the letter Z. In other words, in my example, '2B4' just represents some random sequence of characters, as does '2lr7'.

    I could easily use INSTR and SUBSTR to do this, but my goal is to solve the problem using a single REGEXP_SUBSTR statement. Also, I have used the pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but I don't want the 'a' to be returned.

    Thanks,
    Mike
  • Michael Moore at Oct 25, 2009 at 7:44 pm
    Very nice, Thanks Jaromir. The all_occurrence is nice to know.
    Mike

    On Sun, Oct 25, 2009 at 12:31 PM, Jaromir D.B. Nemec
    wrote:
    Hi Michael,

    small addendum to the Jared's solution - in case there can be more such
    digit sequences in the string

    select '2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7' source

    -- want from 'a' to 'Z'
    ,
    regexp_replace('2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7','^.*a([[:digit:]]+)Z.*$','\1',1,1)
    last_occurence
    ,
    regexp_replace('2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7','^.*?a([[:digit:]]+)Z.*$','\1',1,1)
    first_occurence
    ,
    regexp_replace('2B4a12345Z2la1234567ZraxxxZyaZZaa45aZZZ7','a([[:digit:]]+)Z|a|[^a]*','\1
    ',1,0) all_occurences
    from dual
    ;
    SOURCE LAST_OCCURENCE FIRST_OCCURENCE
    ALL_OCCURENCES
    ------------------------------------- -------------- ---------------
    ---------------------------
    2B4a12345Z2la6789ZraxxxZyaZZa45aaZZZ7 6789 12345
    12345 1234567

    1 rows selected

    See http://forums.oracle.com/forums/thread.jspa?threadID=668970 for
    further details of the all_occurences solution.

    Regards,

    Jaromir


    ----- Original Message -----
    *From:* Jared Still
    *To:* michaeljmoore_at_gmail.com
    *Cc:* oracle-l@freelists.org
    *Sent:* Friday, October 23, 2009 8:10 PM
    *Subject:* Re: Need regexp help (seemingly simple problem)

    Mike,

    The [:digit:] class actually need to be written as [[:digit:]].

    Here's an example using regexp_replace

    select '2B4a12345Z2lr7' source
    -- '2B4a12345Z2lr7'
    -- want from 'a' to 'Z'
    , regexp_replace('2B4a12345Z2lr7','^.*a([[:digit:]]+)Z.*$','\1',1,1)
    target
    , regexp_replace('2B4a12345Z2lr7','^.*a([0123456789]+)Z.*$','\1',1,1)
    target
    from dual
    /

    12:09:39 ordevdb01.radisys.com - js001292_at_dv11 SQL> /

    SOURCE TARGE TARGE
    -------------- ----- -----
    2B4a12345Z2lr7 12345 12345

    1 row selected.


    Jared Still
    Certifiable Oracle DBA and Part Time Perl Evangelist
    Oracle Blog: http://jkstill.blogspot.com
    Home Page: http://jaredstill.com


    On Fri, Oct 23, 2009 at 11:01 AM, Michael Moore wrote:

    First of all I would like to say that finding detailed information about
    Oracle's implementation of regular expressions is next to impossible. Either
    that, or I am looking in all the wrong places. So, links to any complete
    online documentation would be appreciated. I found plenty of 'basic
    explanation' but there has got to be more.

    Here is the specific problem I am trying to solve using REGEXP_SUBSTR. I
    have a string that might look something like this:

    '2B4a12345Z2lr7' . I want to extract the 12345. In words, find the first
    letter 'a' and return all the digits up to the letter Z.

    There can be any number of random non-'a' characters prior to the initial
    'a' and likewise after the letter Z. In other words, in my example, '2B4'
    just represents some random sequence of characters, as does '2lr7'.

    I could easily use INSTR and SUBSTR to do this, but my goal is to solve
    the problem using a single REGEXP_SUBSTR statement. Also, I have used the
    pattern 'a[[:digit:]]+' which results in 'a12345'. This is almost right, but
    I don't want the 'a' to be returned.

    Thanks,
    Mike
    --
    http://www.freelists.org/webpage/oracle-l

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedOct 23, '09 at 6:01p
activeOct 25, '09 at 7:44p
posts7
users4
websiteoracle.com

People

Translate

site design / logo © 2022 Grokbase