Hello,

I have a question regarding the return value of re.split() since I have
been unable to find any answers in the regular sources of documentation.

Please consider the following:

#!/usr/bin/env python

import re

if __name__ == "__main__":
    datum = "2008-03-14"
    the_date = re.split('^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum, maxsplit=3)
    print(the_date)

Now the result that is printed is:
['', '2008', '03', '14', '']

My question: what are the empty strings doing there at the beginning and
at the end? Is this due to a faulty regular expression?

Thank you!

KL.


  • Diez B. Roggisch at Mar 21, 2008 at 3:30 pm

    klaus wrote:
    <..........>

    Read the manual:

    """
    split( pattern, string[, maxsplit = 0])
    Split string by the occurrences of pattern. If capturing
    parentheses are used in pattern, then the text of all groups in the
    pattern are also returned as part of the resulting list. If maxsplit is
    nonzero, at most maxsplit splits occur, and the remainder of the string
    is returned as the final element of the list. (Incompatibility note: in
    the original Python 1.5 release, maxsplit was ignored. This has been
    fixed in later releases.)

    """

    The key issue here is: "If capturing parentheses are used in pattern,
    then the text of all groups in the pattern are also returned as part of
    the resulting list."

    Consider this:

    >>> re.compile("a").split("bab")
    ['b', 'b']
    >>> re.compile("(a)").split("bab")
    ['b', 'a', 'b']
    >>>

    Consider using match or search if split isn't what you actually want.

    Diez
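A minimal Python 3 sketch of the behaviour Diez describes, using the pattern and date from the question (the variable names are illustrative). re.split() returns the text before the match, then each captured group, then the text after it; because this pattern is anchored at both ends, it matches the whole string and both outer pieces are empty.

```python
import re

datum = "2008-03-14"
# The pattern from the question: anchored at both ends, three groups.
anchored = r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

# The pattern matches the entire string, so the text before the match
# and the text after it are both empty; the captured groups go between.
pieces = re.split(anchored, datum)
print(pieces)  # ['', '2008', '03', '14', '']

# The same list, assembled by hand from a single match object.
m = re.search(anchored, datum)
rebuilt = [datum[:m.start()], *m.groups(), datum[m.end():]]
print(rebuilt == pieces)  # True

# There is only one match, so maxsplit (3 in the question) changes nothing.
print(re.split(anchored, datum, maxsplit=1) == pieces)  # True

# Splitting on the separator itself gives just the fields...
print(re.split(r'-', datum))    # ['2008', '03', '14']
# ...and a capturing group around the separator keeps the separators.
print(re.split(r'(-)', datum))  # ['2008', '-', '03', '-', '14']
```

In other words, the regular expression is not faulty; it is simply being used as a separator when it actually describes the whole value.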
  • Tim Chase at Mar 21, 2008 at 3:31 pm

    <..........>

    I think in this case, you just want the standard string .split()
    method:

    the_date = datum.split('-')

    which will return you

    ['2008', '03', '14']

    The re.split() call splits your string using your regexp as the
    divider. Because your pattern is anchored with ^ and $, it matches the
    whole string: the text before the match is empty, the text after it is
    empty, and the captured groups are returned in between. It is similar to

    >>> s = ','
    >>> s.split(',')
    ['', '']

    only you get your captured groups in there too. Or, if you need
    more precision in your matching (in your case, ensuring that
    they're digits, and with the right number of digits), you can do
    something like
    >>> r = re.compile('^([0-9]{4})-([0-9]{2})-([0-9]{2})$')
    >>> m = r.match(datum)
    >>> m.groups()
    ('2008', '03', '14')

    -tkc
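A sketch of Tim's match() approach wrapped in a small function, assuming the caller wants None back for input that does not look like a date. The function name parse_date_parts is hypothetical; the pattern is the one from the question. Because match() returns None on failure, calling .groups() directly on a non-matching input would raise AttributeError, so the sketch checks first.

```python
import re

# Same pattern as in the question, compiled once for reuse.
DATE_RE = re.compile(r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$')


def parse_date_parts(text):
    """Return (year, month, day) as strings, or None if text does not match."""
    m = DATE_RE.match(text)
    if m is None:
        # No match: match() returned None instead of raising.
        return None
    return m.groups()


print(parse_date_parts("2008-03-14"))  # ('2008', '03', '14')
print(parse_date_parts("2008-3-14"))   # None (month must have two digits)
```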
  • Klaus at Mar 21, 2008 at 3:53 pm
    On Fri, 21 Mar 2008 10:31:20 -0500, Tim Chase wrote:

    <..........>

    OK, thank you!

    I think I got a bit lost in all the possibilities Python has to offer.
    But your answers did the trick.

    Thank you all again for responding and elaborating.

    Cheers,

    KL.
  • John Machin at Mar 21, 2008 at 8:34 pm

    On Mar 22, 2:53 am, klaus wrote:
    <..........>

    I think I got a bit lost in all the possibilities python has to offer.

    IMHO you got more than a bit lost. You seem to have stumbled on a
    possibly unintended side effect of re.split.

    What is your underlying goal?

    If you want merely to split on '-', use datum.split('-').

    If you want to verify the split results as matching patterns (4
    digits, 2 digits, 2 digits), use something like this:

    >>> import re
    >>> datum = '2008-03-14'
    >>> pattern = r'^(\d\d\d\d)-(\d\d)-(\d\d)\Z'

    You may notice two differences between my pattern and yours ...

    >>> mobj = re.match(pattern, datum)
    >>> mobj.groups()
    ('2008', '03', '14')

    But what are you going to do with the result? If the resemblance
    between '2008-03-14' and a date is not accidental, you may wish to
    consider going straight from a string to a datetime or date object,
    e.g.

    >>> import datetime
    >>> dt = datetime.datetime.strptime(datum, '%Y-%m-%d')
    >>> dt
    datetime.datetime(2008, 3, 14, 0, 0)
    >>> d = datetime.datetime.date(dt)
    >>> d
    datetime.date(2008, 3, 14)

    HTH,
    John
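John leaves the two differences as an exercise. One plausible reading, not necessarily his: "\Z" replaces "$", and "\d" replaces "[0-9]". The Python 3 sketch below shows when each can matter; the input with a trailing newline is an assumed example of a line read from a file or stdin.

```python
import re

dollar = r'^(\d\d\d\d)-(\d\d)-(\d\d)$'
end_z = r'^(\d\d\d\d)-(\d\d)-(\d\d)\Z'
line = '2008-03-14\n'  # e.g. a line read from a file or stdin

# '$' matches at the very end *or* just before a final newline,
# so the trailing newline slips through.
print(re.match(dollar, line) is not None)  # True
# '\Z' matches only at the absolute end, so the same input is rejected.
print(re.match(end_z, line) is not None)   # False

# Python 3 only: for str patterns, '\d' matches any Unicode decimal digit,
# not just 0-9. U+0663 is ARABIC-INDIC DIGIT THREE.
print(re.match(r'\d', '\u0663') is not None)            # True
print(re.match(r'[0-9]', '\u0663') is not None)         # False
print(re.match(r'\d', '\u0663', re.ASCII) is not None)  # False
```

In Python 2, which the thread uses, "\d" in a str pattern matches only 0-9 unless re.UNICODE is set, so there the second change is mostly notation.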
  • Klaus at Mar 24, 2008 at 8:11 pm

    On Fri, 21 Mar 2008 13:34:27 -0700, John Machin wrote:
    <..........>

    OK, sorry for my late reply. I got caught up in a fight with Easter bunnies
    over some extraordinarily large, fruity and fertile eggs. Some creatures
    take Easter just too seriously, and it is not even mating season! Can you
    believe that?

    :-)

    Anyway, the underlying goal was to verify user input and to split up the
    date so that I could easily convert it to other formats, among others
    a URL and a database query. And I have succeeded in that.

    Thank you again for taking the time to explain, and to question.

    KL.
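Klaus's stated goal, validating the input and then converting it for a URL and a database query, fits John's datetime suggestion directly. A minimal Python 3 sketch; the function name convert_date and both output formats (a slash-separated URL segment and an ISO 8601 string) are assumptions for illustration, not what Klaus actually used.

```python
import datetime


def convert_date(user_input):
    """Validate a YYYY-MM-DD string and return (url_part, db_value).

    Raises ValueError if the input is not a valid calendar date.
    """
    # strptime checks the layout and the calendar (no February 30th).
    d = datetime.datetime.strptime(user_input, '%Y-%m-%d').date()
    url_part = d.strftime('%Y/%m/%d')  # hypothetical URL layout
    db_value = d.isoformat()           # ISO 8601, e.g. '2008-03-14'
    return url_part, db_value


print(convert_date('2008-03-14'))  # ('2008/03/14', '2008-03-14')
```

strptime rejects impossible dates such as 2008-02-30 with ValueError, but it accepts single-digit fields such as 2008-3-14; if the exact layout matters, check the input against John's \Z pattern first.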

Discussion Overview
group: python-list @
categories: python
posted: Mar 21, '08 at 3:19 pm
active: Mar 24, '08 at 8:11 pm
posts: 6
users: 4
website: python.org

