FAQ
After I've run the re.search function on a string and no match was
found, how can I access that string? When I try to print it directly,
it's an empty string, I assume because it has been "consumed." How do
I prevent this?

It seems to work fine for this 2.x code:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

But when I try it with my own code (3.2), it won't print the text of
the page:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

P.S. I plan to clean up my code, I know it's not great right now. But
my immediate goal is to just figure out why the 2.x code can print
"text", but my own code can't print "page," which are basically the
same thing, unless something significant has changed with either the
urllib.request module, or the way it's decoded, or something, or is it
just an RE issue?

Thanks.


  • Ian Kelly at Jun 23, 2011 at 8:47 pm

    On Thu, Jun 23, 2011 at 1:58 PM, John Salerno wrote:
    > After I've run the re.search function on a string and no match was
    > found, how can I access that string? When I try to print it directly,
    > it's an empty string, I assume because it has been "consumed." How do
    > I prevent this?

    This has nothing to do with regular expressions. It would appear that
    page.read() is letting you read the response body multiple times in
    2.x but not in 3.x, probably due to a change in buffering. Just store
    the string in a variable and avoid calling page.read() multiple times.
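
A minimal sketch of that advice, assuming an io.BytesIO object as an offline stand-in for the response returned by urllib.request.urlopen() (the stand-in, the sample page text, and the helper name read_once are illustrative assumptions, not part of the original code). It reads and decodes the body once, keeps the result in a variable, and reuses that variable in the no-match branch:

```python
import io
import re

pattern = re.compile(r'[0-9]+')


def read_once(response):
    """Read and decode the body a single time and return it as a str."""
    return response.read().decode()


# Assumption: io.BytesIO stands in for the real HTTP response so the
# example runs without network access. Like any file-like object, it
# returns b'' from read() once its contents have been consumed.
response = io.BytesIO(b'no digits on this page')

text = read_once(response)          # keep the decoded body in a variable
match_obj = pattern.search(text)

if match_obj:
    result = match_obj.group()
else:
    result = text                   # reuse the stored string, no second read()

leftover = response.read()          # the stream is already exhausted here
print(result)
print(repr(leftover))
```

In the loop from the question, the same idea amounts to assigning page.read().decode() to a name before calling pattern.search() and printing that name in the else branch.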
  • John Salerno at Jun 23, 2011 at 9:14 pm

    On Jun 23, 3:47 pm, Ian Kelly wrote:
    > On Thu, Jun 23, 2011 at 1:58 PM, John Salerno wrote:
    >> After I've run the re.search function on a string and no match was
    >> found, how can I access that string? When I try to print it directly,
    >> it's an empty string, I assume because it has been "consumed." How do
    >> I prevent this?
    > This has nothing to do with regular expressions. It would appear that
    > page.read() is letting you read the response body multiple times in
    > 2.x but not in 3.x, probably due to a change in buffering. Just store
    > the string in a variable and avoid calling page.read() multiple times.

    Thank you. That worked, and as a result I think my code will look
    cleaner.
  • Thomas L. Shinnick at Jun 23, 2011 at 9:47 pm
    There is also
        print(match_obj.string)
    which gives you a copy of the string searched. See end of section
    6.2.5. Match Objects
  • John Salerno at Jun 23, 2011 at 10:02 pm

    On Jun 23, 4:47 pm, "Thomas L. Shinnick" wrote:
    > There is also
    >     print(match_obj.string)
    > which gives you a copy of the string searched. See end of section
    > 6.2.5. Match Objects

    I tried that, but the only time I wanted the string printed was when
    there *wasn't* a match, so the match object was a NoneType.
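
To illustrate the point about NoneType: re.search returns None when nothing matches, so the .string attribute is only reachable after a successful match. A small sketch, in which the helper name number_or_text and both sample strings are made up for illustration:

```python
import re

pattern = re.compile(r'[0-9]+')

# Hypothetical sample inputs, one with digits and one without.
hit = pattern.search('and the next nothing is 44827')
miss = pattern.search('no digits here')

# On a match, .string is the text that was passed to search().
searched = hit.string
number = hit.group()

# On no match there is no match object at all, so there is no .string.
miss_is_none = miss is None


def number_or_text(text):
    """Return the first run of digits, or the whole text if there is none."""
    match_obj = pattern.search(text)
    if match_obj is None:
        return text                 # fall back to the caller's own copy
    return match_obj.group()


print(number_or_text('and the next nothing is 44827'))
print(number_or_text('no digits here'))
```

Because the fallback needs the text when the match object does not exist, the text has to be held in a variable of its own, which is the same fix suggested earlier in the thread.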

Discussion Overview
group: python-list
categories: python
posted: Jun 23, '11 at 7:58p
active: Jun 23, '11 at 10:02p
posts: 5
users: 3
website: python.org
