Hi everyone, new to Python. I'm attempting to download a large number
of webpages (about 600) to disk, and for some reason a few of them
fail.

I'm using this in a loop where pagename and urlStr change each time:
    import urllib
    try:
        urllib.urlretrieve(urlStr, 'webpages/'+pagename+'.htm')
    except IOError:
        print 'Cannot open URL %s for reading' % urlStr
        str1 = 'error!'

Out of all the webpages, it does not work for these three:
http://exoplanet.eu/planet.php?p1=WASP-11/HAT-P-10&p2=b
http://exoplanet.eu/planet.php?p1=HAT-P-27/WASP-40&p2=b
http://exoplanet.eu/planet.php?p1=HAT-P-30/WASP-51&p2=b
giving "Cannot open URL http://exoplanet.eu/planet.php?p1=WASP-11/HAT-P-10&p2=b
for reading" etc.

However, copying and pasting the URL from the error message
opens it successfully in Firefox.

It successfully downloads the 500 or so other pages, such as:
http://exoplanet.eu/planet.php?p1=HD+88133&p2=b

I guess it has something to do with the forward slash in the names
(e.g. HAT-P-30/WASP-51 compared to HD+88133 in the examples above).

Is there a way I can fix this? Thanks.


  • TimB at Jul 6, 2011 at 7:43 am

    Sorry, my mistake: I was attempting to save the page to disk with the
    forward slash in the file name, which the file system treats as a
    directory separator. Disregard.
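
For anyone hitting the same error, here is a minimal sketch of one way to avoid it: clean the page name before using it as a file name. The helper name safe_page_path, the '_' replacement character, and the sample page name are assumptions made for illustration; they are not from the original post.

```python
import os


def safe_page_path(pagename, directory="webpages", extension=".htm"):
    """Build a local file path for a page (hypothetical helper)."""
    # '/' (and '\\' on Windows) would be read as a directory separator,
    # so the save would target a subdirectory that does not exist.
    # Replace both with '_' before joining the name onto the directory.
    cleaned = pagename.replace("/", "_").replace("\\", "_")
    return os.path.join(directory, cleaned + extension)


# Assumed example name containing a slash, like the failing pages above.
print(safe_page_path("WASP-11/HAT-P-10 b"))
```

In the download loop, the result would be passed as the second argument to urllib.urlretrieve (urllib.request.urlretrieve in Python 3) in place of 'webpages/'+pagename+'.htm'. Printing the caught IOError itself, rather than a fixed message, would also have shown that the problem was the local path and not the URL.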

Discussion Overview
group: python-list @
categories: python
posted: Jul 6, '11 at 7:39a
active: Jul 6, '11 at 7:43a
posts: 2
users: 1
website: python.org
