Hi everyone,

I want to save a web page, and I use urllib to fetch it. However, some
content is missing from the saved file. The missing part is a block
from the original web page, such as this one: <div
style="display: block;" id="GeneInts">...</div>. I don't know how to
save the whole page without such blocks going missing. Could you help
me figure it out? Thank you!


This is my program:

import urllib

url = 'http://receptome.stanford.edu/hpmr/SearchDB/getGenePage.asp?ParamE02931&ProtId=1&ProtType=Receptor'
# Python 2: download the page and save it as test.html
f = urllib.urlretrieve(url, 'test.html')

  • Daniel Fetchinson at Aug 10, 2010 at 3:04 pm

    A web server may present different output depending on the client
    used. When you view the page source in your browser and then compare
    it with the file saved by urllib, you are accessing the web server
    with two different clients. I'm not saying this is your problem, but
    it could be.

    So you might want to make urllib appear as a browser by sending the
    appropriate headers.

    HTH,
    Daniel
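
As an illustration of the suggestion above, here is a minimal sketch of sending browser-like headers. It assumes Python 3's `urllib.request` rather than the Python 2 `urllib` module used in the question. The names `build_browser_request` and `save_page`, and the User-Agent string itself, are hypothetical choices made for this example, not something from the thread.

```python
import shutil
import urllib.request

# Assumed, illustrative value: any browser-like User-Agent string could be used.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_browser_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request object that carries browser-like headers."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml",
    }
    return urllib.request.Request(url, headers=headers)


def save_page(url, filename):
    """Fetch url with browser-like headers and write the raw bytes to filename."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)


# Example call (performs a network request, so it is left commented out):
# save_page(url, "test.html")
```

If the saved file is still missing the block after this change, the server is probably not varying its output by client, and the cause lies elsewhere.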
  • Lawrence D'Oliveiro at Aug 12, 2010 at 7:39 am
    In message <mailman.1921.1281452652.1673.python-list at python.org>, Daniel
    Fetchinson wrote:

    > A web server may present different output depending on the client
    > used.

    It may also require execution of some JavaScript to insert HTML content.

    > So you might want to make urllib appear as a browser by sending the
    > appropriate headers.

    If the above is the case, then this won't be enough.
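
One way to check whether JavaScript is involved is to look at the saved file and see whether the element exists but is empty. The sketch below uses the standard library's `html.parser` (Python 3) to report whether an element with a given id is present and what text it contains. The names `ElementTextFinder` and `inspect_element` are hypothetical helpers written for this example. An element that is present but empty in the raw HTML would be consistent with content inserted by a script, although it does not prove it.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ElementTextFinder(HTMLParser):
    """Collect the text found inside the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False
        self.depth = 0          # nesting depth inside the target element
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.text_parts.append(data)


def inspect_element(html, element_id):
    """Return (found, text) for the element with the given id."""
    finder = ElementTextFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found, "".join(finder.text_parts).strip()


# Example: read the saved file and inspect the block from the question.
# with open("test.html", encoding="utf-8", errors="replace") as fh:
#     print(inspect_element(fh.read(), "GeneInts"))
```

If the element is found with no text, fetching the page with different headers is unlikely to help, because urllib only downloads the HTML and never runs the page's scripts.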

Discussion Overview
group: python-list
categories: python
posted: Aug 10, '10 at 2:02p
active: Aug 12, '10 at 7:39a
posts: 3
users: 3
website: python.org