I'm using Linux - Mandriva LE2005, Python 2.3 (or I can also use Python 2.4
on my other system just as well).
Anyways...
I want to get a web page containing my stock grants.
The initial page is served over https and there is a form on it to
fill in your username and password and then click "login".
I played with Python's urlopen and basically the site complains "your browser
doesnt support frames", meaning the urlopen call makes it unhappy somehow.
Is it reasonable to think I can build a script to log in to this secure
website, move to a different page (on that site) and download it to disk?
Or am I just looking at a long, complicated task?
I'd really like to get the page because then I can analyze it from a cron
job and email myself my current options value each week or each month.
Thanks
Eric


  • Ncf at Jul 19, 2005 at 6:40 am
    It might be checking the browser's User-agent. My best suggestion
    would be to use something to record the headers your browser sends
    out, and mimic those in Python.

    If you look at the source code for urllib (I think you can press
    Alt+M and type in "urllib"), under the URLopener definition (the
    base class of FancyURLopener), you should see something like
    self.addheaders (not on a box to check it right now, but it's in
    the constructor, I remember that much).

    Just set all the headers to send out (like your browser would) by
    setting that value from your script, e.g.:

    import urllib
    opener = urllib.FancyURLopener()
    # addheaders is a list of (name, value) tuples sent with every request
    opener.addheaders = [('User-agent', 'blah'), ('Header2', 'val'),
                         ('monkey', 'bone')]
    # do the other stuff here :P

    HTH

    -Wes
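For readers on a current Python 3 rather than the 2.3/2.4 discussed in this thread, Wes's suggestion can be sketched with `urllib.request`, where openers still carry an `addheaders` list. This is only a minimal sketch: the helper name `make_opener` and the header values are made up for illustration, and the URL in the comment is a placeholder.

```python
import urllib.request


def make_opener(user_agent, extra_headers=()):
    """Return an opener that sends the given headers with every request.

    make_opener is a hypothetical helper, not part of urllib.
    """
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples. Replacing it drops the
    # default "Python-urllib/x.y" User-agent that some sites reject.
    opener.addheaders = [("User-agent", user_agent)] + list(extra_headers)
    return opener


# Illustrative values only; copy the real ones from what your browser sends.
opener = make_opener("Mozilla/5.0 (X11; Linux x86_64)",
                     [("Accept-Language", "en")])
# opener.open("https://example.com/login") would now send these headers.
```

Recording the browser's real headers first (for example with its developer tools) and copying them into `extra_headers` keeps the script as close as possible to what the site already accepts.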
  • Mike Meyer at Jul 20, 2005 at 3:13 am

    Eric <BorgMotherShip at AliensR_US.org> writes:

    > I'm using Linux - Mandriva LE2005, Python 2.3 (or I can also use Python 2.4
    > on my other system just as well).
    > Anyways...
    > I want to get a web page containing my stock grants.
    > The initial page is served over https and there is a form on it to
    > fill in your username and password and then click "login".
    > I played with Python's urlopen and basically the site complains "your browser
    > doesnt support frames", meaning the urlopen call makes it unhappy somehow.
    > Is it reasonable to think I can build a script to log in to this secure
    > website, move to a different page (on that site) and download it to disk?
    > Or am I just looking at a long, complicated task?

    It's not that bad. It took me about half a day to do this for a site I
    wanted scraped regularly, and what I had to do was much more
    complicated than what you describe. I had to deal with an optional
    second login page (a "security feature" of the site), http-equiv
    redirects (which urlopen doesn't handle), and then digging the URL of
    the page I actually wanted out of the resulting page.
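The pieces mentioned above (a login POST, a session that survives it, and http-equiv redirects) might look roughly like the following on a current Python 3. This is a sketch under assumptions, not the code used for that site: the form field names `username` and `password`, the helper names, and the example URLs are all hypothetical, and a real form may need extra hidden fields.

```python
import http.cookiejar
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser


def make_session():
    """Opener that keeps cookies between requests, so the login persists."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_request(url, username, password):
    """Build the POST a login form would submit (field names are guesses)."""
    fields = {"username": username, "password": password}
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data)  # data present => POST


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/next/page".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                          attrs.get("content") or "", re.I)
        if match:
            self.target = match.group(1).strip()


def meta_refresh_target(html, base_url):
    """Return the absolute URL of an http-equiv redirect, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urllib.parse.urljoin(base_url, finder.target)


# Intended use (not run here, since it needs a real site):
#   session = make_session()
#   page = session.open(login_request(url, user, pw)).read().decode("utf-8")
#   next_url = meta_refresh_target(page, url)
```

Following the redirect is then just another `session.open(next_url)`, repeated until `meta_refresh_target` returns `None`.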

    The complaint about your browser may just be their inadequate attempt
    to deal with browser portability: the message sits in the NOFRAMES
    element of the framed page, and urlopen hands you that page's raw
    HTML. In which case, you just need to find the URL for the frame
    that's got the information you want, and get that page. On the other
    hand, as Wes said, they may be browser-sniffing. In which case you'll
    have to set the User-Agent to something they won't complain about.
    Personally, I always try "Your Web Site Developer Sucks" to see if
    they have a list of disallowed browsers. If that fails, try the
    User-Agent string of a well-known browser.
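Finding the frame URLs can be sketched with the standard library's `html.parser`. This assumes a current Python 3, and the class name `FrameCollector`, the helper `frame_urls`, and the sample URLs are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect (name, absolute URL) for every <frame> or <iframe> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attrs = dict(attrs)
            src = attrs.get("src")
            if src:
                # Frame sources are often relative to the frameset page.
                self.frames.append(
                    (attrs.get("name"), urljoin(self.base_url, src)))


def frame_urls(html, base_url):
    """Return the frames found in html, resolved against base_url."""
    collector = FrameCollector(base_url)
    collector.feed(html)
    return collector.frames
```

Feeding it the page that carries the "frames" complaint should list the URLs worth fetching next; the frame holding the data is usually recognizable by its name or path.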

    For page scraping, install BeautifulSoup.

    <mike
    --
    Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Discussion Overview
group: python-list @
category: python
posted: Jul 19, '05 at 1:18a
active: Jul 20, '05 at 3:13a
posts: 3
users: 3
website: python.org

3 users in discussion

Ncf: 1 post Mike Meyer: 1 post Eric: 1 post
