Hi,
I would like to write some code that crawls a URL and retrieves all of its
HTML. I have noticed that there are several open-source web crawlers,
but they are far more extensive than what I need. I only need to crawl one URL,
and I don't know whether it is as easy as using an HTML parser. Is it? Which
libraries would you recommend?
Thanks!!
Fabian
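For the single-page case in the question, the standard library is arguably enough: fetch the page, then hand the text to an HTML parser. The sketch below assumes Python 3 (the thread dates from the Python 2.5 era, where the rough equivalents were `urllib2` and the `HTMLParser` module). `fetch_html`, `extract_links` and `LinkExtractor` are hypothetical names, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))


def fetch_html(url, timeout=10):
    """Download a page and decode it to text (assumes UTF-8 if no charset is sent)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Parse HTML text and return the links it contains, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage, for some page URL `url`: `links = extract_links(fetch_html(url), url)`. If only the raw HTML is wanted, `fetch_html` alone does the job; the parser only comes in when you need something out of the markup, such as links.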

  • Adam Pletcher at Nov 12, 2007 at 7:36 pm
    In the standard Python install (Windows 2.5, at least), there are a couple of example scripts you might find useful:



    <python>\Tools\webchecker\webchecker.py

    Crawls specified URL, checking for broken links.



    <python>\Tools\webchecker\websucker.py

    A variant of the above that archives the specified site locally. It includes images, but you could probably limit it to HTML easily enough.
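A crawler in the spirit of these scripts can be sketched as a breadth-first loop. This is not the webchecker or websucker code, only an illustration under stated assumptions: Python 3, only links on the starting host are followed, responses that are not `text/html` are skipped (the "limit it to HTML" idea), and unreachable links are skipped rather than reported. `crawl`, `fetch_if_html` and `max_pages` are hypothetical names; the `fetch` parameter exists so the download step can be swapped out, for example in tests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def fetch_if_html(url, timeout=10):
    """Return the page text, or None if the response is not text/html."""
    with urlopen(url, timeout=timeout) as resp:
        if resp.headers.get_content_type() != "text/html":
            return None  # skip images, PDFs, etc.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, max_pages=50, fetch=fetch_if_html):
    """Breadth-first crawl of start_url's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # URLError/HTTPError subclass OSError
            continue  # broken link or network error: skip it
        if html is None:
            continue
        pages[url] = html
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A polite crawler would also consult robots.txt (the standard library has `urllib.robotparser`) and pause between requests; both are left out of this sketch.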



    I haven't used either extensively, but they appear to work as advertised. It should be easy to modify one and tie it into the MySQLdb extensions:

    http://sourceforge.net/projects/mysql-python
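To illustrate the database side, the sketch below stores a `{url: html}` dictionary in a table. It uses the standard-library `sqlite3` module as a stand-in for MySQLdb so it runs without a server; both follow the Python DB-API, but MySQLdb uses `%s` placeholders instead of `?`, and MySQL's SQL dialect differs (for example `REPLACE INTO` rather than `INSERT OR REPLACE`, and a `VARCHAR` rather than `TEXT` primary key). The `pages` table and `save_pages` function are hypothetical.

```python
import sqlite3


def save_pages(conn, pages):
    """Store {url: html} in a 'pages' table, replacing any older copy of a URL."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)"
    )
    # Parameterized query: the driver quotes url/html, so page content
    # cannot break the SQL. sqlite3 uses '?' placeholders; MySQLdb uses '%s'.
    cur.executemany(
        "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)",
        pages.items(),
    )
    conn.commit()
```

Usage: `save_pages(sqlite3.connect("pages.db"), pages)`, where `pages` is a dictionary mapping each URL to its HTML.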



    --

    Adam Pletcher

    Technical Art Director

    Volition/THQ <http://www.volition-inc.com/>



    From: python-list-bounces+adam=volition-inc.com at python.org [mailto:python-list-bounces+adam=volition-inc.com at python.org] On Behalf Of Fabian López
    Sent: Monday, November 12, 2007 12:33 PM
    To: Python-list at python.org
    Subject: crawler in python and mysql

Discussion Overview
group: python-list @
categories: python
posted: Nov 12, '07 at 6:32p
active: Nov 12, '07 at 7:36p
posts: 2
users: 2
website: python.org

2 users in discussion

Fabian López: 1 post; Adam Pletcher: 1 post
