Hi
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be Python-based. I am
interested in learning how crawling works. I think Python-based
crawlers will offer a high degree of flexibility, but at the same time
I am torn between looking for open source crawlers in Python vs. C++,
because the latter is much more efficient (or so I have heard), and I
will be crawling on very cheap hardware.

I am definitely open to suggestions.

Thx


  • Defn noob at Jul 5, 2008 at 9:07 am
    Just crawling is super easy. It's how to index and search that is hard.
    Just start at yahoo.com, scrape out all the links, and then for every
    site visit every link.
    I wrote a crawler in 15 lines of code, but all it did was
    visit the sites, not index them or anything.

    You could probably write a faster one in C++, but if you are new to it,
    doing it in Python will let you experiment and learn faster.

    Some links:
    http://infolab.stanford.edu/~backrub/google.html
    http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html
    http://www.example-code.com/python/pythonspider.asp
    http://www.example-code.com/python/spider_simpleCrawler.asp
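
The approach described in the reply above (start at a seed page, scrape out the links, then visit each one) might look roughly like the sketch below. This is not the poster's 15-line crawler, just a minimal illustration using only the Python standard library. The names `LinkParser`, `extract_links`, `fetch`, and `crawl`, as well as the `max_pages` cap, are assumptions made for this example. Like the crawler described above, it only visits pages and does no indexing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs linked from html, with fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urldefrag(urljoin(base_url, href))[0] for href in parser.links)
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def fetch(url):
    """Download a page as text (hypothetical default; swap in your own)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, max_pages=50, fetch=fetch):
    """Breadth-first crawl from seed; returns URLs in the order visited."""
    queue = deque([seed])
    seen = {seed}  # every URL ever queued, so nothing is queued twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://www.yahoo.com/", max_pages=20)` would follow the suggestion above on a small scale. The `fetch` parameter is there so the download step can be replaced, for example with a function that reads from a local dictionary when experimenting offline.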
  • Subeen at Jul 6, 2008 at 10:32 am

    On Jul 5, 2:31 pm, disappeare... at gmail.com wrote:
    Hi
    Does anyone here have a good recommendation for an open source crawler
    that I could get my hands on? It doesn't have to be python based. I am
    interested in learning how crawling works. I think python based
    crawlers will ensure a high degree of flexibility but at the same time
    I am also torn between looking for open source crawlers in python vs
    C++ because the latter is much more efficient (or so I heard. I will be
    crawling on very cheap hardware.)

    I am definitely open to suggestions.

    Thx

    You can check my Python blog. There are some tips and code samples on
    crawlers.
    http://love-python.blogspot.com/

    regards,
    Subeen
    http://love-python.blogspot.com/

Discussion Overview
group: python-list @
categories: python
posted: Jul 5, '08 at 8:31a
active: Jul 6, '08 at 10:32a
posts: 3
users: 3
website: python.org
