Hi All,

I am trying to crawl the BBC Hindi site, http://www.bbc.co.uk/hindi/,
but after depth 1 the crawl reports "stopping at depth-1, no more urls to fetch".

Looking at the dump for depth 1, I realised that no content was fetched from
the page. Could anyone help me figure out the root cause: why is no content
being fetched from the page?

Has anyone tried to crawl http://www.bbc.co.uk/hindi/?


Thanks in advance.

--
Ankur Garg
अँकुर गर्ग


  • Yanky young at Apr 6, 2009 at 2:58 pm
    Hi,

    If you are just using the nutch crawl command, you need to add your
    domain to crawl-urlfilter.txt,

    like this:

    +^http://([a-z0-9]*\.)*bbc.co.uk/hindi

    or

    +^http://www.bbc.co.uk/hindi

    good luck
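
A minimal sketch of how include rules of this shape behave, offered as an illustration rather than as Nutch's actual code. It assumes that the filter file is read top to bottom, that the first matching rule decides, and that a URL matching no rule is ignored. Nutch evaluates these patterns with Java regular expressions; Python's `re` is used here only because the patterns are simple enough to behave the same way. The helper `accepts` and the sample URLs other than the seed are hypothetical.

```python
import re

def accepts(rules, url):
    """Return True if the first rule whose pattern matches `url` is a '+' rule.

    Each rule is a string whose first character is '+' (include) or
    '-' (exclude), followed by a regular expression.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # assumption: a URL that matches no rule is ignored

# Subdomain group required exactly once: some host prefix must be present.
once = [r"+^http://([a-z0-9]*\.)bbc.co.uk/hindi"]

# Subdomain group repeatable (trailing '*'): zero or more host prefixes.
# Note: the unescaped dots in both rules match any character, not only '.'.
repeatable = [r"+^http://([a-z0-9]*\.)*bbc.co.uk/hindi"]

seed = "http://www.bbc.co.uk/hindi/"
bare = "http://bbc.co.uk/hindi/"      # hypothetical URL without "www."
other = "http://www.bbc.co.uk/news/"  # hypothetical URL outside /hindi

for name, rules in (("once", once), ("repeatable", repeatable)):
    for url in (seed, bare, other):
        print(name, url, accepts(rules, url))
```

Both forms accept the seed URL. They differ only for hosts without a prefix such as `www.`, which the repeatable form also accepts.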




Discussion Overview
group: nutch-user @
categories: lucene
posted: Apr 6, '09 at 6:12a
active: Apr 6, '09 at 2:58p
posts: 2
users: 2
website: nutch.apache.org
