I am able to crawl many sites without any issue, but when I crawl the
magicbricks.com home page the crawl stops at depth=1.
I am using: "bin/nutch crawl urls/magicbricks/url.txt -dir crawl/magicbricks
-threads 10 -depth 3 -topN 10"
But if I instead put inner links like "http://www.magicbricks.com/bricks/cityIndex.html"
or "http://www.magicbricks.com/bricks/propertySearch.html" in
urls/magicbricks/url.txt, the crawl completes without any issue.
In robots.txt I have allowed my crawler, named Propertybot, full access to
crawl, as can be verified at http://magicbricks.com/robots.txt
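As a sanity check, robots.txt rules can be evaluated locally with Python's
urllib.robotparser. The rule text below is a hypothetical allow-all entry for
Propertybot, not the actual contents of magicbricks.com/robots.txt:

```python
from urllib import robotparser

# Hypothetical robots.txt granting the Propertybot user-agent full access.
# An empty "Disallow:" line means nothing is disallowed for that agent.
rules = """\
User-agent: Propertybot
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Under these rules, Propertybot may fetch the home page.
print(rp.can_fetch("Propertybot", "http://www.magicbricks.com/"))  # True
```

If this returns True for the home page URL but the crawl still stops,
robots.txt is unlikely to be the cause.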
Please suggest what the reasons for this might be.
Thanks in advance
View this message in context: http://lucene.472066.n3.nabble.com/Can-t-Crawl-Through-Home-Page-but-crawling-through-inner-page-tp2601843p2601843.html
Sent from the Nutch - User mailing list archive at Nabble.com.