Hi Jiaqi Tan and John Mendenhall,

I have run into the same problem. I have tried

correcting the log4j bug and


already, and it still did not work. I was working on a cluster of 4 boxes running Red Hat AS4.

I also checked hadoop.log and found nothing else of importance.

So I think the problem is in the generator. I saw someone say it might be caused by bad settings for mapred.map.tasks and mapred.reduce.tasks. I have 4 PCs and, following the explanation of mapred.map.tasks and mapred.reduce.tasks, I set them to 17 and 7. Is that right? Can someone help me?
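For reference, the property descriptions in the hadoop-default.xml of that period suggested, as far as I recall, setting mapred.map.tasks to a prime several times greater than the number of available hosts, and mapred.reduce.tasks to a prime close to the number of hosts. A minimal Java sketch of that rule of thumb follows. The class name, the helper methods and the mapFactor parameter are all hypothetical and exist only for illustration; none of them are part of Hadoop or Nutch.

```java
// Sketch of the "pick a prime" rule of thumb only; not Hadoop or Nutch code.
class TaskCountSketch {

    // Simple trial-division primality test, fine for small task counts.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime that is greater than or equal to n.
    static int nextPrimeAtLeast(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    // mapFactor is an assumed reading of "several times" the host count.
    static int suggestMapTasks(int hosts, int mapFactor) {
        return nextPrimeAtLeast(hosts * mapFactor);
    }

    // "A prime close to the number of hosts", read here as the next prime up.
    static int suggestReduceTasks(int hosts) {
        return nextPrimeAtLeast(hosts);
    }

    public static void main(String[] args) {
        int hosts = 4;
        System.out.println("mapred.map.tasks    ~ " + suggestMapTasks(hosts, 4));
        System.out.println("mapred.reduce.tasks ~ " + suggestReduceTasks(hosts));
    }
}
```

For 4 hosts and a factor of 4 this yields 17 and 5, so 17 and 7 appear to be in line with that guideline.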


08/02/20 15:38:09 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
08/02/20 15:38:09 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
08/02/20 15:38:09 WARN crawl.Crawl: No URLs to fetch - check your seed list and URL filters.

I've inserted code at Generator.java:424, which says:
    if (readers == null || readers.length == 0 || !readers[0].next(new FloatWritable())) {
      LOG.warn("Generator: 0 records selected for fetching, exiting ...");

essentially at the decision point to see which of the conditions
triggered the 0 records selected message, and the "readers" object is
perfectly fine, but the SequenceFileOutputFormat is reporting there
are no values (I suppose of URL scores) at all to be retrieved,
causing the generator to stop.
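The kind of instrumentation described above can be sketched in plain Java by testing the three parts of the quoted condition one at a time. This is only an illustration under stated assumptions: ScoreReader is a hypothetical stand-in for the Hadoop reader used in Generator.java, and diagnose and readerOver are made-up helper names, not Nutch code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Illustration only: splits the compound check into separate diagnostics.
class GeneratorCheckSketch {

    // Hypothetical stand-in for a reader over one output partition.
    interface ScoreReader {
        boolean next(); // true if a record could be read
    }

    // Reports which part of the quoted condition would fire, in order.
    static String diagnose(ScoreReader[] readers) {
        if (readers == null) return "readers is null";
        if (readers.length == 0) return "readers is empty";
        if (!readers[0].next()) return "first reader has no records";
        return "records available";
    }

    // Builds a fake reader that yields one record per score in the list.
    static ScoreReader readerOver(List<Float> scores) {
        Iterator<Float> it = scores.iterator();
        return () -> {
            if (it.hasNext()) {
                it.next();
                return true;
            }
            return false;
        };
    }

    public static void main(String[] args) {
        ScoreReader empty = readerOver(Collections.<Float>emptyList());
        ScoreReader full = readerOver(Arrays.asList(1.0f, 0.5f));
        // Only the first reader is inspected, exactly as in the quoted check.
        System.out.println(diagnose(new ScoreReader[] { empty, full }));
        System.out.println(diagnose(new ScoreReader[] { full }));
    }
}
```

Note that the quoted condition only inspects readers[0], so in this sketch an empty first reader is reported as "no records" even when a later reader holds some. Whether that matches what happens on a real cluster is not something the log output here can confirm.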
There is a problem with the Generator. There was a change committed
after 0.9 was released. I implemented this change and it fixed my



john mendenhall
surf utopia
internet services
= = = = = = = = = = = = = = = = = = = =

Discussion Overview
group: nutch-user @
posted: Feb 23, '08 at 3:56a
active: Feb 24, '08 at 9:28p

2 users in discussion

Jiaqi Tan: 1 post Ivannie: 1 post


