is version 2.1.0 (rather than 2.2.0 as described in NUTCH-507)?
Thanks.
On Wed, Feb 20, 2008 at 5:58 PM, John Mendenhall wrote:
08/02/20 15:38:09 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
08/02/20 15:38:09 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
08/02/20 15:38:09 WARN crawl.Crawl: No URLs to fetch - check your seed list and URL filters.
I've inserted code at Generator.java:424, which says:
  if (readers == null || readers.length == 0 ||
      !readers[0].next(new FloatWritable())) {
    LOG.warn("Generator: 0 records selected for fetching, exiting ...");
essentially at the decision point, to see which of the conditions
triggered the "0 records selected" message. The "readers" object is
perfectly fine, but the SequenceFileOutputFormat is reporting that
there are no values (URL scores, I suppose) at all to be retrieved,
which causes the generator to stop.
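
For anyone wanting to repeat this kind of check, here is a rough sketch of how the compound condition could be split so that each clause reports itself separately. It is not the actual Nutch code: ScoreReader, hasFirstRecord() and GeneratorDiagnosis are hypothetical names standing in for the reader array and the readers[0].next(new FloatWritable()) call quoted above, so that the example runs without Hadoop on the classpath.

```java
// Hypothetical stand-in for the single reader call used in the Generator check
// (readers[0].next(new FloatWritable()) in the quoted snippet).
interface ScoreReader {
    boolean hasFirstRecord();
}

final class GeneratorDiagnosis {
    /**
     * Evaluates the three clauses in the same order as the original
     * short-circuit condition and returns a description of the first
     * one that is true, or null if none of them is.
     */
    static String emptyReason(ScoreReader[] readers) {
        if (readers == null) {
            return "readers == null";
        }
        if (readers.length == 0) {
            return "readers.length == 0";
        }
        if (!readers[0].hasFirstRecord()) {
            return "first reader has no records";
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulates the situation described above: the readers array is
        // fine, but the first reader yields no values.
        ScoreReader empty = () -> false;
        System.out.println(emptyReason(new ScoreReader[] { empty }));
        // prints "first reader has no records"
    }
}
```

In a real Generator the returned string would presumably go into the LOG.warn message, which makes it obvious whether the readers are missing or merely empty.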
There is a problem with the Generator. There was a change committed
after 0.9 was released. I implemented this change and it fixed my
problem:
http://www.mail-archive.com/nutch-commits@lucene.apache.org/msg01991.html
JohnM
--
john mendenhall
john@surfutopia.net
surf utopia
internet services