Hi all,

I have a question concerning updating a site's score in Nutch 1.2.

In the reduce method of org.apache.nutch.crawl.CrawlDbReducer I found a call to
scfilters.updateDbScore((Text)key, oldSet ? old : null, result, linkList);

During debugging, I discovered that this method is executed in the org.apache.nutch.scoring.opic.OPICScoringFilter class. The code for this method is the following:
    /** Increase the score by a sum of inlinked scores. */
    public void updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List inlinked) throws ScoringFilterException {
        float adjust = 0.0f;
        // Sum the scores carried by all inlink entries.
        for (int i = 0; i < inlinked.size(); i++) {
            CrawlDatum linked = (CrawlDatum)inlinked.get(i);
            adjust += linked.getScore();
        }
        // A newly discovered page has no old entry; start from its own score.
        if (old == null) old = datum;
        datum.setScore(old.getScore() + adjust);
    }

As I understand it, this code increases a site's score based on its inlinks every time the site is crawled. So even if the site has not been modified and no new inlink has been discovered, the site's score will still increase.
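
To make the concern concrete, here is a minimal standalone sketch of the arithmetic. It does not use any Nutch classes: Datum is a hypothetical stand-in for CrawlDatum that models only the score, and simulate is a helper invented for this illustration. It also assumes that the same inlink list is handed to the method on every update, which is exactly the part I am unsure about.

```java
import java.util.ArrayList;
import java.util.List;

class ScoreAccumulationSketch {

    /** Hypothetical stand-in for Nutch's CrawlDatum; only the score is modelled. */
    static class Datum {
        private float score;
        Datum(float score) { this.score = score; }
        float getScore() { return score; }
        void setScore(float score) { this.score = score; }
    }

    /** Same arithmetic as the updateDbScore method quoted above. */
    static void updateDbScore(Datum old, Datum datum, List<Datum> inlinked) {
        float adjust = 0.0f;
        for (Datum linked : inlinked) {
            adjust += linked.getScore();
        }
        if (old == null) old = datum;
        datum.setScore(old.getScore() + adjust);
    }

    /**
     * Applies the update 'rounds' times with an unchanged inlink list
     * (an assumption of this sketch) and returns the final score.
     */
    static float simulate(float initialScore, float[] inlinkScores, int rounds) {
        List<Datum> inlinked = new ArrayList<>();
        for (float s : inlinkScores) {
            inlinked.add(new Datum(s));
        }
        Datum old = new Datum(initialScore);
        for (int r = 0; r < rounds; r++) {
            Datum result = new Datum(old.getScore());
            updateDbScore(old, result, inlinked);
            old = result; // the updated entry becomes the old entry next time
        }
        return old.getScore();
    }

    public static void main(String[] args) {
        float[] inlinks = {0.5f, 0.25f};
        for (int rounds = 0; rounds <= 3; rounds++) {
            System.out.println(rounds + " update(s): " + simulate(1.0f, inlinks, rounds));
        }
    }
}
```

Under that assumption, a page starting at 1.0 with inlinks scored 0.5 and 0.25 goes to 1.75, 2.5 and 3.25 after one, two and three updates, although nothing about the page or its inlinks has changed.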

Is my understanding of this mechanism correct?
If so, could anyone explain to me why a site's score is increased in every case? I would expect it to change only if either its content has changed or a new inlink has been discovered.

Group: user @
Categories: nutch, lucene
Posted: Feb 2, '11 at 12:19p
Active: Feb 7, '11 at 7:02a