Hi all,
I'd like to make a very simple change to the idf computation, but I can't seem
to wrap my head around it.

The "Changing Similarity" section of the javadocs has very useful hints for
implementing new tf() and lengthNorm() behavior, but it is a bit blurrier
about idf():
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/package-summary.html#changingSimilarity

I'd like to use something other than the global numDocs.
I'd like a modified idf() that gives me the inverse document frequency within
a *subset* of the index (e.g. for a specific type of document). I have the
type stored in a field, so for a given term I'd need to count how many
documents of that type contain it. Since idf() [1] takes numDocs as a
parameter, could I just change the class that calls idf() and pass in the
number I need? Which class calls idf()? TermQuery? Should I make the changes
there, or in TermScorer?
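
To make the idea concrete, here is a minimal sketch in plain Java of the number I am after. It does not use the Lucene API at all: the Doc class, the in-memory list and the subsetIdf() helper are hypothetical stand-ins for the index, and only the idf() formula is taken from the DefaultSimilarity javadoc [1]. Both counts (numDocs and docFreq) are restricted to documents of one type.

```java
import java.util.Arrays;
import java.util.List;

class SubsetIdf {

    /** Same formula as documented for DefaultSimilarity.idf(int docFreq, int numDocs). */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Hypothetical toy document: a type label plus the terms it contains. */
    static final class Doc {
        final String type;
        final List<String> terms;

        Doc(String type, String... terms) {
            this.type = type;
            this.terms = Arrays.asList(terms);
        }
    }

    /**
     * idf of a term computed only over documents of the given type.
     * Assumption: the subset is non-empty; with zero matching documents
     * the logarithm is taken of 0 and the result is -Infinity.
     */
    static float subsetIdf(List<Doc> docs, String type, String term) {
        int subsetNumDocs = 0;  // documents of this type
        int subsetDocFreq = 0;  // documents of this type that contain the term
        for (Doc d : docs) {
            if (!d.type.equals(type)) {
                continue;
            }
            subsetNumDocs++;
            if (d.terms.contains(term)) {
                subsetDocFreq++;
            }
        }
        return idf(subsetDocFreq, subsetNumDocs);
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("article", "lucene", "search"),
                new Doc("article", "index"),
                new Doc("article", "search"),
                new Doc("comment", "lucene"),
                new Doc("comment", "lucene", "index"));
        System.out.println("global idf:  " + idf(3, docs.size()));
        System.out.println("article idf: " + subsetIdf(docs, "article", "lucene"));
    }
}
```

In a real index the two counts would have to come from the index itself rather than from a loop over documents; how best to obtain them and pass them to idf() is exactly what I am unsure about.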

Can anybody shed some light on this?

Thanks in advance,
Pablo

[1]
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/DefaultSimilarity.html#idf%28int,%20int%29

Discussion Overview
group: java-user @
category: lucene
posted: Jul 30, '10 at 12:00p
active: Jul 30, '10 at 12:00p
posts: 1
users: 1
website: lucene.apache.org

Pablo Mendes: 1 post
