2. If I choose to sort the results by date, then recent documents with
very very low relevancy (say the words searched appears only in
content, and not in title/bylines/summary fields that are boosted
higher) are still shown relatively high in the list, and I wish to
omit them in general. What is the best way to implement some sort of a
relevancy filter (include only documents with an normalized score of
0.2 or more....)? Or is there a better way around it?
As Chris pointed out, there isn't always an easy way to do this. Your
suggestion of filtering below normalized scores of 0.2 might work,
assuming the most relevant document is 1.0. You'll have to tune this
cutoff point and see how well it works. One thing to watch out for is
that if the raw (non-normalized) score is less than 1.0, it is not
"normalized", so your most relevant document can have a score of less
than 1.0. This may or may not be what you want, just something to
consider. Lucene's Hits.java is where the normalization happens.


To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Discussion Posts


Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 3 of 3 | next ›
Discussion Overview
groupjava-user @
postedFeb 10, '06 at 7:26a
activeFeb 10, '06 at 8:09a



site design / logo © 2021 Grokbase