I have been trying to solve a similar problem where I need to boost
score based on the document type. Your approach is very interesting
and I want to give it a try.

I have a implementation specific question. When you mention to put as
many "1" as the boost need to be, do you mean that the resultant field
should look like "1 1 1 1 1" or "1,1,1,1,1" so that the content is
tokenized and indexed?

On 12/21/06, Andrzej Bialecki wrote:
Martin Braun wrote:
Hi Daniel,

so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should
get a boost of 1.1975 .
The boost is stored with a limited resolution. Try boosting one doc by 10,
the other one by 20 or something like that.
You're right. I thought that with the float values the resolution should
be good enough!
But there is only a difference in the score with a boosting diff of 0.2
(e.g. 1.7 and 1.9).

I know that there were many questions on the list regarding scoring
better new documents.
But I want to avoid any overhead like "FunctionQuery" at query time,
and in my case I have some documents
which have same values in many fields (=>same score) and the only
difference is the year.

However I don't want to overboost the score so that the scoring for
other criteria is not considered.

Shortly spoken: As a result of a search I have a list of book titles and
I want a sort by score AND by year of publication.

But for performance reasons I want to avoid this sorting at query-time
by boosting at index time.

Is that possible?
Here's the trick that works for me, without the issues of boost
resolution or FunctionQuery.

Add a separate field, say "days", in which you will put as many "1" as
many days elapsed since the epoch (not neccessarily since 1 Jan 1970 -
pick a date that makes sense for you). Then, if you want to prioritize
newer documents, just add "+days:1" to your query. Voila - the final
results are a sum of other score factors plus a score factor that is
higher for more recent document, containing more 1-s.

If you are dealing with large time spans, you can split this into years
and days-in-a-year, and apply query boosts, like "+years:1^10.0
+days:1^0.02". Do some experiments and find what works best for you.

Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 6 of 7 | next ›
Discussion Overview
groupjava-user @
postedDec 20, '06 at 4:32p
activeDec 26, '06 at 7:19p



site design / logo © 2022 Grokbase