On Sun, Dec 23, 2007 at 2:15 AM, mark harwood wrote:Thanks for the context - much more useful.
The challenge here is similar to that posed by offering end-user tagging
of content (see here
http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ).
The main difference here being that words are added to docs implicitly by
search click-throughs rather than any explicit tagging action.
In both cases the challenge is that the user data around documents is
likely to be updated very often while the documents remain relatively
static.
I suspect some additional things to think about are:
1) Cancelling out the "human laziness" bias that favours clicking results
on page 1. Are clicks on page 2 worth more?
2) Spam clicks - detecting deliberate gaming of your re-ranking algorithm.
3) Lucene doc IDs are not stable - how will you associate query
terms/click data with documents and join them at speed?
4) Are individual words or phrases the unit of boost? "Paris" means
different things in "Paris Hilton" and "Paris, France".
A simple approach might be to re-index your content with all of the
additional search terms from clicks added to the associated document in a
"searchClicks" field - the more clicks, the more repetitions of the same
search words in the document to help with tf (Term Frequency). This
additional content would need to be capped, to avoid huge documents. This
has the disadvantage of requiring a re-index though.
Another option to avoid reindexing everything is to wrap IndexReader (See
FilterIndexReader) and implement TermEnum/TermDocs for a fake field called
"searchClicks". The idea is Lucene looks after the usual, static document
content while your implementation goes off to your more volatile storage (
e.g. database/parallel index, custom file structure) to retrieve lists of
doc ids, term frequencies etc. for this "searchClicks" field. All of the
Lucene queries you might want to throw at this e.g. PhraseQueries can then
test both the static Lucene fields and your new volatile "click" fields
without being aware of this low-level trickery.
I'm sure there will be other ways of doing this too but this seems like a
conceptually clean way of modelling it - just seeing search terms as
extensions to the document content.
Cheers
Mark
----- Original Message ----
From: sumittyagi <ping.sumit@gmail.com>
To: java-user@lucene.apache.org
Sent: Sunday, 23 December, 2007 5:30:55 AM
Subject: Re: Which file in the lucene package is used to manipulate
results..
Actually what i have to do is...
1.) for every query(keyword), among the results obtained, the keyword
will
be mapped with the page clicked, along with the no. of clicks for that
keyword on that page
2.) next time for the same query(keyword), the mapped pages will be
ranked
higher considering the no. of clicks too..
3.) for every new query these steps will be repeated...
this was a very high level view , i have made algorithms for these
modules
and trying to incorporate with lucene but dont know , on which files i
have
to do edition to make it work...
please help me regarding this, if you need some more explanation,
please let
me know...
thanks
Sumit Tyagi
Erick Erickson wrote:
You still haven't explained *why* you want to rerank results. What
is the use-case you're trying to implement? Quite often it's turned
out for me that when I let folks on the list know what the use
case I'm trying to support is, they come up with much more elegant
solutions than I was thinking about.
For instance, does the CustomScoreQuery class have any relevance
to your problem?
If you're thinking of modifying the core Lucene code for your
special purpose, I'd advise against it unless and until you'd exhausted
all the other options. It's always a maintenance headache to do this.
Best
Erick
On Dec 21, 2007 10:09 AM, sumittyagi wrote:
actually i am writing a module to rerank the results, so i want to
edit
the
file which arrange the results and give them ranks,
or is there any other way i can use my module to rerank the results
markharw00d wrote:
I think you need to describe your "factors" in more detail.
Exactly
what
do you want to achieve for your users?
We could be talking about any number of Lucene functions here.
----- Original Message ----
From: sumittyagi <ping.sumit@gmail.com>
To: java-user@lucene.apache.org
Sent: Friday, 21 December, 2007 4:51:09 AM
Subject: Which file in the lucene package is used to manipulate results..
hi, i am using lucene for the very first time and want to
manipulate
the
results, by adding some more factors to it, which file should i
edit to
manipulate the search results....
Thanks
Sumit Tyagi
--
View this message in context:
http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14450335.htmlSent from the Lucene - Java Users mailing list archive at
Nabble.com.
__________________________________________________________
Sent from Yahoo! Mail - a smarter inbox
http://uk.mail.yahoo.com---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
--
View this message in context:
http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14456938.htmlSent from the Lucene - Java Users mailing list archive at
Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
--
View this message in context:
http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14476062.htmlSent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
__________________________________________________________
Sent from Yahoo! Mail - a smarter inbox
http://uk.mail.yahoo.com---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org