FAQ
Hi, I am using Lucene for the very first time and want to manipulate the
results by adding some more factors to the ranking. Which file should I edit
to manipulate the search results?

Thanks
Sumit Tyagi
--
View this message in context: http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14450335.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

  • Mark harwood at Dec 21, 2007 at 9:43 am
    I think you need to describe your "factors" in more detail. Exactly what do you want to achieve for your users?
    We could be talking about any number of Lucene functions here.

    ----- Original Message ----
    From: sumittyagi <ping.sumit@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Friday, 21 December, 2007 4:51:09 AM
    Subject: Which file in the lucene package is used to manipulate results..


    __________________________________________________________
    Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Zhou Qi at Dec 21, 2007 at 10:20 am
    Hi Sumittyagi,

    I think you can implement your factors in the scorer to obtain your
    desired results.

  • Sumittyagi at Dec 21, 2007 at 3:10 pm
    Actually, I am writing a module to rerank the results, so I want to edit the
    file that orders the results and assigns their ranks.
    Or is there another way I can use my module to rerank the results?


  • Erick Erickson at Dec 21, 2007 at 3:49 pm
    You still haven't explained *why* you want to rerank results. What
    is the use-case you're trying to implement? Quite often it's turned
    out for me that when I let folks on the list know what the use
    case I'm trying to support is, they come up with much more elegant
    solutions than I was thinking about.

    For instance, does the CustomScoreQuery class have any relevance
    to your problem?

    If you're thinking of modifying the core Lucene code for your
    special purpose, I'd advise against it unless and until you'd exhausted
    all the other options. It's always a maintenance headache to do this.

    Best
    Erick
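
As a rough illustration of re-ranking without touching Lucene's core classes, the sketch below post-processes hits after the search has returned. It is plain Java and uses no Lucene types; `ScoredHit`, `PostSearchReranker` and the boost function are hypothetical names introduced here, and multiplying the score by a boost is only one assumed way to combine the two.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** A search hit reduced to a stable document key and its relevance score (hypothetical type). */
final class ScoredHit {
    final String key;
    final double score;

    ScoredHit(String key, double score) {
        this.key = key;
        this.score = score;
    }
}

/** Re-ranks hits outside the search library, so no core code has to be modified. */
final class PostSearchReranker {
    private PostSearchReranker() {}

    /**
     * Multiplies each hit's score by boost(key) and returns a new list sorted by
     * descending adjusted score. The input list is left unchanged.
     */
    static List<ScoredHit> rerank(List<ScoredHit> hits, ToDoubleFunction<String> boost) {
        List<ScoredHit> adjusted = new ArrayList<>(hits.size());
        for (ScoredHit hit : hits) {
            adjusted.add(new ScoredHit(hit.key, hit.score * boost.applyAsDouble(hit.key)));
        }
        // List.sort is stable, so hits with equal adjusted scores keep their original order.
        adjusted.sort(Comparator.comparingDouble((ScoredHit h) -> h.score).reversed());
        return adjusted;
    }
}
```

A post-processing step like this can only reorder the hits that were actually retrieved, so it tends to suit cases where the top N results are fetched first and then re-sorted.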
  • Sumittyagi at Dec 23, 2007 at 5:31 am
    Actually, what I have to do is:
    1.) For every query (keyword), each page clicked among the results is
    mapped to that keyword, along with the number of clicks for that keyword
    on that page.
    2.) The next time the same query (keyword) is run, the mapped pages are
    ranked higher, taking the number of clicks into account.
    3.) These steps are repeated for every new query.
    This is a very high-level view. I have written algorithms for these modules
    and am trying to incorporate them into Lucene, but I don't know which files
    I have to edit to make it work.
    Please help me with this; if you need more explanation, please let me know.
    Thanks
    Sumit Tyagi
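
The bookkeeping described in steps 1 and 2 could be sketched roughly as below. This is a minimal in-memory version, not something from the thread: `ClickLog`, its method names and the lower-casing of queries are assumptions, and the page key is assumed to be a stable identifier such as a URL.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Counts clicks per (query, page) pair, as in steps 1 and 2 above. */
final class ClickLog {
    // normalized query -> (page key -> click count)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Step 1: remember that this page was clicked for this query. */
    void recordClick(String query, String pageKey) {
        counts.computeIfAbsent(normalize(query), q -> new HashMap<>())
              .merge(pageKey, 1, Integer::sum);
    }

    /** Step 2: look up how often this page was clicked for this query (0 if never). */
    int clicks(String query, String pageKey) {
        return counts.getOrDefault(normalize(query), Collections.<String, Integer>emptyMap())
                     .getOrDefault(pageKey, 0);
    }

    // Assumption: queries that differ only in case or surrounding whitespace count as the same.
    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }
}
```

The count returned by `clicks` is what a re-ranking step would then fold into the score for each result.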





  • Mark harwood at Dec 21, 2007 at 3:45 pm
    Again, if you could be precise about what factors will influence the ranking that would help. Field names, what is wrong with existing ranking order and some of the thinking about your proposed re-rank logic would be useful context.

    In Lucene you have the options for individual query-clause boosts, index-time document boosts, field-specific boosts on parsers, index-time length normalisation options, query result sorting, IndexSearcher "Similarity" settings and custom scorers to name a few. We can't recommend which approach is most suited unless you can say more about what problem you're trying to address.

    Cheers
    Mark
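
Of the options listed, per-clause boosts are probably the quickest to try, because Lucene's classic query syntax lets a clause carry a boost written as `field:term^boost`. The sketch below only assembles such a query string. `BoostedQueryString` and the field names in the usage note are made up for illustration, and the term is assumed to be a single word with no characters that need escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Builds a query string that searches one term in several fields with different boosts. */
final class BoostedQueryString {
    private BoostedQueryString() {}

    /**
     * Produces one "field:term^boost" clause per map entry, in the map's iteration
     * order, separated by spaces.
     */
    static String build(String term, Map<String, Double> fieldBoosts) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, Double> entry : fieldBoosts.entrySet()) {
            // Locale.ROOT keeps the decimal separator a dot regardless of the default locale.
            clauses.add(String.format(Locale.ROOT, "%s:%s^%.1f",
                    entry.getKey(), term, entry.getValue()));
        }
        return String.join(" ", clauses);
    }
}
```

With a `LinkedHashMap` holding `title` at 2.0 and `body` at 1.0, the term `lucene` yields `title:lucene^2.0 body:lucene^1.0`, which could then be handed to a query parser.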


  • Mark harwood at Dec 23, 2007 at 10:15 am
    Thanks for the context - much more useful.
    The challenge here is similar to that posed by offering end-user tagging of content (see here http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ). The main difference here being that words are added to docs implicitly by search click-throughs rather than any explicit tagging action.

    In both cases the challenge is that the user data around documents is likely to be updated very often while the documents remain relatively static.
    I suspect some additional things to think about are:
    1) Cancelling out the "human laziness" bias that favours clicking results on page 1. Are clicks on page 2 worth more?
    2) Spam clicks - detecting deliberate gaming of your re-ranking algorithm.
    3) Lucene doc IDs are not stable - how will you associate query terms/click data with documents and join them at speed?
    4) Are individual words or phrases the unit of boost? "Paris" means different things in "Paris Hilton" and "Paris, France".
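
For point 1, one possible way to offset the bias towards top results is to weight each click by the position at which the result was shown. The log-scaled weight below is purely an illustrative heuristic, not something proposed in this thread; `ClickWeight` and `forRank` are hypothetical names.

```java
/** Weights a click by where the clicked result appeared in the result list. */
final class ClickWeight {
    private ClickWeight() {}

    /**
     * Returns log2(1 + rank) for a 1-based rank, so a click on the first result
     * counts as 1.0 and clicks further down the list count for more.
     */
    static double forRank(int rank) {
        if (rank < 1) {
            throw new IllegalArgumentException("rank must be >= 1, got " + rank);
        }
        return Math.log(1.0 + rank) / Math.log(2.0);
    }
}
```

The weights would be summed in place of a raw click count, so a click on a result from page 2 adds more than a click on the first hit.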

    A simple approach might be to re-index your content with all of the additional search terms from clicks added to the associated document in a "searchClicks" field - the more clicks, the more repetitions of the same search words in the document to help with tf (Term Frequency). This additional content would need to be capped, to avoid huge documents. This has the disadvantage of requiring a re-index though.
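
The capped repetition described above might look roughly like this when building the text for the "searchClicks" field. `SearchClicksField` and the cap parameter are hypothetical names, and the way the resulting text is added to the document at index time is left out.

```java
import java.util.Map;

/** Builds the text of a "searchClicks" field from per-term click counts. */
final class SearchClicksField {
    private SearchClicksField() {}

    /**
     * Repeats each term once per click, but at most maxRepeatsPerTerm times,
     * so heavily clicked terms cannot make the document arbitrarily large.
     */
    static String build(Map<String, Integer> clicksPerTerm, int maxRepeatsPerTerm) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, Integer> entry : clicksPerTerm.entrySet()) {
            int repeats = Math.min(entry.getValue(), maxRepeatsPerTerm);
            for (int i = 0; i < repeats; i++) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(entry.getKey());
            }
        }
        return text.toString();
    }
}
```
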
    Another option to avoid reindexing everything is to wrap IndexReader (See FilterIndexReader) and implement TermEnum/TermDocs for a fake field called "searchClicks". The idea is Lucene looks after the usual, static document content while your implementation goes off to your more volatile storage (e.g. database/parallel index, custom file structure) to retrieve lists of doc ids, term frequencies etc. for this "searchClicks" field. All of the Lucene queries you might want to throw at this e.g. PhraseQueries can then test both the static Lucene fields and your new volatile "click" fields without being aware of this low-level trickery.
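
The delegation idea can be shown with plain Java stand-ins. The sketch below does not use Lucene's FilterIndexReader, TermEnum or TermDocs; `PostingsSource` and `RoutingPostingsSource` are hypothetical types that only mirror the routing decision: one designated field is answered from the volatile click store, everything else from the static index.

```java
import java.util.Map;

/** Stand-in for "give me the matching documents for this field and term". */
interface PostingsSource {
    /** Returns doc id -> term frequency for the given field and term. */
    Map<Integer, Integer> postings(String field, String term);
}

/** Routes lookups for one volatile field to a separate store. */
final class RoutingPostingsSource implements PostingsSource {
    private final String volatileField;
    private final PostingsSource staticIndex;
    private final PostingsSource clickStore;

    RoutingPostingsSource(String volatileField, PostingsSource staticIndex,
                          PostingsSource clickStore) {
        this.volatileField = volatileField;
        this.staticIndex = staticIndex;
        this.clickStore = clickStore;
    }

    @Override
    public Map<Integer, Integer> postings(String field, String term) {
        // Callers cannot tell which backing store answered; they only see postings.
        return volatileField.equals(field)
                ? clickStore.postings(field, term)
                : staticIndex.postings(field, term);
    }
}
```

In a real implementation the click store would also have to translate its own stable keys into current doc ids, which is the join problem raised in point 3.
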

    I'm sure there will be other ways of doing this too but this seems like a conceptually clean way of modelling it - just seeing search terms as extensions to the document content.

    Cheers
    Mark


  • Sumittyagi at Dec 24, 2007 at 11:13 pm
    Hi,
    thanks for the help. I am following your suggestions, but I do not have
    the package org.apache.lucene.index. Where can I download it to start
    this project?
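
The `org.apache.lucene.index` package ships in the Lucene core jar, so a missing package usually means that jar is not on the classpath. A quick way to check from code is sketched below; `ClasspathCheck` is a hypothetical helper, not part of Lucene.

```java
/** Checks whether a class can be found on the current classpath. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling `ClasspathCheck.isOnClasspath("org.apache.lucene.index.IndexReader")` should return true once the Lucene core jar has been added.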

  • Sumittyagi at Dec 24, 2007 at 11:15 pm
    Please ignore my previous message; I found that package.

    sumittyagi wrote:
    hi..
    thanks for the help
    following your suggestions ..
    i do not have the package org.apache.lucene.index , from where can i
    download it to start this project

    markharw00d wrote:
    Thanks for the context - much more useful.
    The challenge here is similar to that posed by offering end-user tagging
    of content (see here
    http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ).
    The main difference here being that words are added to docs implicitly by
    search click-throughs rather than any explicit tagging action.

    In both cases the challenge is that the user data around documents is
    likely to be updated very often while the documents remain relatively
    static.
    I suspect some additional things to think about are:
    1) Cancelling out the "human laziness" bias that favours clicking results
    on page 1. Are clicks on page 2 worth more?
    2) Spam clicks - detecting deliberate gaming of your re-ranking
    algorithm.
    3) Lucene doc IDs are not stable - how will you associate query
    terms/click data with documents and join them at speed?
    4) Are individual words or phrases the unit of boost? "Paris" means
    different things in "Paris Hilton" and "Paris, France".

    A simple approach might be to re-index your content with all of the
    additional search terms from clicks added to the associated document in a
    "searchClicks" field - the more clicks, the more repetitions of the same
    search words in the document to help with tf (Term Frequency). This
    additional content would need to be capped, to avoid huge documents. This
    has the disadvantage of requiring a re-index though.
    Another option to avoid reindexing everything is to wrap IndexReader (See
    FilterIndexReader) and implement TermEnum/TermDocs for a fake field
    called "searchClicks". The idea is Lucene looks after the usual, static
    document content while your implementation goes off to your more volatile
    storage (e.g. database/parallel index, custom file structure) to retrieve
    lists of doc ids, term frequencies etc. for this "searchClicks" field.
    All of the Lucene queries you might want to throw at this e.g.
    PhraseQueries can then test both the static Lucene fields and your new
    volatile "click" fields without being aware of this low-level trickery.

    I'm sure there will be other ways of doing this too but this seems like a
    conceptually clean way of modelling it - just seeing search terms as
    extensions to the document content.

    Cheers
    Mark


    ----- Original Message ----
    From: sumittyagi <ping.sumit@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Sunday, 23 December, 2007 5:30:55 AM
    Subject: Re: Which file in the lucene package is used to manipulate results..


    Actually, what I have to do is:
    1.) For every query (keyword), among the results obtained, the keyword will
    be mapped to the page clicked, along with the number of clicks for that
    keyword on that page.
    2.) Next time, for the same query (keyword), the mapped pages will be ranked
    higher, taking the number of clicks into account too.
    3.) For every new query these steps will be repeated.
    This was a very high-level view. I have written algorithms for these modules
    and am trying to incorporate them into Lucene, but I don't know which files I
    have to edit to make it work.
    Please help me with this; if you need more explanation, please let me know.
    Thanks
    Sumit Tyagi
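
The three steps above could be sketched, outside of Lucene itself, roughly like this. Everything here is an assumption made for illustration: the class and method names are hypothetical, documents are identified by a stable external key such as a URL (Lucene doc IDs are not stable), and the boost formula `1 + ln(1 + clicks)` is only one possible way to blend clicks into the original score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ClickReranker {

    /** One search hit: a stable document key and the score the engine gave it. */
    public static final class Hit {
        public final String docKey;
        public final double score;

        public Hit(String docKey, double score) {
            this.docKey = docKey;
            this.score = score;
        }
    }

    // keyword -> (document key -> number of clicks)
    private final Map<String, Map<String, Integer>> clicks = new HashMap<>();

    private static String normalize(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }

    /** Step 1: map the keyword to the clicked page and count the click. */
    public void recordClick(String keyword, String docKey) {
        clicks.computeIfAbsent(normalize(keyword), k -> new HashMap<>())
              .merge(docKey, 1, Integer::sum);
    }

    public int clickCount(String keyword, String docKey) {
        Map<String, Integer> perDoc = clicks.get(normalize(keyword));
        if (perDoc == null) {
            return 0;
        }
        Integer n = perDoc.get(docKey);
        return n == null ? 0 : n;
    }

    /** Step 2: rank previously clicked pages higher for the same keyword. */
    public List<Hit> rerank(String keyword, List<Hit> hits) {
        List<Hit> out = new ArrayList<>();
        for (Hit h : hits) {
            // Assumed blend: a logarithm dampens the effect of very popular pages.
            double boost = 1.0 + Math.log1p(clickCount(keyword, h.docKey));
            out.add(new Hit(h.docKey, h.score * boost));
        }
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }

    public static void main(String[] args) {
        ClickReranker reranker = new ClickReranker();
        reranker.recordClick("lucene scoring", "doc-b");
        reranker.recordClick("lucene scoring", "doc-b");

        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("doc-a", 1.0));
        hits.add(new Hit("doc-b", 0.8));

        // Step 3: the same calls are repeated for every new query.
        for (Hit h : reranker.rerank("lucene scoring", hits)) {
            System.out.println(h.docKey + " " + h.score);
        }
    }
}
```

Re-ranking only the hits already returned keeps the core Lucene code untouched, at the cost of never promoting a clicked page that fell outside the fetched results.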





    Erick Erickson wrote:
    You still haven't explained *why* you want to rerank results. What
    is the use-case you're trying to implement? Quite often it's turned
    out for me that when I let folks on the list know what the use
    case I'm trying to support is, they come up with much more elegant
    solutions than I was thinking about.

    For instance, does the CustomScoreQuery class have any relevance
    to your problem?

    If you're thinking of modifying the core Lucene code for your
    special purpose, I'd advise against it unless and until you'd exhausted
    all the other options. It's always a maintenance headache to do this.

    Best
    Erick
    On Dec 21, 2007 10:09 AM, sumittyagi wrote:


    Actually, I am writing a module to rerank the results, so I want to edit the
    file which arranges the results and gives them ranks.
    Or is there any other way I can use my module to rerank the results?


  • Sumittyagi at Dec 28, 2007 at 5:18 pm
    Hi,
    Which file can I edit to change the scoring factors in Lucene results?

  • Sumittyagi at Dec 28, 2007 at 5:30 pm
    Also, what is the Lucene ranking (document scoring) formula?

  • Steven A Rowe at Dec 28, 2007 at 6:17 pm
    Hi Sumit,

    Here's a good place to start:

    http://lucene.apache.org/java/docs/scoring.html
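
For quick orientation, the practical scoring function used by the default Similarity in the Lucene releases of that period can be summarised as below. This is written from memory of the Similarity javadoc, so treat the linked page as authoritative; `boost(t)` stands for the query-time boost of term t, and the tf and idf definitions are those of the default implementation.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q}\Bigl(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot
\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Bigr)
\]
\[
\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}
\]
```

Because tf grows with the number of occurrences of a term, repeating clicked search terms in a document field raises that document's score for those terms. These factors are normally changed by supplying your own Similarity subclass rather than by editing a file in the Lucene source.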

    Steve
  • Sumittyagi at Feb 20, 2008 at 4:44 pm
    Hi Mark Harwood,
    I know it's been a long time, but until now I was busy developing the
    database that stores each keyword, document, and the number of clicks on that
    document for the keyword, along with their respective mappings.
    Now I want my database to communicate with the Lucene API, and I cannot
    figure out where to start.
    Please help me out: how can I make my database work with Lucene?
    Thanks
    Sumit

    mark harwood wrote:
    Thanks for the context - much more useful.
    The challenge here is similar to that posed by offering end-user tagging
    of content (see here
    http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ).
    The main difference here being that words are added to docs implicitly by
    search click-throughs rather than any explicit tagging action.

    In both cases the challenge is that the user data around documents is
    likely to be updated very often while the documents remain relatively
    static.
    I suspect some additional things to think about are:
    1) Cancelling out the "human laziness" bias that favours clicking results
    on page 1. Are clicks on page 2 worth more?
    2) Spam clicks - detecting deliberate gaming of your re-ranking algorithm.
    3) Lucene doc IDs are not stable - how will you associate query
    terms/click data with documents and join them at speed?
    4) Are individual words or phrases the unit of boost? "Paris" means
    different things in "Paris Hilton" and "Paris, France".

    A simple approach might be to re-index your content with all of the
    additional search terms from clicks added to the associated document in a
    "searchClicks" field - the more clicks, the more repetitions of the same
    search words in the document to help with tf (Term Frequency). This
    additional content would need to be capped, to avoid huge documents. This
    has the disadvantage of requiring a re-index though.
    Another option to avoid reindexing everything is to wrap IndexReader (See
    FilterIndexReader) and implement TermEnum/TermDocs for a fake field called
    "searchClicks". The idea is Lucene looks after the usual, static document
    content while your implementation goes off to your more volatile storage
    (e.g. database/parallel index, custom file structure) to retrieve lists of
    doc ids, term frequencies etc. for this "searchClicks" field. All of the
    Lucene queries you might want to throw at this e.g. PhraseQueries can then
    test both the static Lucene fields and your new volatile "click" fields
    without being aware of this low-level trickery.

    I'm sure there will be other ways of doing this too but this seems like a
    conceptually clean way of modelling it - just seeing search terms as
    extensions to the document content.

    Cheers
    Mark


    ----- Original Message ----
    From: sumittyagi <ping.sumit@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Sunday, 23 December, 2007 5:30:55 AM
    Subject: Re: Which file in the lucene package is used to manipulate
    results..


    Actually what i have to do is...
    1.) for every query(keyword), among the results obtained, the keyword
    will
    be mapped with the page clicked, along with the no. of clicks for that
    keyword on that page
    2.) next time for the same query(keyword), the mapped pages will be
    ranked
    higher considering the no. of clicks too..
    3.) for every new query these steps will be repeated...
    this was a very high level view , i have made algorithms for these
    modules
    and trying to incorporate with lucene but dont know , on which files i
    have
    to do edition to make it work...
    please help me regarding this, if you need some more explanation,
    please let
    me know...
    thanks
    Sumit Tyagi





    Erick Erickson wrote:
    You still haven't explained *why* you want to rerank results. What
    is the use-case you're trying to implement? Quite often it's turned
    out for me that when I let folks on the list know what the use
    case I'm trying to support is, they come up with much more elegant
    solutions than I was thinking about.

    For instance, does the CustomScoreQuery class have any relevance
    to your problem?

    If you're thinking of modifying the core Lucene code for your
    special purpose, I'd advise against it unless and until you'd exhausted
    all the other options. It's always a maintenance headache to do this.

    Best
    Erick
    On Dec 21, 2007 10:09 AM, sumittyagi wrote:


    actually i am writing a module to rerank the results, so i want to edit the
    file which arranges the results and gives them ranks,
    or is there any other way i can use my module to rerank the results?


    markharw00d wrote:
    I think you need to describe your "factors" in more detail. Exactly what
    do you want to achieve for your users?
    We could be talking about any number of Lucene functions here.

    ----- Original Message ----
    From: sumittyagi <ping.sumit@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Friday, 21 December, 2007 4:51:09 AM
    Subject: Which file in the lucene package is used to manipulate results..

    hi, i am using lucene for the very first time and want to manipulate the
    results, by adding some more factors to it, which file should i edit to
    manipulate the search results....

    Thanks
    Sumit Tyagi
  • Michael Stoppelman at Feb 20, 2008 at 10:35 pm
    To add to what Mark is saying, it's very important that you watch out for the
    first-N-results effect. If you showed users a random set of documents with
    crap relevance, I'll bet you that a good number would still click on the first
    result (call it user laziness or the Google "I'm Feeling Lucky" effect :)).
    You can A/B test results with some entropy mixed in, or try to determine your
    own result-position normalizers.
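
    As a rough illustration of a result-position normalizer, here is a minimal plain-Java sketch that weights each click by the inverse of how likely a click was at that rank anyway. The class name and the per-rank click-through rates are invented for the example; you would have to measure the real rates on your own traffic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: discounts clicks by how likely a click was at that rank anyway. */
final class PositionNormalizedClicks {
    // Assumed baseline click-through rates per result position (rank 1 first).
    // These numbers are made up; measure them on your own traffic.
    private static final double[] EXPECTED_CTR = {0.40, 0.20, 0.10, 0.05};

    private final Map<String, Double> weightByDoc = new HashMap<>();

    /** Weight of a single click: clicks at rarely clicked positions count for more. */
    static double clickWeight(int rank) {
        // Clamp the rank into the table; ranks past the end reuse the last entry.
        int idx = Math.min(Math.max(rank, 1), EXPECTED_CTR.length) - 1;
        return 1.0 / EXPECTED_CTR[idx];
    }

    void recordClick(String docKey, int rank) {
        weightByDoc.merge(docKey, clickWeight(rank), Double::sum);
    }

    double weight(String docKey) {
        return weightByDoc.getOrDefault(docKey, 0.0);
    }

    public static void main(String[] args) {
        PositionNormalizedClicks clicks = new PositionNormalizedClicks();
        clicks.recordClick("doc-a", 1);
        clicks.recordClick("doc-a", 1);
        clicks.recordClick("doc-b", 4);
        System.out.println("doc-a=" + clicks.weight("doc-a") + " doc-b=" + clicks.weight("doc-b"));
    }
}
```

    With this weighting a single click on a low-ranked result outweighs a couple of "lazy" clicks on the first result, which is the bias the normalizer is meant to cancel.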

    You could also give each document your own stable doc id (maybe an MD5 of the
    title) and keep an external boost file that maps queries to docs.
    Then at query time you boost the result documents accordingly.
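
    A minimal sketch of that idea in plain Java, assuming titles are unique and do not change (otherwise the MD5 key is not stable). The class ExternalBoosts, its in-memory map standing in for the boost file, and the neutral default of 1.0 are all hypothetical choices for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical query-to-doc boost table keyed by a stable, Lucene-independent doc key. */
final class ExternalBoosts {
    // query -> (stable doc key -> boost); stands in for the external boost file.
    private final Map<String, Map<String, Float>> boosts = new HashMap<>();

    /** Stable key: hex MD5 of the title, independent of Lucene's internal doc numbers. */
    static String stableKey(String title) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(title.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void put(String query, String title, float boost) {
        boosts.computeIfAbsent(normalize(query), q -> new HashMap<>())
              .put(stableKey(title), boost);
    }

    /** Boost for this query/document pair, or 1.0f (neutral) if none is recorded. */
    float boostFor(String query, String title) {
        return boosts.getOrDefault(normalize(query), Collections.<String, Float>emptyMap())
                     .getOrDefault(stableKey(title), 1.0f);
    }

    // Assumed normalization: trim and lower-case so "Paris " and "paris" share an entry.
    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ExternalBoosts table = new ExternalBoosts();
        table.put("paris hotels", "Hotel Guide", 2.5f);
        System.out.println(table.boostFor("Paris Hotels", "Hotel Guide"));
    }
}
```

    At query time you would look up the boost for each hit by its stored title (or whatever stable field you hashed) and multiply it into the score you sort by.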

    -M
    On Sun, Dec 23, 2007 at 2:15 AM, mark harwood wrote:

    Thanks for the context - much more useful.
    The challenge here is similar to that posed by offering end-user tagging
    of content (see here
    http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ).
    The main difference here being that words are added to docs implicitly by
    search click-throughs rather than any explicit tagging action.

    In both cases the challenge is that the user data around documents is
    likely to be updated very often while the documents remain relatively
    static.
    I suspect some additional things to think about are:
    1) Cancelling out the "human laziness" bias that favours clicking results
    on page 1. Are clicks on page 2 worth more?
    2) Spam clicks - detecting deliberate gaming of your re-ranking algorithm.
    3) Lucene doc IDs are not stable - how will you associate query
    terms/click data with documents and join them at speed?
    4) Are individual words or phrases the unit of boost? "Paris" means
    different things in "Paris Hilton" and "Paris, France".

    A simple approach might be to re-index your content with all of the
    additional search terms from clicks added to the associated document in a
    "searchClicks" field - the more clicks, the more repetitions of the same
    search words in the document to help with tf (Term Frequency). This
    additional content would need to be capped, to avoid huge documents. This
    has the disadvantage of requiring a re-index though.
    Another option to avoid reindexing everything is to wrap IndexReader (See
    FilterIndexReader) and implement TermEnum/TermDocs for a fake field called
    "searchClicks". The idea is Lucene looks after the usual, static document
    content while your implementation goes off to your more volatile storage (
    e.g. database/parallel index, custom file structure) to retrieve lists of
    doc ids, term frequencies etc. for this "searchClicks" field. All of the
    Lucene queries you might want to throw at this e.g. PhraseQueries can then
    test both the static Lucene fields and your new volatile "click" fields
    without being aware of this low-level trickery.
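
    For the simpler re-index approach, the capped "searchClicks" content could be built along these lines. This is only a sketch in plain Java: the class name, the two caps and the choice to repeat the whole query string once per click are assumptions, and the actual Lucene field/document calls are left out because they differ between versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical builder for the text of a "searchClicks" field. */
final class SearchClicksField {
    /**
     * Repeats each clicked query once per click so term frequency grows with clicks.
     * maxPerQuery caps repetitions of one query; maxTotal caps repetitions overall.
     */
    static String build(Map<String, Integer> clicksByQuery, int maxPerQuery, int maxTotal) {
        StringJoiner text = new StringJoiner(" ");
        int total = 0;
        for (Map.Entry<String, Integer> e : clicksByQuery.entrySet()) {
            int reps = Math.min(e.getValue(), maxPerQuery);
            for (int i = 0; i < reps && total < maxTotal; i++) {
                text.add(e.getKey());
                total++;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> clicks = new LinkedHashMap<>();
        clicks.put("paris", 5);
        clicks.put("hilton", 1);
        System.out.println(build(clicks, 3, 10));
    }
}
```

    The resulting string would be analyzed and indexed as an extra field next to the static content, so the caps directly bound how much each document can grow.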

    I'm sure there will be other ways of doing this too but this seems like a
    conceptually clean way of modelling it - just seeing search terms as
    extensions to the document content.
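
    To make the "volatile storage" side a bit more concrete, here is a plain-Java sketch of the data such a store would have to hand back for the fake "searchClicks" field: for each term, document numbers in increasing order with a frequency each. ClickPostings and Posting are hypothetical names; this is not Lucene's API, and wiring it behind FilterIndexReader is not shown. It also assumes something else has already mapped your stable document keys to the current Lucene doc numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory store of click "postings" for a fake searchClicks field. */
final class ClickPostings {
    /** One posting: a document number and how many clicks the term earned on it. */
    static final class Posting {
        final int doc;
        final int freq;

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    // term -> (doc number -> click count); TreeMap keeps doc numbers sorted.
    private final Map<String, TreeMap<Integer, Integer>> store = new HashMap<>();

    void recordClick(String term, int doc) {
        store.computeIfAbsent(term, t -> new TreeMap<>()).merge(doc, 1, Integer::sum);
    }

    /** Postings for a term in increasing doc order; empty if the term was never clicked. */
    List<Posting> postings(String term) {
        List<Posting> out = new ArrayList<>();
        TreeMap<Integer, Integer> docs = store.get(term);
        if (docs != null) {
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                out.add(new Posting(e.getKey(), e.getValue()));
            }
        }
        return out;
    }

    /** Number of documents that have at least one click for the term. */
    int docFreq(String term) {
        TreeMap<Integer, Integer> docs = store.get(term);
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        ClickPostings postings = new ClickPostings();
        postings.recordClick("paris", 7);
        postings.recordClick("paris", 3);
        for (Posting p : postings.postings("paris")) {
            System.out.println("doc=" + p.doc + " freq=" + p.freq);
        }
    }
}
```

    Because clicks only touch this store, they can be updated as often as you like without reopening or rewriting the static index.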

    Cheers
    Mark


  • Mark harwood at Feb 20, 2008 at 5:33 pm
    Hi Sumit,
    "now i want my database to communicate with lucene api"
    I would recommend doing it the other way round: see my earlier comment on using FilterIndexReader and creating "faked" TermEnum and TermDocs to make your database content appear as if it were part of the index when calling Lucene. If you do want to make the database call Lucene, see the recent work on embedding Lucene in Oracle.
    There is no simple ready-made solution here that I can post in a few lines of code - you'll need to familiarise yourself with these low-level APIs that underpin Lucene searches (they are all documented).


    Cheers
    Mark

    ----- Original Message ----
    From: sumittyagi <ping.sumit@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Wednesday, 20 February, 2008 4:43:56 PM
    Subject: Re: Which file in the lucene package is used to manipulate results..


    Hi Mark Harwood
    I know it's been a long time, but until now I was busy developing the
    database to store the keyword, the document and the number of clicks on the
    document for the keyword, along with their respective mappings.
    Now I want my database to communicate with the Lucene API and I cannot figure
    out where to start.
    Please help me out: how can I make my database work with Lucene?
    Thanks
    Sumit

  • Sumittyagi at Feb 20, 2008 at 7:17 pm
    Hi Mark,
    Actually I am using an object-oriented database. Where can I find
    information on embedding Lucene in a database?
    Thanks


  • Markharw00d at Feb 21, 2008 at 12:47 am

    "Where can i find the information regarding embedding lucene with database."

    http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html

    http://issues.apache.org/jira/browse/LUCENE-434

    Cheers
    Mark



Discussion Overview
group: java-user
categories: lucene
posted: Dec 21, '07 at 4:51a
active: Feb 21, '08 at 12:47a
posts: 18
users: 6
website: lucene.apache.org
