Hi,

I am fairly new to Lucene and have encountered a problem with the search
function I am trying to create using Lucene. When I search for, let's say,
"news sharing", the results are returned and displayed.

That is fine until I check the ranking. Some results that match only 1 of
the 2 keywords get a higher ranking. The problem is as described below:

Page 1
news - Total found 23
sharing - Total found 0

Page 2
news - Total found 1
sharing - Total found 21

It is understandable why Page 1 got the better ranking, because it has more
keyword occurrences. But this makes the returned results less relevant.

My current query is like the following:
(url:sharing^2.0 content:sharing title:sharing^1.5) (url:news^2.0
content:news title:news^1.5) url:"sharing news"~2147483647^2.0
content:"sharing news"~2147483647 title:"sharing news"~2147483647^1.5

Is there any way I can add an additional query clause that gives an extra
boost to results that contain both keywords?
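
One way to express the kind of clause asked for here, sketched under assumptions: append an optional group in which each keyword is required in at least one field, and give that group a boost. The class and method names below are hypothetical, the boost of 10.0 is an arbitrary placeholder to tune, and the sketch assumes the default OR operator and single-token keywords without query-syntax special characters. It only builds the query string; the result would still have to go through the query parser.

```java
import java.util.StringJoiner;

final class BothTermsBoost {
    // Field names and per-field boosts copied from the query in the post.
    private static final String[] FIELDS = {"url", "content", "title"};
    private static final String[] FIELD_BOOSTS = {"^2.0", "", "^1.5"};

    /** Builds "(url:term^2.0 content:term title:term^1.5)": the term in any field. */
    static String anyField(String term) {
        StringJoiner group = new StringJoiner(" ", "(", ")");
        for (int i = 0; i < FIELDS.length; i++) {
            group.add(FIELDS[i] + ":" + term + FIELD_BOOSTS[i]);
        }
        return group.toString();
    }

    /** Builds a boosted group that matches only when every term occurs in some field. */
    static String bothTermsClause(String[] terms, double boost) {
        StringJoiner required = new StringJoiner(" ", "(", ")^" + boost);
        for (String term : terms) {
            required.add("+" + anyField(term)); // '+' marks the per-term group as required
        }
        return required.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"sharing", "news"};
        // The original optional groups first, then the optional all-terms bonus clause.
        String query = anyField("sharing") + " " + anyField("news") + " "
                + bothTermsClause(terms, 10.0);
        System.out.println(query);
    }
}
```

Because the boosted group is itself optional at the top level, pages matching only one keyword are still returned; pages matching both simply pick up the extra score from that group.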
--
View this message in context: http://www.nabble.com/Query-Boosting-tp24913967p24913967.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Simon Willnauer at Aug 11, 2009 at 9:53 am
    Hi there,

    Well, where to start... I would suggest you look at the output
    of IndexSearcher#explain(Query, int) first to see how the score is
    calculated. You might use a simpler query to get started, as the output
    can be quite cryptic the first time you see it.
    To completely understand what the output means, have a closer look at
    the javadoc of the class Similarity
    (http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html),
    which explains in detail how the score is calculated.
    Once you understand what is going on during the scoring process, I
    would suggest you revise your boosting. I don't know if you have field
    boosts set, but it seems they would make more sense in your use case as
    far as I can tell.
    In general make sure you understand what the different boosts are used
    for - this snippet from the wiki might help you:
    <snip>
    What is the difference between field (or document) boosting and query boosting?

    Index time field boosts (field.setBoost(boost)) are a way to express
    things like "this document's title is worth twice as much as the title
    of most documents". Query time boosts (query.setBoost(boost)) are a
    way to express "I care about matches on this clause of my query twice
    as much as I do about matches on other clauses of my query".

    Index time field boosts are worthless if you set them on every document.

    Index time document boosts (doc.setBoost(float)) are equivalent to
    setting a field boost on every field in that document.
    </snip> (http://wiki.apache.org/lucene-java/LuceneFAQ#head-246300129b9d3bf73f597facec54ac2ee54e15d7)

    hope that helps to get started with scoring etc.

    simon
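
    As a rough orientation for reading the explain output, the Similarity
    javadoc linked above documents a scoring formula of roughly the following
    shape. The component definitions in the second display are those of
    DefaultSimilarity as far as I know; treat them as an assumption and check
    them against the javadoc for your version.

    ```latex
    \[
    \mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
      \sum_{t \in q} \Bigl( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}
      \cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Bigr)
    \]
    % Assumed DefaultSimilarity components:
    \[
    \mathrm{tf} = \sqrt{\mathrm{freq}}, \qquad
    \mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}, \qquad
    \mathrm{coord}(q,d) = \frac{\mathrm{overlap}}{\mathrm{maxOverlap}}
    \]
    ```

    Under these definitions a term found 23 times contributes a tf of about
    4.8 rather than 23, and coord rewards documents that match more of the
    query's clauses, so idf, norms and boosts are the places to look when a
    one-keyword page still ranks first.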


  • Bourne71 at Aug 12, 2009 at 1:56 am
    Thanks, I understand how boosting works. What I need is a boost in the
    query that increases the score of a page, and so its ranking, when all of
    the query keywords are found in that page.

    I tried all sorts of combinations and none of them worked. Can anyone
    provide a suggestion?


  • AHMET ARSLAN at Aug 12, 2009 at 9:09 am

    > thanks, I understand how boosting works, what I need will
    > be a boost in the query that will increase the score of a page if all
    > keywords/query is found in the page to increase its ranking.

    You can find the answer to your question in the last two messages of this thread:

    http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html

Discussion Overview
group: java-user
categories: lucene
posted: Aug 11, '09 at 8:51a
active: Aug 12, '09 at 9:09a
posts: 4
users: 3
website: lucene.apache.org
