FAQ
Hi all,

I am using Hibernate Search (http://www.hibernate.org/410.html) which is a wrapper around Lucene for performing search over info stored in the DB. I have questions related to Lucene boosting Vs sorting:

Is index time boosting of documents and fields better than specifying sorting parameters at search time?

I have been browsing through the Lucene mail archives for an answer to this. Going through them and reading on stuff related to Lucene scoring, my understanding is that if I know upfront at index time that the relevance order of results is based on certain fields, then, it is better to have index time boosting of documents and fields. Am I right here?

My requirements are like:
Results having an exact match to the input query string should have highest preference followed by an exact match with field1, field2, field3 and then followed by search query substring (or near match) match with field1, field2, field3.

Any suggestions are most welcome.

--Rakesh S

_________________________________________________________________
Post free property ads on Yello Classifieds now! www.yello.in
http://ss1.richmedia.in/recurl.asp?pid=219

Search Discussions

  • Erick Erickson at Dec 21, 2007 at 2:46 pm
    From my perspective, index-time boosting and sorting are apples
    and oranges.

    According to a post from Hoss, index-time boosting is a way of
    saying that "Field x in this document is more important than
    field x in other documents". Query-time boosts are a way of
    saying "I care about field X more than field Y across *all*
    documents".

    So index time boosting doesn't seem to relate to your problem since
    you really want to compare field x across all documents. It seems
    that query-time boosting is more relevant.

    So, leaving aside how you form your "similar" q
    On Dec 20, 2007 10:50 PM, Rakesh Shete wrote:


    Hi all,

    I am using Hibernate Search (http://www.hibernate.org/410.html) which is a
    wrapper around Lucene for performing search over info stored in the DB. I
    have questions related to Lucene boosting Vs sorting:

    Is index time boosting of documents and fields better than specifying
    sorting parameters at search time?

    I have been browsing through the Lucene mail archives for an answer to
    this. Going through them and reading on stuff related to Lucene scoring, my
    understanding is that if I know upfront at index time that the relevance
    order of results is based on certain fields, then, it is better to have
    index time boosting of documents and fields. Am I right here?

    My requirements are like:
    Results having an exact match to the input query string should have
    highest preference followed by an exact match with field1, field2, field3
    and then followed by search query substring (or near match) match with
    field1, field2, field3.

    Any suggestions are most welcome.

    --Rakesh S

    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
  • Erick Erickson at Dec 21, 2007 at 2:53 pm
    OK, I'm trying to adjust to a Mac and my keyboard shortcuts sometimes
    lead me to send the mail when I didn't intend. Sorry about that...

    So, leaving aside how you form your "similar" query, I *think* you
    want to form two clauses, your "exact" and your "similar" and
    boost them individually, combined in a boolean query.

    This will still interleave the results I think. But it's also a valid
    question whether this is good or bad. Is it *really* better for your
    users to see a low-relevance query that happens to have the exact
    words in it before a very-high ranking but not quite exact response?
    That, of course it up to your product manager....

    If it is really a requirement, it seems to me that you would be able to
    just form the two queries independently, then just post-process them.
    One query is the exact version, and the second query is the similar one.
    Then just combine the results as you please by iterating the hits
    object for the exact query then following it by the same for the similar.

    I don't see how sorting relates to your problem at all....

    Best
    Erick
    On Dec 21, 2007 9:46 AM, Erick Erickson wrote:

    From my perspective, index-time boosting and sorting are apples
    and oranges.

    According to a post from Hoss, index-time boosting is a way of
    saying that "Field x in this document is more important than
    field x in other documents". Query-time boosts are a way of
    saying "I care about field X more than field Y across *all*
    documents".

    So index time boosting doesn't seem to relate to your problem since
    you really want to compare field x across all documents. It seems
    that query-time boosting is more relevant.

    So, leaving aside how you form your "similar" q

    On Dec 20, 2007 10:50 PM, Rakesh Shete wrote:


    Hi all,

    I am using Hibernate Search (http://www.hibernate.org/410.html) which is
    a wrapper around Lucene for performing search over info stored in the DB. I
    have questions related to Lucene boosting Vs sorting:

    Is index time boosting of documents and fields better than specifying
    sorting parameters at search time?

    I have been browsing through the Lucene mail archives for an answer to
    this. Going through them and reading on stuff related to Lucene scoring, my
    understanding is that if I know upfront at index time that the relevance
    order of results is based on certain fields, then, it is better to have
    index time boosting of documents and fields. Am I right here?

    My requirements are like:
    Results having an exact match to the input query string should have
    highest preference followed by an exact match with field1, field2, field3
    and then followed by search query substring (or near match) match with
    field1, field2, field3.

    Any suggestions are most welcome.

    --Rakesh S

    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
  • Rakesh Shete at Dec 21, 2007 at 5:51 pm
    Hi Eric,
    I don't see how sorting relates to your problem at all....
    Could you just explain how is sorting different from boosting?

    I have been trying to figure this out. Going through "Lucene In Action" my
    understanding of sorting is that it will kind of second level of ordering
    after the query results have been scored (Not sure if the relevance established
    by scoring is lost in this process).

    Is it *really* better for your users to see a low-relevance query
    that happens to have the exact words in it before a very-high
    ranking but not quite exact response?
    Nopes. Thats the last thing my product manager will want.

    Lets take an example to simplify this:

    I have fields like title, description, tags. Now when I search for a term
    "Indoor Photography" then I would like the results with exact match in title to be
    more important than in description or tags. However, if there is an exact match in description
    then it should be given more preference than the partial match in title.

    Going by the points mentioned below and as per one of your posts
    (http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/%3CPine.LNX.4.58.0609271134380.32280@hal.rescomp.berkeley.edu%3E)
    I understand that I need to specify query time boosting like this:

    title:Indoor Photography^2.5 description:Indoor Photography^1.5 tags:
    Indoor Photography^1.2

    Let me know if this would help my cause.

    Thnx for ur time n the valuable info.

    --Rakesh S




    Date: Fri, 21 Dec 2007 09:53:02 -0500
    From: erickerickson@gmail.com
    To: java-user@lucene.apache.org
    Subject: Re: Boosting Vs Sorting

    OK, I'm trying to adjust to a Mac and my keyboard shortcuts sometimes
    lead me to send the mail when I didn't intend. Sorry about that...

    So, leaving aside how you form your "similar" query, I *think* you
    want to form two clauses, your "exact" and your "similar" and
    boost them individually, combined in a boolean query.

    This will still interleave the results I think. But it's also a valid
    question whether this is good or bad. Is it *really* better for your
    users to see a low-relevance query that happens to have the exact
    words in it before a very-high ranking but not quite exact response?
    That, of course it up to your product manager....

    If it is really a requirement, it seems to me that you would be able to
    just form the two queries independently, then just post-process them.
    One query is the exact version, and the second query is the similar one.
    Then just combine the results as you please by iterating the hits
    object for the exact query then following it by the same for the similar.

    I don't see how sorting relates to your problem at all....

    Best
    Erick
    On Dec 21, 2007 9:46 AM, Erick Erickson wrote:

    From my perspective, index-time boosting and sorting are apples
    and oranges.

    According to a post from Hoss, index-time boosting is a way of
    saying that "Field x in this document is more important than
    field x in other documents". Query-time boosts are a way of
    saying "I care about field X more than field Y across *all*
    documents".

    So index time boosting doesn't seem to relate to your problem since
    you really want to compare field x across all documents. It seems
    that query-time boosting is more relevant.

    So, leaving aside how you form your "similar" q

    On Dec 20, 2007 10:50 PM, Rakesh Shete wrote:


    Hi all,

    I am using Hibernate Search (http://www.hibernate.org/410.html) which is
    a wrapper around Lucene for performing search over info stored in the DB. I
    have questions related to Lucene boosting Vs sorting:

    Is index time boosting of documents and fields better than specifying
    sorting parameters at search time?

    I have been browsing through the Lucene mail archives for an answer to
    this. Going through them and reading on stuff related to Lucene scoring, my
    understanding is that if I know upfront at index time that the relevance
    order of results is based on certain fields, then, it is better to have
    index time boosting of documents and fields. Am I right here?

    My requirements are like:
    Results having an exact match to the input query string should have
    highest preference followed by an exact match with field1, field2, field3
    and then followed by search query substring (or near match) match with
    field1, field2, field3.

    Any suggestions are most welcome.

    --Rakesh S

    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
  • Erick Erickson at Dec 21, 2007 at 10:16 pm
    See below...
    On Dec 21, 2007 12:50 PM, Rakesh Shete wrote:


    Hi Eric,
    I don't see how sorting relates to your problem at all....
    Could you just explain how is sorting different from boosting?

    I have been trying to figure this out. Going through "Lucene In Action" my
    understanding of sorting is that it will kind of second level of ordering
    after the query results have been scored (Not sure if the relevance
    established
    by scoring is lost in this process).
    I often think of sorting as being orthogonal to boosting. They're really
    unrelated.
    Boosting changes how the scoring of documents work. Sorting ignores scoring
    and arranges the results lexically. You can *only* sort on fields that are a
    single
    token. I'm cheating a little here and you can implement your own
    sorts, but that's another story.

    Maybe this would help. Say you were indexing books and wanted the results
    presented to the user by title. You could index a "titlesort" field that had
    the
    title lowercased and all spaces replaced with underscores. Then, you could
    sort the result of all books containing "solar energy" by title. Where's the
    score
    here? The only relevance score has here is that no book in the result set
    will
    have a score of 0.

    I did, at one point, have to sort by score then sub-sort by title. That is,
    present the user with the top scoring documents sub-sorted by title.
    This involved using relevancy as the primary sort and sub-sorting by
    title. But the problem here is that scores of 0.98374 wouldn't be in the
    same bucket as a score of 0.98375. Search the mail archive for
    "bucket" and you should see that discussion.


    Is it *really* better for your users to see a low-relevance query
    that happens to have the exact words in it before a very-high
    ranking but not quite exact response?
    Nopes. Thats the last thing my product manager will want.

    Lets take an example to simplify this:

    I have fields like title, description, tags. Now when I search for a term
    "Indoor Photography" then I would like the results with exact match in
    title to be
    more important than in description or tags. However, if there is an exact
    match in description
    then it should be given more preference than the partial match in title.

    Going by the points mentioned below and as per one of your posts
    (
    http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/%3CPine.LNX.4.58.0609271134380.32280@hal.rescomp.berkeley.edu%3E
    )
    I understand that I need to specify query time boosting like this:

    title:Indoor Photography^2.5 description:Indoor Photography^1.5 tags:
    Indoor Photography^1.2
    That would go some distance towards what you want, but watch the syntax.
    You might be better off constructing your own BooleanQuery. The syntax above
    would actually parse something like title:Indoor default_field:Photography^
    2.5. You
    need parentheses. Also think about phrase queries....

    Hope this helps
    Erick

    Let me know if this would help my cause.

    Thnx for ur time n the valuable info.

    --Rakesh S




    Date: Fri, 21 Dec 2007 09:53:02 -0500
    From: erickerickson@gmail.com
    To: java-user@lucene.apache.org
    Subject: Re: Boosting Vs Sorting

    OK, I'm trying to adjust to a Mac and my keyboard shortcuts sometimes
    lead me to send the mail when I didn't intend. Sorry about that...

    So, leaving aside how you form your "similar" query, I *think* you
    want to form two clauses, your "exact" and your "similar" and
    boost them individually, combined in a boolean query.

    This will still interleave the results I think. But it's also a valid
    question whether this is good or bad. Is it *really* better for your
    users to see a low-relevance query that happens to have the exact
    words in it before a very-high ranking but not quite exact response?
    That, of course it up to your product manager....

    If it is really a requirement, it seems to me that you would be able to
    just form the two queries independently, then just post-process them.
    One query is the exact version, and the second query is the similar one.
    Then just combine the results as you please by iterating the hits
    object for the exact query then following it by the same for the similar.
    I don't see how sorting relates to your problem at all....

    Best
    Erick
    On Dec 21, 2007 9:46 AM, Erick Erickson wrote:

    From my perspective, index-time boosting and sorting are apples
    and oranges.

    According to a post from Hoss, index-time boosting is a way of
    saying that "Field x in this document is more important than
    field x in other documents". Query-time boosts are a way of
    saying "I care about field X more than field Y across *all*
    documents".

    So index time boosting doesn't seem to relate to your problem since
    you really want to compare field x across all documents. It seems
    that query-time boosting is more relevant.

    So, leaving aside how you form your "similar" q

    On Dec 20, 2007 10:50 PM, Rakesh Shete wrote:


    Hi all,

    I am using Hibernate Search (http://www.hibernate.org/410.html)
    which is
    a wrapper around Lucene for performing search over info stored in
    the DB. I
    have questions related to Lucene boosting Vs sorting:

    Is index time boosting of documents and fields better than
    specifying
    sorting parameters at search time?

    I have been browsing through the Lucene mail archives for an answer
    to
    this. Going through them and reading on stuff related to Lucene
    scoring, my
    understanding is that if I know upfront at index time that the
    relevance
    order of results is based on certain fields, then, it is better to
    have
    index time boosting of documents and fields. Am I right here?

    My requirements are like:
    Results having an exact match to the input query string should have
    highest preference followed by an exact match with field1, field2,
    field3
    and then followed by search query substring (or near match) match
    with
    field1, field2, field3.

    Any suggestions are most welcome.

    --Rakesh S

    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
  • Rakesh Shete at Dec 24, 2007 at 1:29 pm
    Hi Eric,

    Thanks for the help. Most of it makes sense to me and I amgoing ahead with query boosting during search instead of index time boosting.

    Could just tell me how do we arrive at the correct boost values for a query with multiple fields? My understanding is that this will vary as per the data that is being searched.

    My query after boosting would look something like this:

    +(i_title:sorting*^6.0 i_description:sorting*^3.5 i_detailedInfo:sorting*^1.5 i_tags:sorting*^1.2) -i_published:false +i_topicsClasses.id:1*

    The reason for using high values is that I want to make sure that multiple occurrences of the search string in say field 'i_description' should not be treated as more relevant than a single match in the 'i_title' field.

    Do you have some guideline using which I can arrive at the correct boost value in the above scenario?

    --Rakesh S
    Date: Fri, 21 Dec 2007 17:15:29 -0500
    From: erickerickson@gmail.com
    To: java-user@lucene.apache.org
    Subject: Re: Boosting Vs Sorting

    See below...
    On Dec 21, 2007 12:50 PM, Rakesh Shete wrote:


    Hi Eric,
    I don't see how sorting relates to your problem at all....
    Could you just explain how is sorting different from boosting?

    I have been trying to figure this out. Going through "Lucene In Action" my
    understanding of sorting is that it will kind of second level of ordering
    after the query results have been scored (Not sure if the relevance
    established
    by scoring is lost in this process).
    I often think of sorting as being orthogonal to boosting. They're really
    unrelated.
    Boosting changes how the scoring of documents work. Sorting ignores scoring
    and arranges the results lexically. You can *only* sort on fields that are a
    single
    token. I'm cheating a little here and you can implement your own
    sorts, but that's another story.

    Maybe this would help. Say you were indexing books and wanted the results
    presented to the user by title. You could index a "titlesort" field that had
    the
    title lowercased and all spaces replaced with underscores. Then, you could
    sort the result of all books containing "solar energy" by title. Where's the
    score
    here? The only relevance score has here is that no book in the result set
    will
    have a score of 0.

    I did, at one point, have to sort by score then sub-sort by title. That is,
    present the user with the top scoring documents sub-sorted by title.
    This involved using relevancy as the primary sort and sub-sorting by
    title. But the problem here is that scores of 0.98374 wouldn't be in the
    same bucket as a score of 0.98375. Search the mail archive for
    "bucket" and you should see that discussion.


    Is it *really* better for your users to see a low-relevance query
    that happens to have the exact words in it before a very-high
    ranking but not quite exact response?
    Nopes. Thats the last thing my product manager will want.

    Lets take an example to simplify this:

    I have fields like title, description, tags. Now when I search for a term
    "Indoor Photography" then I would like the results with exact match in
    title to be
    more important than in description or tags. However, if there is an exact
    match in description
    then it should be given more preference than the partial match in title.

    Going by the points mentioned below and as per one of your posts
    (
    http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/%3CPine.LNX.4.58.0609271134380.32280@hal.rescomp.berkeley.edu%3E
    )
    I understand that I need to specify query time boosting like this:

    title:Indoor Photography^2.5 description:Indoor Photography^1.5 tags:
    Indoor Photography^1.2
    That would go some distance towards what you want, but watch the syntax.
    You might be better off constructing your own BooleanQuery. The syntax above
    would actually parse something like title:Indoor default_field:Photography^
    2.5. You
    need parentheses. Also think about phrase queries....

    Hope this helps
    Erick

    Let me know if this would help my cause.

    Thnx for ur time n the valuable info.

    --Rakesh S




    Date: Fri, 21 Dec 2007 09:53:02 -0500
    From: erickerickson@gmail.com
    To: java-user@lucene.apache.org
    Subject: Re: Boosting Vs Sorting

    OK, I'm trying to adjust to a Mac and my keyboard shortcuts sometimes
    lead me to send the mail when I didn't intend. Sorry about that...

    So, leaving aside how you form your "similar" query, I *think* you
    want to form two clauses, your "exact" and your "similar" and
    boost them individually, combined in a boolean query.

    This will still interleave the results I think. But it's also a valid
    question whether this is good or bad. Is it *really* better for your
    users to see a low-relevance query that happens to have the exact
    words in it before a very-high ranking but not quite exact response?
    That, of course it up to your product manager....

    If it is really a requirement, it seems to me that you would be able to
    just form the two queries independently, then just post-process them.
    One query is the exact version, and the second query is the similar one.
    Then just combine the results as you please by iterating the hits
    object for the exact query then following it by the same for the similar.
    I don't see how sorting relates to your problem at all....

    Best
    Erick
    On Dec 21, 2007 9:46 AM, Erick Erickson wrote:

    From my perspective, index-time boosting and sorting are apples
    and oranges.

    According to a post from Hoss, index-time boosting is a way of
    saying that "Field x in this document is more important than
    field x in other documents". Query-time boosts are a way of
    saying "I care about field X more than field Y across *all*
    documents".

    So index time boosting doesn't seem to relate to your problem since
    you really want to compare field x across all documents. It seems
    that query-time boosting is more relevant.

    So, leaving aside how you form your "similar" q


    On Dec 20, 2007 10:50 PM, Rakesh Shete < rakesh_shete@hotmail.com>
    wrote:
    Hi all,

    I am using Hibernate Search (http://www.hibernate.org/410.html)
    which is
    a wrapper around Lucene for performing search over info stored in
    the DB. I
    have questions related to Lucene boosting Vs sorting:

    Is index time boosting of documents and fields better than
    specifying
    sorting parameters at search time?

    I have been browsing through the Lucene mail archives for an answer
    to
    this. Going through them and reading on stuff related to Lucene
    scoring, my
    understanding is that if I know upfront at index time that the
    relevance
    order of results is based on certain fields, then, it is better to
    have
    index time boosting of documents and fields. Am I right here?

    My requirements are like:
    Results having an exact match to the input query string should have
    highest preference followed by an exact match with field1, field2,
    field3
    and then followed by search query substring (or near match) match
    with
    field1, field2, field3.

    Any suggestions are most welcome.

    --Rakesh S

    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
  • Erick Erickson at Dec 24, 2007 at 3:10 pm
    I don't think there's any real formula for the "correct" boosts, it's more a
    matter of
    experimentation.

    I'd look at some of the raw scores that come back (not the normalized ones)
    and
    go from there. See HitCollector (?).

    Best
    Erick
    On Dec 24, 2007 8:29 AM, Rakesh Shete wrote:


    Hi Eric,

    Thanks for the help. Most of it makes sense to me and I amgoing ahead with
    query boosting during search instead of index time boosting.

    Could just tell me how do we arrive at the correct boost values for a
    query with multiple fields? My understanding is that this will vary as per
    the data that is being searched.

    My query after boosting would look something like this:

    +(i_title:sorting*^6.0 i_description:sorting*^3.5 i_detailedInfo:sorting*^
    1.5 i_tags:sorting*^1.2) -i_published:false +i_topicsClasses.id:1*

    The reason for using high values is that I want to make sure that multiple
    occurrences of the search string in say field 'i_description' should not be
    treated as more relevant than a single match in the 'i_title' field.

    Do you have some guideline using which I can arrive at the correct boost
    value in the above scenario?

    --Rakesh S
    Date: Fri, 21 Dec 2007 17:15:29 -0500
    From: erickerickson@gmail.com
    To: java-user@lucene.apache.org
    Subject: Re: Boosting Vs Sorting

    See below...
    On Dec 21, 2007 12:50 PM, Rakesh Shete wrote:


    Hi Eric,
    I don't see how sorting relates to your problem at all....
    Could you just explain how is sorting different from boosting?

    I have been trying to figure this out. Going through "Lucene In
    Action" my
    understanding of sorting is that it will kind of second level of
    ordering
    after the query results have been scored (Not sure if the relevance
    established
    by scoring is lost in this process).
    I often think of sorting as being orthogonal to boosting. They're really
    unrelated.
    Boosting changes how the scoring of documents work. Sorting ignores scoring
    and arranges the results lexically. You can *only* sort on fields that are a
    single
    token. I'm cheating a little here and you can implement your own
    sorts, but that's another story.

    Maybe this would help. Say you were indexing books and wanted the results
    presented to the user by title. You could index a "titlesort" field that had
    the
    title lowercased and all spaces replaced with underscores. Then, you could
    sort the result of all books containing "solar energy" by title. Where's the
    score
    here? The only relevance score has here is that no book in the result set
    will
    have a score of 0.

    I did, at one point, have to sort by score then sub-sort by title. That is,
    present the user with the top scoring documents sub-sorted by title.
    This involved using relevancy as the primary sort and sub-sorting by
    title. But the problem here is that scores of 0.98374 wouldn't be in the
    same bucket as a score of 0.98375. Search the mail archive for
    "bucket" and you should see that discussion.


    Is it *really* better for your users to see a low-relevance query
    that happens to have the exact words in it before a very-high
    ranking but not quite exact response?
    Nopes. Thats the last thing my product manager will want.

    Lets take an example to simplify this:

    I have fields like title, description, tags. Now when I search for a
    term
    "Indoor Photography" then I would like the results with exact match in
    title to be
    more important than in description or tags. However, if there is an
    exact
    match in description
    then it should be given more preference than the partial match in
    title.
    Going by the points mentioned below and as per one of your posts
    (
    http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/%3CPine.LNX.4.58.0609271134380.32280@hal.rescomp.berkeley.edu%3E
    )
    I understand that I need to specify query time boosting like this:

    title:Indoor Photography^2.5 description:Indoor Photography^1.5 tags:
    Indoor Photography^1.2
    That would go some distance towards what you want, but watch the syntax.
    You might be better off constructing your own BooleanQuery. The syntax above
    would actually parse something like title:Indoor
    default_field:Photography^
    2.5. You
    need parentheses. Also think about phrase queries....

    Hope this helps
    Erick

    Let me know if this would help my cause.

    Thnx for ur time n the valuable info.

    --Rakesh S




    Date: Fri, 21 Dec 2007 09:53:02 -0500
    From: erickerickson@gmail.com
    To: java-user@lucene.apache.org
    Subject: Re: Boosting Vs Sorting

    OK, I'm trying to adjust to a Mac and my keyboard shortcuts
    sometimes
    lead me to send the mail when I didn't intend. Sorry about that...

    So, leaving aside how you form your "similar" query, I *think* you
    want to form two clauses, your "exact" and your "similar" and
    boost them individually, combined in a boolean query.

    This will still interleave the results I think. But it's also a
    valid
    question whether this is good or bad. Is it *really* better for your
    users to see a low-relevance query that happens to have the exact
    words in it before a very-high ranking but not quite exact response?
    That, of course it up to your product manager....

    If it is really a requirement, it seems to me that you would be able
    to
    just form the two queries independently, then just post-process
    them.
    One query is the exact version, and the second query is the similar
    one.
    Then just combine the results as you please by iterating the hits
    object for the exact query then following it by the same for the similar.
    I don't see how sorting relates to your problem at all....

    Best
    Erick
    On Dec 21, 2007 9:46 AM, Erick Erickson wrote:

    From my perspective, index-time boosting and sorting are apples
    and oranges.

    According to a post from Hoss, index-time boosting is a way of
    saying that "Field x in this document is more important than
    field x in other documents". Query-time boosts are a way of
    saying "I care about field X more than field Y across *all*
    documents".

    So index time boosting doesn't seem to relate to your problem
    since
    you really want to compare field x across all documents. It seems
    that query-time boosting is more relevant.

    So, leaving aside how you form your "similar" q


    On Dec 20, 2007 10:50 PM, Rakesh Shete < rakesh_shete@hotmail.com>
    wrote:
    Hi all,

    I am using Hibernate Search (http://www.hibernate.org/410.html)
    which is
    a wrapper around Lucene for performing search over info stored
    in
    the DB. I
    have questions related to Lucene boosting Vs sorting:

    Is index time boosting of documents and fields better than
    specifying
    sorting parameters at search time?

    I have been browsing through the Lucene mail archives for an
    answer
    to
    this. Going through them and reading on stuff related to Lucene
    scoring, my
    understanding is that if I know upfront at index time that the
    relevance
    order of results is based on certain fields, then, it is better
    to
    have
    index time boosting of documents and fields. Am I right here?

    My requirements are like:
    Results having an exact match to the input query string should
    have
    highest preference followed by an exact match with field1,
    field2,
    field3
    and then followed by search query substring (or near match)
    match
    with
    field1, field2, field3.

    Any suggestions are most welcome.

    --Rakesh S
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219
    _________________________________________________________________
    Post free property ads on Yello Classifieds now! www.yello.in
    http://ss1.richmedia.in/recurl.asp?pid=219

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedDec 21, '07 at 3:51a
activeDec 24, '07 at 3:10p
posts7
users2
websitelucene.apache.org

2 users in discussion

Erick Erickson: 4 posts Rakesh Shete: 3 posts

People

Translate

site design / logo © 2022 Grokbase