FAQ
I want to only support boolean or query(as many search engine do). But
I want to boost document whose terms are closer.
e.g. the query terms are 'apache lucene'
doc1 apache has many projects such as lucene
doc2 The Apache HTTP Server Project is an effort to develop and
maintain an ... Lucene is a .....
I want to give doc1 more boost.
So I need a place where I can get all position information
How to extend Similarity to achieve this?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Rebecca Watson at Jun 2, 2010 at 6:25 am
    Hi Li Li

    If you want to support some query types and not others you should
    overide/extend the queryparser so that you throw an exception / makes
    a different query type instead.

    Similarity doesn't do the actual scoring, it's used by the Query
    classes (actually the Scorer implementation used by the Query)
    to get tf / idf scores. So scoring is done by the Query types themselves
    (the Weight/Scorer classes used by the Query). So sometimes
    you change/extend Similarity to affect scoring and sometimes you
    need to change/extend the Query/Scorer/Weight classes etc.
    Luckily though, there are already some Query types that boost
    scores if the terms are closer -- the SpanQuery query types do this.

    You will either have to write your own queryparser (to convert query string
    to the Query object(s) required,
    or you could use e.g. the Surround queryparser supports specifying
    SpanQuery queries via user-entered text:
    http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/

    look in the lucene contrib/surround package for the surround queryparser.

    I've messed about with using SpanQuery classes a bit and people report
    that they are slower - which is true because they use a lot of extra
    info to perform their matching/scoring -- but they are still easily
    sub-second over most
    queries / reasonable sized doc collection
    -- and they only get a lot slower if you have terms that occur very frequently
    in the document collection you are searching.
    So if you are removing stop words etc you shouldn't have too
    many problems efficiency wise.

    hope that helps,

    bec :)
    On 1 June 2010 22:07, Li Li wrote:
    I want to only support boolean or query(as many search engine do). But
    I want to boost document whose terms are closer.
    e.g. the query terms are 'apache lucene'
    doc1 apache has many projects such as lucene
    doc2 The Apache HTTP Server Project is an effort to develop and
    maintain an ...  Lucene is a .....
    I want to give doc1 more boost.
    So I need a place where I can get all position information
    How to extend Similarity to achieve this?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Li Li at Jun 2, 2010 at 7:11 am
    thank you

    2010/6/2 Rebecca Watson <bec.watson@gmail.com>:
    Hi Li Li

    If you want to support some query types and not others you should
    overide/extend the queryparser so that you throw an exception / makes
    a different query type instead.

    Similarity doesn't do the actual scoring, it's used by the Query
    classes (actually the Scorer implementation used by the Query)
    to get tf / idf scores. So scoring is done by the Query types themselves
    (the Weight/Scorer classes used by the Query). So sometimes
    you change/extend Similarity to affect scoring and sometimes you
    need to change/extend the Query/Scorer/Weight classes etc.
    Luckily though, there are already some Query types that boost
    scores if the terms are closer -- the SpanQuery query types do this.

    You will either have to write your own queryparser (to convert query string
    to the Query object(s) required,
    or you could use e.g. the Surround queryparser supports specifying
    SpanQuery queries via user-entered text:
    http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/

    look in the lucene contrib/surround package for the surround queryparser.

    I've messed about with using SpanQuery classes a bit and people report
    that they are slower - which is true because they use a lot of extra
    info to perform their matching/scoring -- but they are still easily
    sub-second over most
    queries / reasonable sized doc collection
    -- and they only get a lot slower if you have terms that occur very frequently
    in the document collection you are searching.
    So if you are removing stop words etc you shouldn't have too
    many problems efficiency wise.

    hope that helps,

    bec :)
    On 1 June 2010 22:07, Li Li wrote:
    I want to only support boolean or query(as many search engine do). But
    I want to boost document whose terms are closer.
    e.g. the query terms are 'apache lucene'
    doc1 apache has many projects such as lucene
    doc2 The Apache HTTP Server Project is an effort to develop and
    maintain an ...  Lucene is a .....
    I want to give doc1 more boost.
    So I need a place where I can get all position information
    How to extend Similarity to achieve this?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 1, '10 at 2:08p
activeJun 2, '10 at 7:11a
posts3
users2
websitelucene.apache.org

2 users in discussion

Li Li: 2 posts Rebecca Watson: 1 post

People

Translate

site design / logo © 2022 Grokbase