Using different field when overriding computeNorm
Hi,

I would like to override default similarity's computeNorm to work with
a different field, other than the query field.

Here is the DefaultSimilarity implementation:

@Override
public float computeNorm(String field, FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
        numTerms = state.getLength() - state.getNumOverlap();
    else
        numTerms = state.getLength();
    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
}

Any ideas how to do that?

Thanks,

Tsvika
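
For readers who want to see what the quoted formula evaluates to, here is a minimal standalone sketch. It does not use Lucene: the class name LengthNormSketch and the plain int/float parameters are assumptions that stand in for the values FieldInvertState exposes through getLength(), getNumOverlap() and getBoost().

```java
// Standalone sketch: mirrors the arithmetic of the computeNorm quoted above
// without depending on Lucene. LengthNormSketch is a hypothetical name.
final class LengthNormSketch {

    // length, numOverlap and boost stand in for the values that
    // FieldInvertState exposes via getLength(), getNumOverlap(), getBoost().
    static float computeNorm(int length, int numOverlap, float boost,
                             boolean discountOverlaps) {
        final int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        // 16 terms, no overlaps, boost 1.0: 1/sqrt(16)
        System.out.println(computeNorm(16, 0, 1.0f, true));
        // 20 positions, 4 of them overlaps, discounted down to 16 terms
        System.out.println(computeNorm(20, 4, 1.0f, true));
        // same field without discounting overlaps: 1/sqrt(20)
        System.out.println(computeNorm(20, 4, 1.0f, false));
    }
}
```

The norm depends only on the length and boost of the field being inverted, so basing it on another field would mean supplying that other field's length from somewhere else.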

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Ryan Aylward at Feb 1, 2011 at 6:51 pm
    I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up with was to add a thread local to set the field and then check the thread local within the tf function. Here's the tf function...

    public float tf(float freq) {

        // Get the value of the thread local...
        String field = FieldThreadLocal.getField();

        if ("fieldA".equals(field)) {
            // always return 1 for field A
            return 1;
        } else {
            // otherwise, use the normal tf function
            return super.tf(freq);
        }
    }

    tf() is used during scoring so I had to override the TermQuery (and TermWeight and TermScorer) to be able to set and clear the thread local at the appropriate times. This is a pretty ugly hack, but I couldn't find another way to make this work.

    computeNorm() is calculated at index creation time, but you could try to do something similar.

    Would be curious if other people had a better suggestion as to how to do this.
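
    The thread-local approach described above might look roughly like the following sketch. Only the FieldThreadLocal name and its getField() method appear in the post; setField(), clear() and the TfSketch class are assumptions added for illustration, and defaultTf() stands in for the inherited tf() so that the sketch runs without Lucene.

    ```java
    // Hypothetical sketch of the thread-local holder described above.
    // Only FieldThreadLocal.getField() appears in the post; setField(),
    // clear() and TfSketch are assumptions made for illustration.
    final class FieldThreadLocal {
        private static final ThreadLocal<String> FIELD = new ThreadLocal<String>();

        private FieldThreadLocal() {}

        static void setField(String field) { FIELD.set(field); }

        static String getField() { return FIELD.get(); }

        static void clear() { FIELD.remove(); }
    }

    final class TfSketch {
        // Stand-in for DefaultSimilarity.tf(), which is sqrt(freq).
        static float defaultTf(float freq) {
            return (float) Math.sqrt(freq);
        }

        // Same branching as the tf() override shown in the post.
        static float tf(float freq) {
            String field = FieldThreadLocal.getField();
            return "fieldA".equals(field) ? 1f : defaultTf(freq);
        }

        // What a custom scorer would do around each scoring call:
        // set the field, score, and always clear afterwards.
        static float scoreWithField(String field, float freq) {
            FieldThreadLocal.setField(field);
            try {
                return tf(freq);
            } finally {
                FieldThreadLocal.clear();
            }
        }
    }
    ```

    Clearing in a finally block keeps a stale field name from leaking into the next query that runs on the same thread.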

  • Robert Muir at Feb 1, 2011 at 7:10 pm

    On Tue, Feb 1, 2011 at 1:51 PM, Ryan Aylward wrote:
    I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up

    In Lucene's trunk, Similarity can now be controlled on a per-field
    basis, see https://issues.apache.org/jira/browse/LUCENE-2236

    The only exceptions are things like coord() which apply to e.g.
    BooleanQuery (which might span multiple fields) and remain top-level
    in the new SimilarityProvider.
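
    The idea of per-field control can be illustrated with a small standalone sketch. This is not Lucene's API: FieldSimilaritySketch and SimilarityProviderSketch are hypothetical names, and only the general shape (a provider that hands out a similarity per field name) is taken from the description above.

    ```java
    // Not Lucene's API: hypothetical names illustrating per-field dispatch.
    interface FieldSimilaritySketch {
        float tf(float freq);
    }

    final class SimilarityProviderSketch {
        // Mirrors DefaultSimilarity.tf(), which is sqrt(freq).
        static final FieldSimilaritySketch DEFAULT = freq -> (float) Math.sqrt(freq);

        // Ignores term frequency entirely, like the fieldA case earlier in the thread.
        static final FieldSimilaritySketch CONSTANT_TF = freq -> 1f;

        private SimilarityProviderSketch() {}

        // Chooses the similarity by field name, so no thread local is needed.
        static FieldSimilaritySketch get(String field) {
            return "fieldA".equals(field) ? CONSTANT_TF : DEFAULT;
        }
    }
    ```

    With this shape, SimilarityProviderSketch.get("fieldA").tf(9f) yields 1, while any other field falls back to sqrt(freq).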

  • Ryan Aylward at Feb 3, 2011 at 8:27 pm
    This is great. Is there a target of when 4.0 will be released?

  • Robert Muir at Feb 3, 2011 at 10:24 pm

    On Thu, Feb 3, 2011 at 3:27 PM, Ryan Aylward wrote:
    This is great. Is there a target of when 4.0 will be released?

    Unfortunately, I think it's quite a ways away: there are branches for
    major features such as per-document payloads, realtime search, modern
    index compression algorithms, and a variety of other exciting things
    in the works. As far as releases go, we are currently working towards
    release 3.1, which is the next stable minor release after 3.0.

    It might be technically possible to backport this feature (per-field
    similarity) to the 3.x codebase while still keeping backwards
    compatibility, but I'm worried about breaking backwards compatibility
    in subtle ways due to some gremlins in the code... we fixed most of
    these gremlins in trunk, but they are still available, though deprecated, in
    3.1 (example: https://issues.apache.org/jira/browse/LUCENE-2828).

    So, at the moment having this feature be something that has to wait
    until 4.0 is the safest option in my opinion... but I feel your pain
    here when trying to customize the scoring system...

Discussion Overview
group: java-user
categories: lucene
posted: Feb 1, '11 at 1:28p
active: Feb 3, '11 at 10:24p
posts: 5
users: 3
website: lucene.apache.org
