FAQ
Hello,

I'm having a hard time implementing / understanding a very simple custom
scoring situation.

I have created my Similarity class for testing which overrides all the
relevant (I think) methods below, returning 1 for all but coord(int, int)
which returns q / maxOverlap so scores are scaled between 0. and 1..

I call writer.setSimilarity(new HashHitSimilarity()) when indexing
and searcher.setSimilarity(new HashHitSimilarity()) when searching.

The similarity is definitely affecting the scoring but not how I expect. I
am looking for a straight average of the hits calculated, i.e.
totalHits for a doc / totalHits in search.

The above score with my test search and index of 6 docs should return the
scores below for all 6 documents in my index:

0.8387096774193549
0.3548387096774194
0.3548387096774194
0.25806451612903225
0.1935483870967742
0.12903225806451613

but the scores appear "stretched" and return these instead though I'm unsure
as to where this "stretching" happens:

0.9078212
0.75977653
0.57541895
0.5670391
0.5223464
0.37150836

public class HashHitSimilarity extends Similarity {

/**
*
*/
private static final long serialVersionUID = 811419737205284733L;

public float tf(float freq) {
return 1f;
}

public float lengthNorm(String fieldName, int numTokens) {
return 1f;
}

public float queryNorm(float sumOfSquaredWeights) {
return 1f;
}

@Override
public float coord(int overlap, int maxOverlap) {
return 1f / (float) maxOverlap;
}

@Override
public float idf(int docFreq, int numDocs) {
return 1f;
}

@Override
public float sloppyFreq(int distance) {
return 0f;
}

}




--
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999

Search Discussions

  • Christopher Tignor at Apr 2, 2010 at 8:28 pm
    This code is in fact working. I had an error in my test case. Things seem
    to work as advertised.

    sorry / thanks -

    C>T>
    On Fri, Apr 2, 2010 at 10:20 AM, Christopher Tignor wrote:

    Hello,

    I'm having a hard time implementing / understanding a very simple custom
    scoring situation.

    I have created my Similarity class for testing which overrides all the
    relevant (I think) methods below, returning 1 for all but coord(int, int)
    which returns q / maxOverlap so scores are scaled between 0. and 1..

    I call writer.setSimilarity(new HashHitSimilarity()) when indexing
    and searcher.setSimilarity(new HashHitSimilarity()) when searching.

    The similarity is definitely affecting the scoring but not how I expect. I
    am looking for a straight average of the hits calculated, i.e.
    totalHits for a doc / totalHits in search.

    The above score with my test search and index of 6 docs should return the
    scores below for all 6 documents in my index:

    0.8387096774193549
    0.3548387096774194
    0.3548387096774194
    0.25806451612903225
    0.1935483870967742
    0.12903225806451613

    but the scores appear "stretched" and return these instead though I'm
    unsure as to where this "stretching" happens:

    0.9078212
    0.75977653
    0.57541895
    0.5670391
    0.5223464
    0.37150836

    public class HashHitSimilarity extends Similarity {

    /**
    *
    */
    private static final long serialVersionUID = 811419737205284733L;

    public float tf(float freq) {
    return 1f;
    }

    public float lengthNorm(String fieldName, int numTokens) {
    return 1f;
    }

    public float queryNorm(float sumOfSquaredWeights) {
    return 1f;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
    return 1f / (float) maxOverlap;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
    return 1f;
    }

    @Override
    public float sloppyFreq(int distance) {
    return 0f;
    }

    }




    --
    TH!NKMAP

    Christopher Tignor | Senior Software Architect
    155 Spring Street NY, NY 10012
    p.212-285-8600 x385 f.212-285-8999


    --
    TH!NKMAP

    Christopher Tignor | Senior Software Architect
    155 Spring Street NY, NY 10012
    p.212-285-8600 x385 f.212-285-8999

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedApr 2, '10 at 2:20p
activeApr 2, '10 at 8:28p
posts2
users1
websitelucene.apache.org

1 user in discussion

Christopher Tignor: 2 posts

People

Translate

site design / logo © 2022 Grokbase