I'm having a hard time implementing and understanding what should be a very
simple custom scoring scheme.
For testing, I have created my own Similarity class (below), which overrides
all the relevant (I think) methods. They all return 1, except coord(int, int),
which returns 1 / maxOverlap so that scores are scaled between 0 and 1, and
sloppyFreq(int), which returns 0.
I call writer.setSimilarity(new HashHitSimilarity()) when indexing
and searcher.setSimilarity(new HashHitSimilarity()) when searching.
The similarity is definitely affecting the scoring, but not in the way I
expect. I am looking for a straight average of the hits, i.e.
totalHits for a doc / totalHits in search.
With my test search and an index of 6 docs, the formula above should return
the following scores for the 6 documents in my index:
0.8387096774193549
0.3548387096774194
0.3548387096774194
0.25806451612903225
0.1935483870967742
0.12903225806451613
but the scores appear "stretched" and come back as the values below instead,
and I'm unsure where this "stretching" happens:
0.9078212
0.75977653
0.57541895
0.5670391
0.5223464
0.37150836
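As a sanity check on the expected scores listed above, the sketch below computes docHits / totalHits directly. The class and method names are hypothetical, and the counts (31 hits in the search, with 26, 11, 11, 8, 6 and 4 hits per document) are assumptions inferred from the listed values, which match those fractions. They are not stated in the test itself.

```java
// Hypothetical helper, not part of Lucene: the "straight average" score.
class ExpectedScores {

    /** Fraction of the search's hits that a single document matched. */
    static double expectedScore(int docHits, int totalHits) {
        return (double) docHits / totalHits;
    }

    public static void main(String[] args) {
        int totalHits = 31;                     // assumed, inferred from the expected scores
        int[] docHits = {26, 11, 11, 8, 6, 4};  // assumed per-document hit counts
        for (int hits : docHits) {
            System.out.println(hits + "/" + totalHits + " = "
                    + expectedScore(hits, totalHits));
        }
    }
}
```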
import org.apache.lucene.search.Similarity;

public class HashHitSimilarity extends Similarity {

    private static final long serialVersionUID = 811419737205284733L;

    public float tf(float freq) {
        return 1f;
    }

    public float lengthNorm(String fieldName, int numTokens) {
        return 1f;
    }

    public float queryNorm(float sumOfSquaredWeights) {
        return 1f;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1f / (float) maxOverlap;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1f;
    }

    @Override
    public float sloppyFreq(int distance) {
        return 0f;
    }
}
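
To make the expectation concrete, here is a minimal, Lucene-free sketch of the arithmetic these overrides are meant to produce. It follows the classic scoring formula from the Lucene documentation: coord * queryNorm * the sum over matching clauses of tf * idf^2 * boost * norm. FlatScoreModel is a hypothetical name. The sketch assumes a flat boolean query of term clauses, with every boost and stored norm equal to 1. It only models the intended result and does not reproduce what the searcher does internally.

```java
// Hypothetical stand-alone model of the scoring formula; it does not use Lucene.
class FlatScoreModel {

    // These mirror the values returned by the overrides in HashHitSimilarity.
    static float tf(float freq) { return 1f; }
    static float idf(int docFreq, int numDocs) { return 1f; }
    static float queryNorm(float sumOfSquaredWeights) { return 1f; }
    static float coord(int overlap, int maxOverlap) { return 1f / (float) maxOverlap; }

    /** coord * queryNorm * sum over matching clauses of tf * idf^2 * boost * norm. */
    static float score(int overlap, int maxOverlap) {
        float boost = 1f; // assumed: no query-time boosts
        float norm = 1f;  // assumed: the stored field norm decodes to exactly 1
        float sum = 0f;
        for (int i = 0; i < overlap; i++) {
            float idf = idf(1, 1); // arguments are ignored by the flat override
            sum += tf(1f) * idf * idf * boost * norm;
        }
        return coord(overlap, maxOverlap) * queryNorm(1f) * sum;
    }

    public static void main(String[] args) {
        // Under these assumptions the score reduces to overlap / maxOverlap.
        System.out.println(score(26, 31));
        System.out.println(score(4, 31));
    }
}
```

If the real scores differ from this model, the difference has to come from a factor that the model fixes at 1 or omits.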
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999