FAQ
Happy Holidays!

Test case
doc1: test -- one two three
doc2: test, one two three
doc3: one two three

Search query: "one two three" via QueryParser and StandardAnalyzer

Question: why do all three documents get the same score? I want doc3 to
score higher because it is an exact match and shorter. Can anybody
explain this? Any help is much appreciated.

Here is my code and its output:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class Test {

    public static void main(String[] args) {
        test();
    }

    private static void test() {
        String[] contents = {"test -- one two three",
                             "test, one two three",
                             "one two three"};

        Directory dir = new RAMDirectory();
        try {
            // Index each string into the "de" field.
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            for (int i = 0; i < contents.length; i++) {
                Document doc = new Document();
                doc.add(new Field("de", contents[i], Field.Store.YES,
                        Field.Index.ANALYZED));
                writer.addDocument(doc);
            }
            writer.close();

            IndexSearcher searcher = new IndexSearcher(dir);
            QueryParser parser = new QueryParser(Version.LUCENE_30, "de",
                    new StandardAnalyzer(Version.LUCENE_30));

            Query q = parser.parse("one two three");
            TopDocs topDocs = searcher.search(q, 10);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document doc = searcher.doc(scoreDoc.doc);
                System.out.println(doc.get("de"));
                // Print the score breakdown for each hit.
                Explanation explan = searcher.explain(q, scoreDoc.doc);
                System.out.println(explan.toString());
            }

        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


test -- one two three
0.6168854 = (MATCH) sum of:
  0.20562847 = (MATCH) weight(de:one in 0), product of:
    0.57735026 = queryWeight(de:one), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:one in 0), product of:
      1.0 = tf(termFreq(de:one)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=0)
  0.20562847 = (MATCH) weight(de:two in 0), product of:
    0.57735026 = queryWeight(de:two), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:two in 0), product of:
      1.0 = tf(termFreq(de:two)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=0)
  0.20562847 = (MATCH) weight(de:three in 0), product of:
    0.57735026 = queryWeight(de:three), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:three in 0), product of:
      1.0 = tf(termFreq(de:three)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=0)

test, one two three
0.6168854 = (MATCH) sum of:
  0.20562847 = (MATCH) weight(de:one in 1), product of:
    0.57735026 = queryWeight(de:one), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:one in 1), product of:
      1.0 = tf(termFreq(de:one)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=1)
  0.20562847 = (MATCH) weight(de:two in 1), product of:
    0.57735026 = queryWeight(de:two), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:two in 1), product of:
      1.0 = tf(termFreq(de:two)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=1)
  0.20562847 = (MATCH) weight(de:three in 1), product of:
    0.57735026 = queryWeight(de:three), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:three in 1), product of:
      1.0 = tf(termFreq(de:three)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=1)

one two three
0.6168854 = (MATCH) sum of:
  0.20562847 = (MATCH) weight(de:one in 2), product of:
    0.57735026 = queryWeight(de:one), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:one in 2), product of:
      1.0 = tf(termFreq(de:one)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=2)
  0.20562847 = (MATCH) weight(de:two in 2), product of:
    0.57735026 = queryWeight(de:two), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:two in 2), product of:
      1.0 = tf(termFreq(de:two)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=2)
  0.20562847 = (MATCH) weight(de:three in 2), product of:
    0.57735026 = queryWeight(de:three), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.8105233 = queryNorm
    0.35615897 = (MATCH) fieldWeight(de:three in 2), product of:
      1.0 = tf(termFreq(de:three)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.5 = fieldNorm(field=de, doc=2)

Best regards,
Qi Li


  • Ian Lea at Dec 29, 2010 at 10:45 am
    Some of the factors that go into the score calculation are encoded as
    a byte, with an inevitable loss of precision. Maybe length is one of
    these and Lucene is not differentiating between your 3- and 4-word
    docs. Try indexing a document that is significantly longer than 3 or
    4 words.
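The precision loss can be demonstrated without building an index. The sketch below re-implements the single-byte norm encoding that Lucene 3.0 applies to fieldNorm (the logic of SmallFloat.floatToByte315/byte315ToFloat; the class name NormPrecisionDemo is made up for illustration). The default lengthNorm is 1/sqrt(numTerms), and after the round trip through a byte the values for 3 and 4 terms both come back as 0.5 -- exactly the fieldNorm=0.5 shown in the explain output above:

```java
public class NormPrecisionDemo {

    // Encode a float into one byte with a 3-bit mantissa and a
    // zero-exponent point of 15 (mirrors SmallFloat.floatToByte315).
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back to a float (mirrors SmallFloat.byte315ToFloat).
    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (int terms = 1; terms <= 5; terms++) {
            // Default lengthNorm of DefaultSimilarity: 1/sqrt(numTerms).
            float norm = (float) (1.0 / Math.sqrt(terms));
            float roundTripped = byte315ToFloat(floatToByte315(norm));
            System.out.println(terms + " terms: " + norm + " -> " + roundTripped);
        }
    }
}
```

Running it shows that 1/sqrt(3) and 1/sqrt(4) map to the same byte and both decode to 0.5, so a 3-word and a 4-word field are indistinguishable to the scorer.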

    Further reading: http://lucene.apache.org/java/3_0_3/scoring.html, the
    javadocs for Similarity and DefaultSimilarity, and whatever Google finds.


    --
    Ian.

  • Ahmet Arslan at Dec 29, 2010 at 2:01 pm

    Question: why do all three documents get the same score?
    As Ian said, the length norm values of all your documents are the same.
    See Jay Hill's message at http://search-lucene.com/m/Qw6CZpvRjw/




  • Qi Li at Dec 29, 2010 at 2:07 pm
    Ahmet and Ian:

    Thank you both very much. I will try the patch.

    Qi
  • Qi Li at Dec 29, 2010 at 8:43 pm
    I tried overriding the default lengthNorm method with the suggestion in
    https://issues.apache.org/jira/browse/LUCENE-2187, but that alone does
    not work because, after the norm is encoded to a byte, not every field
    length from 1 to 10 terms yields a unique score.

    Here is my solution, which only works for shorter fields. Any critiques
    or better solutions are welcome.

    // Hand-picked norms for fields of 1 to 10 terms; each value survives
    // the byte encoding as a distinct norm.
    private float[] fs = {1.0f, 0.9f, 0.8f, 0.7f, 0.6f, 0.45f, 0.40f, 0.35f,
                          0.30f, 0.20f};

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        if (numTerms < 11 && numTerms > 0) {
            return fs[numTerms - 1];
        }
        // Longer fields fall back to the default, capped at the 10-term
        // value so they never outrank the table above.
        float result = super.lengthNorm(fieldName, numTerms);
        if (result > 0.1875f) {
            return 0.1875f;
        }
        return result;
    }

    Here are the resulting fieldNorm values for field lengths 1 to 10:

    # of terms   lengthNorm
             1   1.0
             2   0.875
             3   0.75
             4   0.625
             5   0.5
             6   0.4375
             7   0.375
             8   0.3125
             9   0.25
            10   0.1875
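A quick way to sanity-check a table like this is to round-trip each value through the byte encoding and confirm that all ten results stay distinct. This is a sketch: floatToByte315/byte315ToFloat re-implement the logic of Lucene's SmallFloat methods, and the class name LengthNormTableCheck is made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LengthNormTableCheck {

    // Re-implementation of SmallFloat.floatToByte315 (3-bit mantissa,
    // zero-exponent point 15), the encoding Lucene 3.0 applies to norms.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Re-implementation of SmallFloat.byte315ToFloat.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] fs = {1.0f, 0.9f, 0.8f, 0.7f, 0.6f, 0.45f, 0.40f, 0.35f,
                      0.30f, 0.20f};
        Set<Float> decoded = new LinkedHashSet<Float>();
        for (int i = 0; i < fs.length; i++) {
            float norm = byte315ToFloat(floatToByte315(fs[i]));
            decoded.add(norm);
            System.out.println((i + 1) + " terms: " + fs[i] + " -> " + norm);
        }
        // All ten entries must survive the encoding as distinct norms.
        System.out.println("distinct norms: " + decoded.size());
    }
}
```

The decoded values match the table above, and the set ends up with ten distinct entries, which is exactly the property the default 1/sqrt(numTerms) norms lack.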
    Qi



Discussion Overview
group: java-user @ lucene.apache.org
categories: lucene
posted: Dec 28, '10 at 8:11p
active: Dec 29, '10 at 8:43p
posts: 5
users: 3 (Qi Li: 3 posts, Ian Lea: 1 post, Ahmet Arslan: 1 post)
