I've just run a very simple, single-term query against a 4.10 system
and a 5.5 system, each holding much the same data.

The score for the 4.10 system was essentially made up of the field
weight, which is:
    score = tf * idf

In the 5.5 system, however, there is an additional "query weight", which
is idf * query norm. If query norm is 1, then the final score is now:
   score = query_weight * field_weight
           = ( idf * 1 ) * (tf * idf)
           = tf * idf^2
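
To make the arithmetic above concrete, here is a minimal Python sketch that multiplies the two weights the way the 5.5 explain output lays them out. The function name `explain_score` and its parameters are illustrative only, not Lucene or Solr APIs; the idf value is copied from the 5.5 output in this post. The second call assumes a query norm of 1/idf purely to show how such a value would cancel one idf factor. Whether 4.10 actually did that is an assumption, not something confirmed in this thread.

```python
def explain_score(tf, idf, query_norm=1.0, field_norm=1.0):
    """Product of queryWeight and fieldWeight, as shown in the 5.5 explain output."""
    query_weight = idf * query_norm          # "queryWeight, product of: idf, queryNorm"
    field_weight = tf * idf * field_norm     # "fieldWeight, product of: tf, idf, fieldNorm"
    return query_weight * field_weight


idf_55 = 5.339603  # idf reported by the 5.5 system for description:obama

# Query norm of 1, as reported by 5.5: the score is tf * idf^2.
score_norm_one = explain_score(tf=1.0, idf=idf_55, query_norm=1.0)

# Hypothetical query norm of 1/idf: the score collapses back to tf * idf.
score_norm_inverse_idf = explain_score(tf=1.0, idf=idf_55, query_norm=1.0 / idf_55)

print(score_norm_one)
print(score_norm_inverse_idf)
```

The first value comes out at roughly 28.511, which agrees with the 5.5 score; the second comes out at roughly 5.34, the plain tf * idf figure.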

Can anyone explain why this new "query weight" element has appeared in
our scores somewhere between 4.10 and 5.5?

Thanks!

Upayavira

4.10 score ========================================================
       "2937439": {
         "match": true,
         "value": 5.5993805,
         "description": "weight(description:obama in 394012) [DefaultSimilarity], result of:",
         "details": [
           {
             "match": true,
             "value": 5.5993805,
             "description": "fieldWeight in 394012, product of:",
             "details": [
               {
                 "match": true,
                 "value": 1,
                 "description": "tf(freq=1.0), with freq of:",
                 "details": [
                   {
                     "match": true,
                     "value": 1,
                     "description": "termFreq=1.0"
                   }
                 ]
               },
               {
                 "match": true,
                 "value": 5.5993805,
                 "description": "idf(docFreq=56010, maxDocs=5568765)"
               },
               {
                 "match": true,
                 "value": 1,
                 "description": "fieldNorm(doc=394012)"
               }
             ]
           }
         ]
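
The idf figure in the 4.10 output above can be reproduced from its docFreq and maxDocs. The sketch below assumes the classic Lucene TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)), which is what DefaultSimilarity uses as far as I know; the helper name `classic_idf` is made up for illustration. The same formula is also applied to the docFreq and maxDocs from the 5.5 output that follows, on the assumption that the 5.5 system uses the same idf definition.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Classic TF-IDF inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# docFreq and maxDocs copied from the two explain outputs in this post.
idf_410 = classic_idf(doc_freq=56010, max_docs=5568765)
idf_55 = classic_idf(doc_freq=31905, max_docs=2446459)

print(idf_410)  # compare with 5.5993805 in the 4.10 output
print(idf_55)   # compare with 5.339603 in the 5.5 output
```

Both values agree with the explain outputs to about four decimal places, which suggests the idf computation itself is the same in the two systems and the difference in score comes from how the weights are combined.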
5.5 score ========================================================
       "2502281":{
         "match":true,
         "value":28.51136,
         "description":"weight(description:obama in 43472) [], result of:",
         "details":[{
             "match":true,
             "value":28.51136,
             "description":"score(doc=43472,freq=1.0), product of:",
             "details":[{
                 "match":true,
                 "value":5.339603,
                 "description":"queryWeight, product of:",
                 "details":[{
                     "match":true,
                     "value":5.339603,
                     "description":"idf(docFreq=31905, maxDocs=2446459)"},
                   {
                     "match":true,
                     "value":1.0,
                     "description":"queryNorm"}]},
               {
                 "match":true,
                 "value":5.339603,
                 "description":"fieldWeight in 43472, product of:",
                 "details":[{
                     "match":true,
                     "value":1.0,
                     "description":"tf(freq=1.0), with freq of:",
                     "details":[{
                         "match":true,
                         "value":1.0,
                         "description":"termFreq=1.0"}]},
                   {
                     "match":true,
                     "value":5.339603,
                     "description":"idf(docFreq=31905, maxDocs=2446459)"},
                   {
                     "match":true,
                     "value":1.0,
                     "description":"fieldNorm(doc=43472)"}]}]}]},


  • Ahmet Arslan at Jun 10, 2016 at 12:40 am
    Hi,

    I wondered about the same thing before and failed to decipher
    TFIDFSimilarity. Scoring looks like tf*idf*idf to me.

    I would appreciate it if someone could shed some light on this.

    Thanks,
    Ahmet



  • Upayavira at Jun 10, 2016 at 8:29 am
    I tracked it down to this ticket:

    https://issues.apache.org/jira/browse/LUCENE-6590

    which changed the implementation of normalize() in
    org.apache.lucene.search.similarities.TFIDFSimilarity.

    I've asked for comment on that ticket.

    Upayavira

Discussion Overview
group: solr-user
categories: lucene
posted: Jun 9, '16 at 9:37p
active: Jun 10, '16 at 8:29a
posts: 3
users: 2
website: lucene.apache.org...
