Dear All,

When using Lucene to search documents, each result has a score based on its
relevance to the search term. Inside Lucene, the score percentage is
calculated as a percentage of the maximum score achieved.

Assume we are searching for Amr ElAdawy

We get results like the following along with the score:
Omar ElAdawy --> 1.6
Amro ElAdawy --> 1.9
Amir Adawi --> 1.3


There is no exact match, so the percentage is calculated relative to the
highest score (even though that result is not an exact match). The
percentages will be:

Omar ElAdawy --> 84.21%
Amro ElAdawy --> 100%
Amir Adawi --> 68.42%


I need to change that so the percentage is relative to the exact-match
score. It would then be something like this (assuming an exact match scores
3):

Omar ElAdawy --> 53.33%
Amro ElAdawy --> 63.33%
Amir Adawi --> 43.33%
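
The arithmetic behind the two sets of percentages above can be sketched in plain Java, with no Lucene dependency. The class and method names (`ScorePercentages`, `percentOf`) are made up for illustration, and the reference score of 3 is the assumed exact-match score from the post, not a value Lucene reports.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene.
final class ScorePercentages {

    /** Expresses score as a percentage of the given reference score. */
    static double percentOf(double score, double reference) {
        return 100.0 * score / reference;
    }

    public static void main(String[] args) {
        // Raw scores from the post, in the order they were listed.
        Map<String, Double> hits = new LinkedHashMap<>();
        hits.put("Omar ElAdawy", 1.6);
        hits.put("Amro ElAdawy", 1.9);
        hits.put("Amir Adawi", 1.3);

        double maxScore = Collections.max(hits.values()); // score of the best hit
        double exactMatchScore = 3.0; // assumption taken from the post

        for (Map.Entry<String, Double> hit : hits.entrySet()) {
            System.out.printf(Locale.ROOT, "%s: %.2f%% of max, %.2f%% of exact match%n",
                    hit.getKey(),
                    percentOf(hit.getValue(), maxScore),
                    percentOf(hit.getValue(), exactMatchScore));
        }
    }
}
```

The only difference between the two columns is the divisor, so the open question is where a trustworthy exact-match score would come from.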


I am using Lucene 2.9.4.

Regards,
Amr ElAdawy




--
View this message in context: http://lucene.472066.n3.nabble.com/Search-Score-percentage-Should-not-be-relative-to-the-highest-score-tp2183420p2183420.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Ahmet Arslan at Jan 3, 2011 at 8:43 am
    Converting scores to percentages is generally not recommended. See:
    http://wiki.apache.org/lucene-java/ScoresAsPercentages

  • Amr ElAdawy at Jan 3, 2011 at 9:38 am
    Hi iorixxx, thanks a lot for your reply.

    I have read the link and I understand the concern. However, the normalization
    happens inside Lucene, where the normalizing value is the inverse of
    the maxScore.

    I can alter the code to keep the original score, but it is a business
    requirement to show the matching percentage. Also, the absolute score means
    nothing, because we won't know the score of an exact match.


    I am thinking of altering the core to make the normalization relative to the
    number of terms, assuming that each term scores 1, so an exact
    match should score 3 if there are 3 terms.

    The problem with that is that I don't know how to get the number of clauses
    from the Query, nor the number of terms.
    Also, some exact-match results scored more than 3!
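
One likely reason an all-terms match can score above the number of terms is that, under Lucene's default similarity, a single term's contribution is not capped at 1. The sketch below restates the default tf and idf formulas in plain Java (no Lucene dependency; the class name `TfIdfSketch` is hypothetical). The sample docFreq and numDocs values are the ones that appear in the Explanation output quoted elsewhere in this thread. The real score also multiplies in queryNorm and fieldNorm, which this sketch leaves out.

```java
// Plain-Java restatement of the default tf/idf formulas; not Lucene code.
final class TfIdfSketch {

    /** Default term-frequency factor: square root of the raw frequency. */
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    /** Default inverse document frequency: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // A term found in 5546 of 48977 documents already has idf above 3.
        System.out.println("idf = " + idf(5546, 48977));
        // A term repeated four times in a document doubles its tf factor.
        System.out.println("tf(4) = " + tf(4));
    }
}
```

Because these factors depend on the index contents, the "each term scores 1" assumption does not hold in general.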


    Any ideas?
  • Ahmet Arslan at Jan 3, 2011 at 12:51 pm

    Can you re-phrase your requirement? What do you mean by exact match?

    Query: term1 term2
    Doc1: term1 term2
    Doc2: term1 term2 term3 term4

    Here, are both Doc1 and Doc2 an "exact match", or just Doc1?

    If you want to capture the "how many query terms matched" information, maybe you can use http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Explanation.html

    Or you can modify the similarity to ignore everything other than coord, so that your documents are sorted by "how many of the query terms are found in the specified document":
    http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_coord
  • Amr ElAdawy at Jan 3, 2011 at 1:21 pm
    Consider the following.

    Query: term1 term2
    Doc1: term1 term2
    Doc2: term1 term2 term3 term4
    Doc3: term1 term1 term3
    Doc4: term3 term4

    For the above documents, Doc1 and Doc2 are exact matches (they contain
    all the terms in the search query). Doc3 is a partial match, as it contains
    term1 only (we neglect term frequency and assume tf is always 1).


    The score percentage (calculated by Lucene in Hits.java, line 133) will
    be:

    Doc1: 100%
    Doc2: 100%
    Doc3: 80%

    This is not a problem at all. The problem occurs when there is no exactly
    matching document, as follows:

    Query: term1 term2
    Doc1: term1 term3
    Doc2: term2 term3 term4
    Doc3: term1 term1 term3
    Doc4: term3 term4


    The score will be calculated as:

    Doc1: 100%
    Doc2: 100%
    Doc3: 50%

    You can see that Doc1 and Doc2 got 100% even though they are not exact
    matches. Because they got the highest score, Lucene considers them a 100% match.

    This is my problem.

    All I need is to make the percentage correct in the second case, so it will
    be something like:

    Doc1: 50%
    Doc2: 50%
    Doc3: 30%

    I hope I made myself clear.


  • Ahmet Arslan at Jan 3, 2011 at 4:34 pm
    So, can we say that something that gives you the "how many query terms matched" info will satisfy your requirement?

    Query: term1 term2

    Doc1: term1 term2 => n=2 => 100%
    Doc2: term1 term2 term3 term4 => n=2 => 100%
    Doc3: term1 term1 term3 => n=1 => 50%
    Doc4: term2 term3 term4 => n=1 => 50%


    If yes, Explanation will give you that info in the coord part. For example, coord(1/3) means that one query term matched out of a total of 3 query terms.
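
The "how many query terms matched" percentage from the list above can be sketched in plain Java. This is only an illustration of the arithmetic: it splits on whitespace instead of using a Lucene analyzer, and the class and method names (`MatchedTermPercentage`, `percent`) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; whitespace splitting stands in for real analysis.
final class MatchedTermPercentage {

    /** Percentage of distinct query terms that occur in the document text. */
    static double percent(String query, String doc) {
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Set<String> docTerms = new HashSet<>(Arrays.asList(doc.trim().split("\\s+")));
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++; // repeated occurrences in the document count once
            }
        }
        return 100.0 * matched / queryTerms.size();
    }

    public static void main(String[] args) {
        String query = "term1 term2";
        String[] docs = {
            "term1 term2",
            "term1 term2 term3 term4",
            "term1 term1 term3",
            "term2 term3 term4"
        };
        for (int i = 0; i < docs.length; i++) {
            System.out.printf(Locale.ROOT, "Doc%d: %.0f%%%n", i + 1, percent(query, docs[i]));
        }
    }
}
```

Since the result depends only on which terms matched, the best hit no longer reaches 100% unless it contains every query term.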

    Here is an example Explanation:

    0.013397463 = (MATCH) product of:
      0.040192388 = (MATCH) sum of:
        0.040192388 = (MATCH) weight(pagetext:para in 34930), product of:
          0.46250778 = queryWeight(pagetext:para), product of:
            3.1780937 = idf(docFreq=5546, maxDocs=48977)
            0.14552994 = queryNorm
          0.086901 = (MATCH) fieldWeight(pagetext:para in 34930), product of:
            1.0 = tf(termFreq(pagetext:para)=1)
            3.1780937 = idf(docFreq=5546, maxDocs=48977)
            0.02734375 = fieldNorm(field=pagetext, doc=34930)
      0.33333334 = coord(1/3)
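
As a rough sketch, the matched-term ratio could be read back out of Explanation text like the sample above with a regular expression. This works on the printed string only and does not use the Lucene API. The class and method names (`CoordParser`, `coordPercent`) are hypothetical, and returning -1 when no coord entry is found is an arbitrary choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-only helper; it parses printed Explanation output.
final class CoordParser {

    // Matches entries of the form coord(1/3) and captures both numbers.
    private static final Pattern COORD = Pattern.compile("coord\\((\\d+)/(\\d+)\\)");

    /** Returns matched/total as a percentage, or -1 if no coord(...) entry is present. */
    static double coordPercent(String explanationText) {
        Matcher m = COORD.matcher(explanationText);
        if (!m.find()) {
            return -1.0;
        }
        int overlap = Integer.parseInt(m.group(1));
        int maxOverlap = Integer.parseInt(m.group(2));
        return 100.0 * overlap / maxOverlap;
    }

    public static void main(String[] args) {
        String sample = "0.013397463 = (MATCH) product of:\n"
                + "  0.040192388 = (MATCH) sum of:\n"
                + "  0.33333334 = coord(1/3)\n";
        System.out.println(coordPercent(sample)); // one of three query terms matched
    }
}
```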



  • Amr ElAdawy at Jan 4, 2011 at 3:32 pm
    I am overriding the coord method in my custom Similarity class so that it returns:


    return (float)overlap/(float)maxOverlap;


    I'll update you.

    Thanks for your help
  • Amr ElAdawy at Jan 5, 2011 at 7:47 am
    That did not work.

    I am using my own Similarity, but its coord method is never called, because the
    disableCoord variable is set to true from FuzzyQuery:


    public Similarity getSimilarity(Searcher searcher) {
        Similarity result = super.getSimilarity(searcher);
        if (disableCoord) { // disable coord as requested
            result = new SimilarityDelegator(result) {
                public float coord(int overlap, int maxOverlap) {
                    return 1.0f;
                }
            };
        }
        return result;
    }
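
The effect described above can be reproduced in a self-contained sketch. The classes below (`BaseSimilarity`, `Delegator`, `CoordMaskingDemo`) are hypothetical stand-ins, not Lucene classes. They only mimic the wrapping pattern in the quoted snippet to show why a custom coord is bypassed once the wrapper overrides it.

```java
// Hypothetical stand-ins for illustration; these are not Lucene classes.
class BaseSimilarity {
    /** Plays the role of a custom coord: fraction of query terms matched. */
    float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

/** Forwards every call to the wrapped instance unless a subclass overrides it. */
class Delegator extends BaseSimilarity {
    protected final BaseSimilarity delegate;

    Delegator(BaseSimilarity delegate) {
        this.delegate = delegate;
    }

    @Override
    float coord(int overlap, int maxOverlap) {
        return delegate.coord(overlap, maxOverlap);
    }
}

final class CoordMaskingDemo {

    /** Mirrors the quoted pattern: wrap and force coord to 1 when disabled. */
    static BaseSimilarity wrap(BaseSimilarity custom, boolean disableCoord) {
        if (!disableCoord) {
            return custom;
        }
        return new Delegator(custom) {
            @Override
            float coord(int overlap, int maxOverlap) {
                return 1.0f; // the wrapped coord is never consulted
            }
        };
    }

    public static void main(String[] args) {
        BaseSimilarity custom = new BaseSimilarity();
        System.out.println(wrap(custom, false).coord(1, 2)); // custom coord is used
        System.out.println(wrap(custom, true).coord(1, 2));  // wrapper masks it
    }
}
```

In this sketch the custom implementation is still installed, but the anonymous subclass answers the coord call first, which matches the behaviour reported in the post.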


    Any ideas?
  • Ahmet Arslan at Jan 5, 2011 at 5:06 pm

    What is the version of Lucene you are using?




  • Amr ElAdawy at Jan 5, 2011 at 5:09 pm
    2.9.4
  • Amr ElAdawy at Jan 16, 2011 at 7:32 am
    any ideas?

Discussion Overview
group: java-user
category: lucene
posted: Jan 3, '11 at 7:16a
active: Jan 16, '11 at 7:32a
posts: 11
users: 2
website: lucene.apache.org

2 users in discussion

Amr ElAdawy: 7 posts Ahmet Arslan: 4 posts

People

Translate

site design / logo © 2023 Grokbase