FAQ
Hi All,

I'm not sure how to interpret the result of the toString method of
Explanation. I'm trying to see the values of each component of the
Default Similarity formula for a particular query and a doc. Given
below is a sample of my Explanation output. Many thanks to anyone who can
help explain some of the values or point me to a place that does.

Explanation = 0.683103 = product of:
  1.7077575 = sum of:
    0.184242 = weight(Contents:x in 78), product of:
      0.13565542 = queryWeight(Contents:x), product of:
        2.509232 = idf(docFreq=85)
        0.054062527 = queryNorm
      1.3581617 = fieldWeight(Contents:x in 78), product of:
        1.7320508 = tf(termFreq(Contents:x)=3)
        2.509232 = idf(docFreq=85)
        0.3125 = fieldNorm(field=Contents, doc=78)
    0.184242 = weight(Contents:x in 78), product of:
      0.13565542 = queryWeight(Contents:x), product of:
        2.509232 = idf(docFreq=85)
        0.054062527 = queryNorm
      1.3581617 = fieldWeight(Contents:x in 78), product of:
        1.7320508 = tf(termFreq(Contents:x)=3)
        2.509232 = idf(docFreq=85)
        0.3125 = fieldNorm(field=Contents, doc=78)
    0.26218253 = weight(Contents:y in 78), product of:
      0.16182467 = queryWeight(Contents:y), product of:
        2.9932873 = idf(docFreq=52)
        0.054062527 = queryNorm
      1.6201642 = fieldWeight(Contents:y in 78), product of:
        1.7320508 = tf(termFreq(Contents:y)=3)
        2.9932873 = idf(docFreq=52)
        0.3125 = fieldNorm(field=Contents, doc=78)

--
Eugene
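
For reference, output like the above typically comes from IndexSearcher's
explain() method. A minimal sketch of how it might have been produced -- the
index path is a placeholder and the query is only a guess based on the
output, assuming the Lucene 1.9-era API:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Explanation;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;

  public class ExplainDemo {
    public static void main(String[] args) throws Exception {
      IndexSearcher searcher = new IndexSearcher("/path/to/index"); // placeholder path
      QueryParser parser = new QueryParser("Contents", new StandardAnalyzer());
      Query query = parser.parse("x x y");                          // guessed query

      Hits hits = searcher.search(query);
      for (int i = 0; i < hits.length(); i++) {
        // explain() takes the internal document number, not the Hits rank
        Explanation exp = searcher.explain(query, hits.id(i));
        System.out.println("Explanation = " + exp.toString());
      }
      searcher.close();
    }
  }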



  • Yonik Seeley at Mar 2, 2006 at 6:27 pm
    I think Lucene in Action does a good job of it.
    There is also a formula given in the javadoc for DefaultSimilarity
    http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html

    See my comments below (inline)
    On 3/2/06, Eugene wrote:
    Hi All,

    I'm not sure how to interpret the result of the toString method of
    Explanation. I'm trying to see the values of each component of the
    Default Similarity formula for a particular query and a doc. Given
    below is a sample of my Explanation output. Many thanks if anyone could
    help explain some of the values or direct me to a place that does so.

    Explanation = 0.683103 = product of:
    1.7077575 = sum of:
    0.184242 = weight(Contents:x in 78), product of:
    0.13565542 = queryWeight(Contents:x), product of:
    the queryWeight is query-specific... it will have the same value
    for all documents matching the query.
    2.509232 = idf(docFreq=85)
    inverse document frequency... term "x" appears in 85 documents.
    0.054062527 = queryNorm
    queryNorm is a normalization factor... 1/sqrt(sum of all query weights squared)

    If you had a boost, it would also be multiplied into the queryWeight
    at this point.
    1.3581617 = fieldWeight(Contents:x in 78), product of:
    fieldWeight components are document specific.
    1.7320508 = tf(termFreq(Contents:x)=3)
    "x" appears 3 times in the field for this document
    2.509232 = idf(docFreq=85)
    same as the previous idf factor - 85 documents contain "x"
    0.3125 = fieldNorm(field=Contents, doc=78)
    the norm is calculated at index time... it's the length normalization
    factor (1/sqrt(num tokens in this field)) multiplied by any boosts set
    on the field or document.
    0.184242 = weight(Contents:x in 78), product of:
    0.13565542 = queryWeight(Contents:x), product of:
    2.509232 = idf(docFreq=85)
    0.054062527 = queryNorm
    1.3581617 = fieldWeight(Contents:x in 78), product of:
    1.7320508 = tf(termFreq(Contents:x)=3)
    2.509232 = idf(docFreq=85)
    0.3125 = fieldNorm(field=Contents, doc=78)
    0.26218253 = weight(Contents:y in 78), product of:
    0.16182467 = queryWeight(Contents:y), product of:
    2.9932873 = idf(docFreq=52)
    0.054062527 = queryNorm
    1.6201642 = fieldWeight(Contents:y in 78), product of:
    1.7320508 = tf(termFreq(Contents:y)=3)
    2.9932873 = idf(docFreq=52)
    0.3125 = fieldNorm(field=Contents, doc=78)

    -Yonik
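
    To make the arithmetic concrete, here is how the numbers in the first
    clause fit together (just re-deriving the values printed above; note the
    outer "product of" factor works out to 0.683103 / 1.7077575 = 0.4, which
    looks like a coord() factor that got cut off in the paste):

      public class ExplainArithmetic {
        public static void main(String[] args) {
          // Re-deriving the first weight(Contents:x in 78) clause from the output above
          double idf       = 2.509232;      // idf(docFreq=85)
          double queryNorm = 0.054062527;   // 1/sqrt(sum of squared query weights)
          double tf        = Math.sqrt(3);  // tf(termFreq=3) = sqrt(3) = 1.7320508
          double fieldNorm = 0.3125;        // index-time length norm times any boosts

          System.out.println("queryWeight = " + (idf * queryNorm));      // 0.13565542
          System.out.println("fieldWeight = " + (tf * idf * fieldNorm)); // 1.3581617
          System.out.println("weight      = " + (idf * queryNorm * tf * idf * fieldNorm)); // 0.184242
        }
      }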

  • Eugene Ezekiel at Mar 3, 2006 at 2:35 am
    Thanks, Yonik, for the reply. I have just a couple more questions:

    1) Why does the explanation print so many times?

    2) Since my query is made up of multiple terms how do I know what term "x"
    is referring to?



    --
    Regards,
    Eugene
  • Yonik Seeley at Mar 3, 2006 at 4:26 am

    On 3/2/06, Eugene Ezekiel wrote:
    Thanks Yonik for the reply. I got just a couple more questions,

    1) Why does the explanation print so many times?
    Because it was a compound query with multiple parts to it. It's one explanation
    with multiple parts.
    From the explain output, I would guess the original query was something like
    x x y or Contents:x Contents:x Contents:y
    2) Since my query is made up of multiple terms how do I know what term "x"
    is referring to?
    It's actually a literal "x".

    For example, in my index, if I search for
    "solr search lucene" when the default field is "text", then I get the
    following explain:

    1.1132671 = sum of:
      0.27831677 = weight(text:solr in 84), product of:
        0.57735026 = queryWeight(text:solr), product of:
          3.85647 = idf(docFreq=4)
          0.14970951 = queryNorm
        0.48205876 = fieldWeight(text:solr in 84), product of:
          1.0 = tf(termFreq(text:solr)=1)
          3.85647 = idf(docFreq=4)
          0.125 = fieldNorm(field=text, doc=84)
      0.55663353 = weight(text:search in 84), product of:
        0.57735026 = queryWeight(text:search), product of:
          3.85647 = idf(docFreq=4)
          0.14970951 = queryNorm
        0.9641175 = fieldWeight(text:search in 84), product of:
          2.0 = tf(termFreq(text:search)=4)
          3.85647 = idf(docFreq=4)
          0.125 = fieldNorm(field=text, doc=84)
      0.27831677 = weight(text:lucen in 84), product of:
        0.57735026 = queryWeight(text:lucen), product of:
          3.85647 = idf(docFreq=4)
          0.14970951 = queryNorm
        0.48205876 = fieldWeight(text:lucen in 84), product of:
          1.0 = tf(termFreq(text:lucen)=1)
          3.85647 = idf(docFreq=4)
          0.125 = fieldNorm(field=text, doc=84)

    -Yonik

  • Eugene at Mar 3, 2006 at 3:47 pm
    Hi Yonik,

    Thanks a lot, I think I understand how explanation works better now.

    But, there's something weird I noticed. I've a query like:
    "problem formulation each possible x probability p x y find x p x y
    maximized how compute p x y"

    The weird thing is that terms like "problem", "formulation" and other
    words don't show up in the explanation; only "p", "x" and "y" do. And
    I get back a hit score of 1.0 when the explanation value is 1.3260187:

    Explanation = 1.3260187 = product of:
    2.410943 = sum of:
    .....

    So, basically 2 simple questions:

    1) How do I make all the literals in my query show up in explanation?

    2) How does Lucene convert an Explanation score of 1.3260187 to 1.0?

    Thanks.

    --
    Eugene

  • Yonik Seeley at Mar 3, 2006 at 3:59 pm

    On 3/3/06, Eugene wrote:
    1) How do I make all the literals in my query show up in explanation?
    Only the literals that match that particular document will show up in
    the explain for that document. So the explain that you showed before
    either belonged to a document that only matched "x" and "y" out of all
    the terms in your query, or you have an analyzer problem that is
    causing more terms not to match (try querying with the same analyzer
    that you used to index the documents).
    2) How does Lucene convert an Explanation score of 1.3260187 to 1.0?
    The Hits class normalizes scores by dividing all scores by the highest
    score, if that highest score is above 1.0.
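
    In other words, roughly (a sketch of the idea only, not the actual Hits
    source):

      public class HitsNormalization {
        // Rough sketch of Hits' score normalization (not the real implementation)
        public static float[] normalize(float[] rawScores) {  // raw scores, highest first
          float top = rawScores.length > 0 ? rawScores[0] : 0f;
          float factor = (top > 1.0f) ? 1.0f / top : 1.0f;
          float[] out = new float[rawScores.length];
          for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * factor;                    // what hits.score(i) reports
          }
          return out;
        }
      }

    So an Explanation value of 1.3260187 on the top-scoring hit would come
    back from hits.score(0) as 1.3260187 / 1.3260187 = 1.0.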

    -Yonik

  • Eugene at Mar 3, 2006 at 4:14 pm
    Hi,

    You mentioned:
    "The Hits class normalizes scores by dividing all scores by the highest
    score, if that highest score is above 1.0."

    Can you explain which highest score we are talking about? I thought there's
    only one score for a given query and doc, right?

    Thanks
  • Eugene at Mar 3, 2006 at 4:34 pm
    OK, I figured out the normalization; it was actually explained in an
    earlier post here:
    http://mail-archives.apache.org/mod_mbox/lucene-java-user/200601.mbox/%[email protected]%3E

    Just one more question: is there any way I can disable this normalization?

    Thanks for all the help so far.

    --
    Eugene

  • Yonik Seeley at Mar 3, 2006 at 4:46 pm

    On 3/3/06, Eugene wrote:
    Just one more question: Any way in which i can disable this normalization?
    We disabled this normalization in Lucene 1.9 for the "expert"
    level search methods on IndexSearcher. Use the search methods that
    don't return Hits.
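
    For example, something along these lines should give back the raw
    (un-normalized) scores; the index path is a placeholder, and the method
    signature is from the 1.9 javadocs as I remember them:

      import org.apache.lucene.search.IndexSearcher;
      import org.apache.lucene.search.Query;
      import org.apache.lucene.search.ScoreDoc;
      import org.apache.lucene.search.TopDocs;

      public class RawScores {
        // "Expert"-level search: returns TopDocs instead of Hits, so no normalization
        public static void print(IndexSearcher searcher, Query query) throws java.io.IOException {
          TopDocs top = searcher.search(query, null, 10);   // no Filter, top 10 docs
          for (int i = 0; i < top.scoreDocs.length; i++) {
            ScoreDoc sd = top.scoreDocs[i];
            System.out.println("doc=" + sd.doc + " rawScore=" + sd.score);
          }
        }
      }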

    -Yonik

  • Eugene at Mar 4, 2006 at 1:23 am
    I was looking at the new 1.9 API and can't seem to find this expert mode
    of searching.
    http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20org.apache.lucene.search.HitCollector)

    Can you tell me where I can find it?

    Thanks.

    --
    Eugene
  • Chris Hostetter at Mar 4, 2006 at 1:30 am
    : I was looking at the new 1.9 api and can't seem to find this expert mode
    : of searching.

    Yonik's referring to all of the methods in the Searcher class that have
    "Expert" in their (javadoc) description.

    : http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20org.apache.lucene.search.HitCollector)

    ...that method isn't labeled "expert", but it also uses raw scores
    (HitCollectors have always received the raw scores).
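
    A small sketch of the HitCollector route -- the collect(int, float)
    callback receives the raw score directly:

      import org.apache.lucene.search.HitCollector;
      import org.apache.lucene.search.IndexSearcher;
      import org.apache.lucene.search.Query;

      public class RawScoreCollector {
        public static void collectAll(IndexSearcher searcher, Query query) throws java.io.IOException {
          searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
              // score is the raw score, exactly what the Explanation computes
              System.out.println("doc=" + doc + " rawScore=" + score);
            }
          });
        }
      }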


    -Hoss


  • Eugene at Mar 5, 2006 at 6:44 pm
    I was wondering if anyone has any idea how I can start to implement my
    own similarity. I want to use the cosine similarity measure instead. I was
    looking through past forum posts and saw that quite a few people
    have discussed this, but no real method of doing it was mentioned.

    Any good links on extending the Similarity class? A lot of posts
    discuss David Spencer's "More Like This" but I can't find it anywhere.

    Thanks.

  • Eric Jain at Mar 5, 2006 at 7:25 pm

    Eugene wrote:
    Any good links on extending the Similarity class? A lot of posts
    discuss David Spencer's "More Like This" but I can't find it anywhere.
    The "More Like This" code can be found here:

    http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/similarity/

  • Eugene at Mar 6, 2006 at 5:56 am
    Thanks for posting the "More Like This" code. I just began coding my
    cosine similarity and need some help. Can anyone tell me in which file
    the DefaultSimilarity methods are called?

    For example, looking at the tf method I see that it takes a float for
    freq instead of an int, so I'm curious to see how this method is invoked.

    Thanks.

    --
    Eugene
  • Chris Hostetter at Mar 6, 2006 at 6:16 am
    : cosine similarity and need some help. Can anyone tell me in which file
    : are the methods of the DefaultSimilarity methods called?

    Most of the Similarity methods are called by the various Scorers. A good
    IDE will tell you where they are called (or you could just grep the
    source, that's what I do)

    : For example, looking at the tf method i see that it takes in a float for
    : freq instead of int. So i'm curious to see how this method is invoked.

    I commented on this recently (and no one contested my explanation)...

    http://www.nabble.com/Similarity-Usage%3A-tf%28int%29-vs-tf%28float%29-p2981283.html
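
    For what it's worth, the values earlier in this thread line up with
    DefaultSimilarity's tf being sqrt(freq) (1.7320508 = sqrt(3), 2.0 = sqrt(4)),
    and tf(int) just delegates to tf(float), as I understand it. A sketch of
    where a different tf formula would plug in:

      import org.apache.lucene.search.DefaultSimilarity;

      public class MyTfSimilarity extends DefaultSimilarity {
        // Scorers call the float version; sloppy phrase matches can contribute
        // fractional frequencies (via sloppyFreq), hence the float argument.
        public float tf(float freq) {
          return (float) Math.sqrt(freq);   // stock DefaultSimilarity behavior
        }
      }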


    -Hoss


  • Eugene at Mar 6, 2006 at 11:55 am
    Hi,

    Since I'm using a boolean OR query, I figured it must be related to the
    BooleanScorer (though there's a more complicated BooleanScorer2, which
    I'm not sure when it's used).

    Looking at the BooleanScorer code, it's probably a little over my head,
    as I'm still a beginner with Lucene.

    But I would appreciate it if someone could point me to the method where
    the searcher iterates over all query terms and outputs the score. I grepped
    both the Searcher classes and the BooleanScorer classes but can't find it.

    Also, I would like to know whether sloppyFreq will "kick in" if I'm
    just using a boolean OR query, or is it only for phrase queries? And
    how do I disable it so that it'll always be 1.0 without overriding
    the method?

    Thanks for all the help so far.

  • Chris Hostetter at Mar 6, 2006 at 6:41 pm
    : Since i'm using a boolean OR query i figured it must be related to the
    : BooleanScorer (though there's a more complicated BooleanScorer2 which
    : I'm not sure when it's use).

    There are actually three possible scorers used: ConjunctionScorer can be
    used if all of the clauses are required. Most of the behavior is driven
    by whether or not BooleanQuery.setUseScorer14(true) has been called -- by
    default it is false, which means BooleanScorer2 is used.

    : But, I would appreciate if someone could point me to the method where
    : the searcher iterates over all query terms and outputs the score. I grep
    : both the Searcher classes and the BooleanScorer classes but can't find it.

    The searcher doesn't really iterate over query terms. It knows about one
    and only one Weight, and it asks that Weight instance to give it a Scorer
    for the current index; it then asks that Scorer to iterate over the
    documents and tell it which ones match (using the score(HitCollector)
    method). Internally, the Scorer iterates over the matching documents
    using the next(), doc() and score() methods.

    When you are searching for a single Term, the Scorer involved is
    TermScorer; when you are searching for many terms, the Scorer involved is
    (usually) a BooleanScorer2. BooleanScorers are complicated because they
    are juggling a lot of things at once, keeping track of which of the Scorers
    they contain has the lowest "next" doc, but if you look at TermWeight and
    TermScorer you'll get a pretty good idea of how the various similarity
    methods are used.
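
    Roughly, that driving loop boils down to something like this (a
    simplification of what Scorer.score(HitCollector) does, not the real code):

      import org.apache.lucene.search.HitCollector;
      import org.apache.lucene.search.Scorer;

      public class ScoreLoop {
        public static void scoreAll(Scorer scorer, HitCollector collector) throws java.io.IOException {
          while (scorer.next()) {                              // advance to the next matching doc
            collector.collect(scorer.doc(), scorer.score());   // report its raw score
          }
        }
      }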

    : Also, I would like to know whether will the sloppyFreq "kick in" if I'm
    : just using a Boolean OR query or is this only for phrase queries? And
    : how do I disable this so that it'll always be 1.0 without overwriting
    : the method?

    As it says in the javadocs, "amount of a sloppy phrase match..." -- only
    for phrase queries (or phrase-like queries, i.e. SpanNear).

    In general, the only way to change the implementation of a Similarity
    method used for all queries is to write your own Similarity class, and use
    the Similarity.setDefault or Searcher.setSimilarity methods to "register"
    it.

    If you really want only one type of query (or one instance of a query
    object) to get a different similarity, you can override the getSimilarity()
    method in the Query class in question and use the SimilarityDelegator to
    wrap the default, changing only the methods you want to change.
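
    To make those two options concrete (MyTfSimilarity is just the hypothetical
    subclass sketched earlier in the thread; the index path is a placeholder):

      import org.apache.lucene.search.IndexSearcher;
      import org.apache.lucene.search.Similarity;

      public class RegisterSimilarity {
        public static void main(String[] args) throws Exception {
          // Option 1a: change the default for everything (indexing and searching)
          Similarity.setDefault(new MyTfSimilarity());

          // Option 1b: change it only for one searcher
          IndexSearcher searcher = new IndexSearcher("/path/to/index");  // placeholder path
          searcher.setSimilarity(new MyTfSimilarity());

          // Option 2 (per-query) would mean overriding getSimilarity(Searcher) in a
          // Query subclass and returning a SimilarityDelegator that wraps the default.
          searcher.close();
        }
      }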




    -Hoss


  • Eugene at Mar 7, 2006 at 4:55 am
    Thanks, Chris, for your clear explanations. It seems there is a lot of info
    on using Lucene, but info on the internal workings of Lucene is hard to
    come by.

    I have some more questions, which I'll ask inline.


    Chris Hostetter wrote:
    : Since i'm using a boolean OR query i figured it must be related to the
    : BooleanScorer (though there's a more complicated BooleanScorer2 which
    : I'm not sure when it's use).

    There are actually three possible scorers used: ConjunctionScorer can be
    used if all of the clauses are required. Most of the behavior is driven
    by whether or not BooleanQuery.setUseScorer14(true) has been called -- by
    default it is false, which means BooleanScorer2 is used.
    1) I'm using the default QueryParser to parse and return a query, so it's
    a boolean OR query. So does this mean it uses the DisjunctionSumScorer
    or something?

    2) Just wondering: looking at the API for BooleanQuery I saw this: "Using
    setMinimumNumberShouldMatch will force the use of BooleanWeight2,
    regardless of whether setUseScorer14(true) has been called."
    What is the setUseScorer14 method about?

    --
    Eugene

  • Chris Hostetter at Mar 7, 2006 at 7:02 am
    : on using Lucene but info for the internal workings of Lucene is hard to
    : come by.

    As with many open-source code bases: the code is the documentation.

    : 1) I'm using the default QueryParser to parse and return a query so it's
    : a Boolean-OR query. So does this mean it uses the DisjunctionSumScorer
    : or something?

    I honestly don't understand all the ways the different Scorers are used
    for BooleanQueries. The thing to keep in mind is that they are all
    optimizations that get chosen based on whether some/all clauses are
    required, whether any clauses are prohibited, etc... If you understand
    what the basic BooleanScorer does, then you understand what all of the
    other Scorers do -- they just go about it in slightly different ways.

    : 2) Just wondering looking at the API for BooleanQuery i saw this: "Using
    : setMinimumNumberShouldMatch will force the use of BooleanWeight2,
    : regardless of wether setUseScorer14(true) has been called."
    : What is the method setUseScorer14 about?

    Hmmm... I guess it's not really documented, is it? setUseScorer14(true) is
    just a way to force the old Lucene 1.4.x-style boolean scoring. I don't
    really know why that code was left in, or why you might want to use it.



    -Hoss


