If I have two items in an index:
Terminator 2
Terminator 2: Judgment Day

And I score them against the query +title:(Terminator 2)
they come up with the same score (which makes sense, it just isn't
quite what I want)

Would there be some method or combination of methods in Similarity
that I could easily override to allow me to penalize the second item
because it had "unused terms"?

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Chris Hostetter at May 17, 2007 at 11:34 pm
    : Terminator 2
    : Terminator 2: Judgment Day
    :
    : And I score them against the query +title:(Terminator 2)

    : Would there be some method or combination of methods in Similarity
    : that I could easily override to allow me to penalize the second item
    : because it had "unused terms"?

    That's what the DefaultSimilarity does: it uses the (length)norm
    information stored when the documents are indexed to know which one is a
    better match (because it matches on a shorter field).

    If you aren't seeing that behavior, then perhaps you turned on omitNorms for
    that field, or perhaps the byte encoding is making the distinction between
    your various field lengths too small -- overriding the lengthNorm function
    and reindexing might help.
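
To make the two effects above concrete, here is a minimal, standalone sketch (it does not use the Lucene jars). It assumes the default length norm is 1/sqrt(numTerms) and that norms are squeezed into one lossy byte; the encode/decode methods are a re-implementation modeled on Lucene's small-float norm encoding as I understand it, not the library's own code. The class name, the method names, and the "steeper" 1/numTerms variant are hypothetical and only for illustration. It also assumes the analyzer produces 2 terms for "Terminator 2" and 4 terms for "Terminator 2: Judgment Day".

```java
public class LengthNormSketch {

    // Assumed default behavior: shorter fields get a larger norm.
    public static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical override that penalizes extra terms more aggressively.
    public static float steeperLengthNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    // Lossy float -> byte encoding (assumption: mirrors Lucene's norm encoding).
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // keep sign, exponent, top mantissa bits
        int zero = (63 - 15) << 3;       // offset of the smallest representable exponent
        if (small <= zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= zero + 0x100) {
            return (byte) -1;            // overflow: clamp to the largest value
        }
        return (byte) (small - zero);
    }

    // Inverse of encodeNorm; the precision lost in encoding is not recovered.
    public static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The norm value that scoring would see after an index round trip.
    public static float storedNorm(float norm) {
        return decodeNorm(encodeNorm(norm));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " terms: default="
                    + storedNorm(defaultLengthNorm(n))
                    + " steeper=" + storedNorm(steeperLengthNorm(n)));
        }
    }
}
```

Under these assumptions, a 2-term title and a 4-term title keep different stored norms (0.625 vs 0.5), so the shorter title should score higher once norms are enabled. A 3-term and a 4-term title both collapse to 0.5 with the default function, which is the kind of lost distinction the byte encoding can cause; the steeper variant keeps them apart (0.3125 vs 0.25).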



    -Hoss


  • Daniel Einspanjer at May 18, 2007 at 12:54 am
    Oops. I do indeed have omitNorms turned on. I will re-read the
    documentation on it and look at turning it off.

    Sorry for the bother. :/

Discussion Overview
group: java-user
categories: lucene
posted: May 17, '07 at 10:47p
active: May 18, '07 at 12:54a
posts: 3
users: 2
website: lucene.apache.org
