I need to define different similarity scores per document field.

For example, for field A I want to use the Lucene tf.idf score; for the
numerical field B I want to use a different metric (the difference between
values), and so on.

thanks


  • Sujit Pal at Mar 1, 2011 at 8:12 pm
    One way to do this currently is to build a per-field similarity wrapper
    that dispatches on the field name. I believe there is some work going
    on with Lucene Similarity that would make it pluggable for this sort of
    thing, but in the meantime, this is what I did:

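    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.search.DefaultSimilarity;
    import org.apache.lucene.search.Similarity;

    public class MyPerFieldSimilarityWrapper extends Similarity {

      // Fallback for fields that have no entry in the map.
      private final Similarity defaultSimilarity;
      // Field name -> Similarity to use for that field.
      private final Map<String,Similarity> fieldSimilarityMap;

      public MyPerFieldSimilarityWrapper() {
        this.defaultSimilarity = new DefaultSimilarity();
        this.fieldSimilarityMap = new HashMap<String,Similarity>();
        this.fieldSimilarityMap.put("fieldA", new FieldASimilarity());
        ...
      }

      // Use the field's own Similarity if one is registered, else the default.
      @Override
      public float lengthNorm(String fieldName, int numTokens) {
        Similarity sim = fieldSimilarityMap.get(fieldName);
        if (sim == null) {
          return defaultSimilarity.lengthNorm(fieldName, numTokens);
        } else {
          return sim.lengthNorm(fieldName, numTokens);
        }
      }
      // same for scorePayload. For the others, I just delegate
      // to defaultSimilarity (all I really need is scorePayload in
      // my case).
    }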

    and in the schema.xml, I just set this class to be the similarity class:
    <similarity class="com.mycompany.MyPerFieldSimilarityWrapper"/>
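
    The dispatch idea above can be tried without Lucene on the classpath. The
    following is only a sketch: LengthNormalizer, DefaultLengthNorm,
    FlatLengthNorm and PerFieldLengthNorm are hypothetical stand-ins, not
    Lucene types, and the 1/sqrt(numTokens) formula is assumed to mirror what
    DefaultSimilarity computes in Lucene 3.x.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the single Similarity method used here; not a Lucene type. */
interface LengthNormalizer {
    float lengthNorm(String fieldName, int numTokens);
}

/** Assumed to mirror the 1/sqrt(numTokens) norm of Lucene's DefaultSimilarity. */
class DefaultLengthNorm implements LengthNormalizer {
    @Override
    public float lengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }
}

/** Hypothetical field-specific variant that ignores field length entirely. */
class FlatLengthNorm implements LengthNormalizer {
    @Override
    public float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;
    }
}

/** Picks a normalizer by field name and falls back to the default one. */
class PerFieldLengthNorm implements LengthNormalizer {
    private final LengthNormalizer fallback = new DefaultLengthNorm();
    private final Map<String, LengthNormalizer> byField = new HashMap<>();

    void register(String fieldName, LengthNormalizer normalizer) {
        byField.put(fieldName, normalizer);
    }

    @Override
    public float lengthNorm(String fieldName, int numTokens) {
        LengthNormalizer n = byField.getOrDefault(fieldName, fallback);
        return n.lengthNorm(fieldName, numTokens);
    }
}

class PerFieldLengthNormDemo {
    public static void main(String[] args) {
        PerFieldLengthNorm norms = new PerFieldLengthNorm();
        norms.register("fieldA", new FlatLengthNorm());
        // Registered field: uses the flat norm.
        System.out.println(norms.lengthNorm("fieldA", 4));
        // Unregistered field: falls back to the default norm.
        System.out.println(norms.lengthNorm("other", 4));
    }
}
```

    Keeping the fallback inside the wrapper means fields that are not in the
    map behave exactly as they would with the default implementation.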

    hth
    -sujit
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Patrick Diviacco at Mar 1, 2011 at 10:31 pm
    I see, but there is one thing I don't get: you are customizing only the
    lengthNorm method, not the other methods that calculate the similarity
    scores.

    Those methods are still called, and they use the implementation in
    DefaultSimilarity, right?
  • Sujit Pal at Mar 1, 2011 at 10:47 pm
    Yes, for the other methods (except scorePayload), I just delegate to
    the corresponding method in DefaultSimilarity. The reason is that I
    don't have a way to trigger off the field name for these others. For me,
    I really only need to distinguish between DefaultSimilarity and
    PayloadSimilarity (which needs to be triggered for certain fields in my
    index), so I also overrode the scorePayload method in the same
    Map-driven way.
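
    As a rough illustration of the scorePayload side, here is a standalone
    sketch. PerFieldPayloadScorer is a hypothetical class, not part of Lucene,
    and its scorePayload signature is simplified: the real Lucene method takes
    more arguments, which vary by version. It also assumes each payload holds
    one big-endian 4-byte float.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in that scores payloads only for selected fields; not a Lucene class. */
class PerFieldPayloadScorer {
    private final Set<String> payloadFields = new HashSet<>();

    /** Marks a field as one whose payloads should influence the score. */
    void enablePayloads(String fieldName) {
        payloadFields.add(fieldName);
    }

    /**
     * Returns the float stored in the payload for enabled fields, and a
     * neutral factor of 1.0 for every other field or for a missing payload.
     */
    float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        if (payload == null || length < 4 || !payloadFields.contains(fieldName)) {
            return 1.0f;
        }
        // Assumption: the payload is a single big-endian 4-byte float.
        return ByteBuffer.wrap(payload, offset, length).getFloat();
    }
}

class PerFieldPayloadScorerDemo {
    public static void main(String[] args) {
        PerFieldPayloadScorer scorer = new PerFieldPayloadScorer();
        scorer.enablePayloads("fieldA");
        byte[] payload = ByteBuffer.allocate(4).putFloat(2.5f).array();
        // Enabled field: the payload value is used as the score factor.
        System.out.println(scorer.scorePayload("fieldA", payload, 0, 4));
        // Any other field: the payload is ignored.
        System.out.println(scorer.scorePayload("fieldB", payload, 0, 4));
    }
}
```

    Returning 1.0 for fields without payload scoring leaves their scores
    unchanged wherever the payload factor is applied multiplicatively.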

Discussion Overview
group: java-user
categories: lucene
posted: Mar 1, '11 at 7:42p
active: Mar 1, '11 at 10:47p
posts: 4
users: 2
website: lucene.apache.org

2 users in discussion

Patrick Diviacco: 2 posts
Sujit Pal: 2 posts
