How can I convert this Similarity class to work with Lucene 3.1 (I am
currently using 3.0.3)? I understand I have to replace lengthNorm() with
computeNorm(), but fieldName is not a provided parameter in computeNorm()
and FieldInvertState does not contain the field name either. I need the
field name because I only want to make this adjustment for one particular
field in the index, not all fields, and I don't believe there is anything
in the API at the moment that lets you set a Similarity at the field level.

import org.apache.lucene.search.DefaultSimilarity;

public class MusicbrainzSimilarity extends DefaultSimilarity {

    /**
     * Calculates a value which is inversely proportional to the number of
     * terms in the field. When multiple aliases are added to an artist (or
     * label) they are seen as one field, so artists with many aliases can be
     * disadvantaged when the matching alias is radically different from the
     * other aliases.
     *
     * @return score component
     */
    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        if (fieldName.equals("alias")) {
            // Same result as the normal calculation if the field had three
            // terms, the most common scenario.
            return 0.578f;
        } else {
            return super.lengthNorm(fieldName, numTerms);
        }
    }

    /**
     * Calculates a value based on how many times the search term was found
     * in the field. Because we only have short fields, the only real case
     * (apart from rare exceptions like Duran Duran Duran) where the term is
     * found more than twice is when a search term matches multiple aliases.
     * To remove the bias this gives towards artists/labels with many
     * aliases, we limit the value to what would be returned for a two-term
     * match.
     *
     * Note: would prefer to do this just for the alias field, but the field
     * is not passed as a parameter.
     *
     * @param freq number of times the term occurs in the field
     * @return score component
     */
    @Override
    public float tf(float freq) {
        if (freq > 2.0f) {
            return 1.41f; // Same result as if the term matched twice
        } else {
            return super.tf(freq);
        }
    }
}
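
For readers wondering where the two hard-coded constants come from: DefaultSimilarity computes the length norm as 1/sqrt(numTerms) and the term frequency factor as sqrt(freq). The following is a minimal, Lucene-free sketch of that arithmetic only; the class name SimilarityConstants and its method names are illustrative and not part of the Lucene API.

```java
public class SimilarityConstants {

    // Mirrors the default length norm formula: 1 / sqrt(numTerms).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Mirrors the default term-frequency formula: sqrt(freq).
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Three terms: roughly 0.577, which the class above hard-codes as 0.578f.
        System.out.println(lengthNorm(3));
        // Two matches: roughly 1.414, which the class above hard-codes as 1.41f.
        System.out.println(tf(2.0f));
    }
}
```

The small gap between the exact values and the hard-coded ones is a rounding choice made in the original post, not something this sketch tries to correct.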


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Robert Muir at May 3, 2011 at 2:07 pm

    On Tue, May 3, 2011 at 9:57 AM, Paul Taylor wrote:
    > How can I convert this Similarity method to use 3.1 (currently using
    > 3.0.3)? I understand I have to replace lengthNorm() with computeNorm(),
    > but fieldName is not a provided parameter in computeNorm() and
    > FieldInvertState does not contain the field name either.

    Hi, I think you made a mistake; it does take the field name:

    http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/Similarity.html#computeNorm(java.lang.String, org.apache.lucene.index.FieldInvertState)

  • Paul Taylor at May 3, 2011 at 2:30 pm

    On 03/05/2011 15:06, Robert Muir wrote:
    > Hi, I think you made a mistake; it does take the field name:
    >
    > http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/Similarity.html#computeNorm(java.lang.String, org.apache.lucene.index.FieldInvertState)

    Doh, thanks. Not having a good few days, trying to do too much at once.

    I assume this would be the correct way to fix the code for 3.1.0:

    @Override
    public float computeNorm(String field, FieldInvertState state) {
        // This will match both artist and label aliases and is applicable
        // to both; didn't use the constant ArtistIndexField.ALIAS because
        // that would be confusing.
        if (field.equals("alias")) {
            // Same result as the normal calculation if the field had three
            // terms, the most common scenario.
            return state.getBoost() * 0.578f;
        } else {
            return super.computeNorm(field, state);
        }
    }


    Paul
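
A note on why the override multiplies by state.getBoost(): the default computeNorm folds the index-time field boost into the norm, so returning a bare 0.578f would drop any boost set on the alias field. The sketch below mimics that logic without depending on Lucene, so it can be run on its own. InvertState is a hypothetical stand-in for FieldInvertState that carries only a length and a boost, and AliasNormSketch is an illustrative name; neither is Lucene code.

```java
public class AliasNormSketch {

    /** Hypothetical stand-in for Lucene's FieldInvertState (length and boost only). */
    public static final class InvertState {
        private final int length;   // number of terms indexed for the field
        private final float boost;  // index-time boost applied to the field

        public InvertState(int length, float boost) {
            this.length = length;
            this.boost = boost;
        }

        public int getLength() { return length; }
        public float getBoost() { return boost; }
    }

    /** Default-style norm: boost * 1/sqrt(length). */
    public static float defaultNorm(InvertState state) {
        return state.getBoost() * (float) (1.0 / Math.sqrt(state.getLength()));
    }

    /** The thread's approach: a fixed length factor for "alias", the default elsewhere. */
    public static float computeNorm(String field, InvertState state) {
        if ("alias".equals(field)) {
            return state.getBoost() * 0.578f; // length ignored, boost preserved
        }
        return defaultNorm(state);
    }

    public static void main(String[] args) {
        // The alias norm no longer depends on how many alias terms were indexed.
        System.out.println(computeNorm("alias", new InvertState(3, 1.0f)));
        System.out.println(computeNorm("alias", new InvertState(30, 1.0f)));
        // Other fields still shrink as they get longer.
        System.out.println(computeNorm("name", new InvertState(4, 1.0f)));
    }
}
```

Because norms are computed and stored at index time, a changed computeNorm() should only affect documents indexed after the custom Similarity is set on the writer; an existing index would presumably need rebuilding to pick it up.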

  • Robert Muir at May 3, 2011 at 2:50 pm

    On Tue, May 3, 2011 at 10:29 AM, Paul Taylor wrote:
    > I assume this would be the correct way to fix the code for 3.1.0

    Yes, that's correct.
