Hi All,

I want to change the length normalization calculation to suit my
application by changing the "*number of terms*" according to my
requirements. The "*StandardTokenizer*" works perfectly for my application;
however, the *number of terms* calculated by the tokenizer is not the
effective number of terms for the application. I have a mechanism to
calculate that value, and I need to know how I can apply it in the length
normalization calculation.

Please advise.

Thank you,

Best Regards,
Lahiru.


  • Ian Lea at Jun 13, 2011 at 9:40 am
    org.apache.lucene.search.Similarity would be the place to look,
    specifically computeNorm(String field, FieldInvertState state). There
    is comprehensive info in the javadocs. Note that the values are
    calculated at indexing time and stored in the index in encoded form,
    with some loss of precision.


    --
    Ian.
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
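
    To make the "loss of precision" concrete, here is a stdlib-only sketch
    of the kind of single-byte float encoding Lucene 3.x applies to norms.
    It is modeled on Lucene's SmallFloat.floatToByte315/byte315ToFloat but
    written from memory, so treat the exact byte values as an assumption and
    check them against the Lucene source; the class name NormEncodingSketch
    is hypothetical.

    ```java
    // Hypothetical sketch of the lossy 8-bit float encoding used for norms.
    // Modeled on SmallFloat.floatToByte315 / byte315ToFloat (from memory).
    final class NormEncodingSketch {

        // Offset mapping the float exponent range onto the byte range
        // ("zero exponent" 15, three mantissa bits counting the implicit one).
        private static final int FZERO = (63 - 15) << 3;

        static byte encode(float f) {
            int bits = Float.floatToRawIntBits(f);
            // Keep the sign, the exponent and the top two stored mantissa bits.
            int smallfloat = bits >> (24 - 3);
            if (smallfloat <= FZERO) {
                // Zero and negatives map to 0; positive underflow maps to 1.
                return (bits <= 0) ? (byte) 0 : (byte) 1;
            }
            if (smallfloat >= FZERO + 0x100) {
                return (byte) -1; // overflow maps to the largest value
            }
            return (byte) (smallfloat - FZERO);
        }

        static float decode(byte b) {
            if (b == 0) {
                return 0.0f;
            }
            int bits = (b & 0xff) << (24 - 3);
            bits += (63 - 15) << 24; // restore the float exponent offset
            return Float.intBitsToFloat(bits);
        }

        // Length norm in the DefaultSimilarity style, for a field boost of 1.0.
        static float lengthNorm(int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }
    ```

    Under this sketch, lengthNorm(3) and lengthNorm(4) encode to the same
    byte and both decode to 0.5, so a small change to the effective number
    of terms may not change the stored norm at all.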
  • Lahiru Samarakoon at Jun 13, 2011 at 10:33 am
    Hi Ian,

    Thank you very much for the reply.

    The application calls the *writer.addDocument(d);* method, and in this
    process the *lengthNorm(String fieldName, int numTerms)* method is called.
    I can extend the *DefaultSimilarity* class and override the
    *lengthNorm* method, but how can I explicitly specify the
    *numTerms* value?

    In my application, numTerms = (Analyzed Length of the field content) -
    (app-specific calculated value)

    (Analyzed Length of the field content) is the original numTerms value
    calculated in *computeNorm*, which is known.

    Is the *computeNorm* method called for every field, or only for
    analyzed fields?

    Is the order in which *computeNorm* is called the same as the order in
    which we call *addDocument*?

    Is there any way to access the *Document* object inside the
    *Similarity* class?

    Regards,
    Lahiru
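
    The formula above can be written out as a small hypothetical helper
    (EffectiveLengthNorm is an illustrative name). It assumes the norm keeps
    the shape DefaultSimilarity uses in Lucene 3.x, boost times
    1/sqrt(numTerms) (leaving aside its optional discounting of overlapping
    tokens), and only swaps in the application's term count. Clamping to at
    least one term is an added assumption so the square root never sees zero
    or a negative count.

    ```java
    // Hypothetical helper following the formula from the post:
    //   numTerms = (analyzed length of the field) - (app-specific value)
    final class EffectiveLengthNorm {

        // analyzedLength corresponds to what Lucene 3.x exposes as
        // FieldInvertState.getLength(); appValue comes from the application.
        static int effectiveNumTerms(int analyzedLength, int appValue) {
            // Assumption: never let the effective count drop below one term.
            return Math.max(1, analyzedLength - appValue);
        }

        // Same shape as DefaultSimilarity: boost * 1/sqrt(numTerms).
        static float norm(int analyzedLength, int appValue, float boost) {
            int numTerms = effectiveNumTerms(analyzedLength, appValue);
            return boost * (float) (1.0 / Math.sqrt(numTerms));
        }
    }
    ```

    For example, a field with 10 analyzed terms and an app-specific value of
    6 is normalized as if it had 4 terms, giving a norm of 0.5 at boost 1.0.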
  • Ian Lea at Jun 13, 2011 at 1:46 pm
    This is getting beyond my level of expertise, but I'll have a go at
    your questions. Hopefully someone better informed will step in with
    corrections or confirmation.
    ...
    The application calls the *writer.addDocument(d);* method, and in this
    process the *lengthNorm(String fieldName, int numTerms)* method is called.
    I can extend the *DefaultSimilarity* class and override the
    *lengthNorm* method, but how can I explicitly specify the
    *numTerms* value?
    I don't know that you can, but you don't have to use the value passed in.
    ...
    Is the *computeNorm* method called for every field, or only for
    analyzed fields?
    All indexed fields, at a guess, whether analyzed or not.
    Is the order in which *computeNorm* is called the same as the order in
    which we call *addDocument*?
    Probably.
    Is there any way to access the *Document* object inside the
    *Similarity* class?
    Not that I know of via API calls. If you had your own Similarity
    implementation, and methods are called in the order you expect, you
    could add a setDoc(Document) method and/or a setCalcValue(n) method
    and use them as you wished in your custom computeNorm() or
    lengthNorm() code.


    --
    Ian.

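
    The setter approach can be sketched without Lucene on the classpath.
    PerDocumentNormSimilarity, setCalcValue and SetterDemo are hypothetical
    names; in Lucene 3.x the computeNorm body would live in a
    DefaultSimilarity subclass overriding
    computeNorm(String field, FieldInvertState state) and reading
    state.getLength() and state.getBoost(). The sketch relies on the
    assumption stated above: the norm for a document is computed during the
    addDocument call that follows setCalcValue, on the same thread.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-in for a custom Similarity carrying per-document state.
    final class PerDocumentNormSimilarity {

        // App-specific value for the document currently being indexed.
        // Not thread-safe: assumes a single indexing thread.
        private int calcValue;

        void setCalcValue(int n) {
            this.calcValue = n;
        }

        // Stand-in for computeNorm: replaces the raw length with
        // (analyzed length - app value), clamped to at least one term.
        // "field" is unused here; it mirrors the Lucene 3.x signature.
        float computeNorm(String field, int analyzedLength, float boost) {
            int numTerms = Math.max(1, analyzedLength - calcValue);
            return boost * (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    final class SetterDemo {
        // Simulates the indexing loop; document i has analyzedLengths[i]
        // terms after analysis and an app-specific value appValues[i].
        static List<Float> indexAll(int[] analyzedLengths, int[] appValues) {
            PerDocumentNormSimilarity sim = new PerDocumentNormSimilarity();
            List<Float> norms = new ArrayList<>();
            for (int i = 0; i < analyzedLengths.length; i++) {
                // 1. Hand the per-document value over before adding the document.
                sim.setCalcValue(appValues[i]);
                // 2. What writer.addDocument(doc) would trigger for the field.
                norms.add(sim.computeNorm("body", analyzedLengths[i], 1.0f));
            }
            return norms;
        }
    }
    ```

    With Lucene 3.1 or later, the real subclass would be registered through
    IndexWriterConfig.setSimilarity(...) before the IndexWriter is created.
    Because the value is a single mutable field, this design is not safe
    with several threads calling addDocument concurrently.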
  • Lahiru Samarakoon at Jun 14, 2011 at 5:12 am
    Hi Ian,

    The order is right and your method is working for me.

    Thanks

    Lahiru

Discussion Overview
group: java-user @
categories: lucene
posted: Jun 13, '11 at 6:31 am
active: Jun 14, '11 at 5:12 am
posts: 5
users: 2
website: lucene.apache.org

2 users in discussion

Lahiru Samarakoon: 3 posts
Ian Lea: 2 posts
