Hi,

I am currently using Lucene for indexing. After I index a file, I use
Luke to open the index and check it. There is one part that I am curious
about. In Luke, under the Document tab, I select a random document and
display it. At the bottom there are 4 columns: Field, ITSVopLBC, Norm and
String Value.

I am wondering, what is Norm for? Where is it created at indexing
time? Which method calculates it?

Could anyone advise me on this? Thanks for the help.
--
View this message in context: http://www.nabble.com/Index-of-Lucene-tp19025490p19025490.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Doron Cohen at Aug 18, 2008 at 4:12 am
    Norms information comes mainly from the lengths of documents, allowing
    search-time scoring to take into account the effect of document length
    (actually, field length within a document). In practice, norms stored
    within the index may include other information, such as index-time boosts
    for a document or for a field. A single byte is stored for each field, so
    the actual value is compressed to fit. At search time, norms are loaded
    into memory, and so consume 1 byte for each document. It is possible to
    disable norms for a field while indexing. This is explained better in the
    javadoc for Similarity, and here:
    http://lucene.apache.org/java/2_3_2/scoring.html

    Doron
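
To make the reply above concrete, here is a minimal sketch of how a raw (uncompressed) norm could be put together, assuming the formula given in the linked scoring page: document boost times field boost times a length norm of 1/sqrt(number of terms in the field). The class and method names are hypothetical and only illustrate the arithmetic; this is not Lucene's own code.

```java
public class NormValueSketch {
    // Assumed default length normalization: 1 / sqrt(number of terms in the field).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Raw norm before compression: document boost * field boost * length norm.
    public static float rawNorm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A 4-term field with no boosts.
        System.out.println(rawNorm(1.0f, 1.0f, 4));   // prints 0.5
        // A 16-term field whose field boost of 2.0 offsets its extra length.
        System.out.println(rawNorm(1.0f, 2.0f, 16));  // prints 0.5
    }
}
```

Shorter fields get larger norms, and an index-time boost scales the same number, which is why the two effects cannot be told apart once the value is stored.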
  • Blazingwolf7 at Aug 18, 2008 at 4:29 am
    Thanks for the info. But do you know where this is actually performed in
    Lucene? I mean the method that calculates the value before storing it in
    the index. I traced it to a method called lengthNorm() in
    DefaultSimilarity.java, but the value it returns is different from what is
    stored in the index.


    --
    View this message in context: http://www.nabble.com/Index-of-Lucene-tp19025490p19025890.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


  • Doron Cohen at Aug 18, 2008 at 7:06 am

    I believe the answer to your question is in this paragraph:
    http://lucene.apache.org/java/2_3_2/scoring.html#Score%20Boosting
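
One likely reason the value from lengthNorm() differs from what Luke shows is the one-byte compression mentioned earlier in the thread. The sketch below is modeled on the encoding that the Similarity javadoc describes for encodeNorm()/decodeNorm() (three-bit mantissa, five-bit exponent, zero-exponent point at 15). The class and method names are hypothetical, and the code is an illustration under that assumption rather than a copy of Lucene's source.

```java
public class NormByteSketch {
    // Offset that places the zero-exponent point at 15 for a 3-bit mantissa.
    private static final int OFFSET = (63 - 15) << 3;

    // Compress a float into one byte by keeping the top 3 mantissa bits.
    public static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= OFFSET) {
            // Zero or negative maps to 0; tiny positive values round up to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= OFFSET + 0x100) {
            return (byte) -1; // overflow: largest representable value
        }
        return (byte) (small - OFFSET);
    }

    // Expand the byte back into a float; the dropped mantissa bits are lost.
    public static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float raw = (float) (1.0 / Math.sqrt(10)); // 10-term field, about 0.3162
        System.out.println(decode(encode(raw)));   // prints 0.3125
    }
}
```

With only three mantissa bits, many different field lengths collapse to the same stored value. Under this encoding, fields of 3 and 4 terms both decode to 0.5. Any index-time boost is multiplied in before encoding, which moves the stored value further from the plain lengthNorm() result.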
  • Otis Gospodnetic at Aug 18, 2008 at 6:42 pm
    Is that really 1 byte for each document? Not 1 byte for each field of each document?

    Thanks,
    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


  • Michael McCandless at Aug 23, 2008 at 1:46 pm
    It is in fact 1 byte per field (that stores norms), per document. So
    if you have 7 fields in the doc that store norms, that uses up 7 bytes.

    And, because the storage is non-sparse, even documents which don't
    have a given field X will still use up 1 byte, if field X stores norms.

    Also, beware when disabling norms: you must disable norms for every
    single occurrence of that field in any document in your index. If
    even one document exists that did not disable norms for that field
    then that will "spread" to all other docs, during segment merging.

    Mike
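
The memory cost described above is simple arithmetic: one byte per norm-storing field for every document, whether or not the document contains that field. A small sketch, with hypothetical names and a made-up index size of one million documents:

```java
public class NormMemorySketch {
    // Non-sparse storage: every document pays 1 byte for each field that stores norms.
    public static long normBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms;
    }

    public static void main(String[] args) {
        // 1,000,000 documents and 7 fields that store norms.
        System.out.println(normBytes(1_000_000L, 7)); // prints 7000000
    }
}
```

Disabling norms on fields that do not need length normalization or index-time boosts lowers the second factor, provided it is done for every occurrence of the field as noted above.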

  • Nanshi at Jul 10, 2012 at 8:05 pm
    A much clearer explanation than the wiki! Thanks!

    --
    View this message in context: http://lucene.472066.n3.nabble.com/Index-of-Lucene-tp555857p3994239.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


Discussion Overview
group: java-user @
categories: lucene
posted: Aug 18, '08 at 3:00a
active: Jul 10, '12 at 8:05p
posts: 7
users: 5
website: lucene.apache.org
