I've noticed that after stress-testing my application (uses Lucene 2.0) for
a while, I have almost 200mb of byte[]s hanging around, the top two
culprits being:

24 x SegmentReader.Norm.bytes = 112mb
2 x SegmentReader.ones = 16mb

The second one isn't a big deal, but I wonder what's the explanation for
the first one?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Yonik Seeley at Dec 12, 2006 at 1:53 am

    On 12/11/06, Eric Jain wrote:
    I've noticed that after stress-testing my application (uses Lucene 2.0) for
a while, I have almost 200mb of byte[]s hanging around, the top two
    culprits being:

    24 x SegmentReader.Norm.bytes = 112mb
    2 x SegmentReader.ones = 16mb
Each indexed field has a norm array that is the product of its
    index-time boost and the length normalization factor. If you don't
    need either, you can omit the norms (as it looks like you already have
    on some fields given that "ones" is the fake norms used in place of
    the "real" norms).

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

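Yonik's formula can be made concrete. A minimal self-contained sketch of the norm computation (illustrative only, not Lucene source; it mirrors DefaultSimilarity's length normalization of 1/sqrt(numTerms), with the result stored as one byte per document):

```java
// Sketch of the value behind each entry of SegmentReader.Norm.bytes:
// the field's index-time boost multiplied by the length normalization
// factor. Lucene encodes this float into a single byte per document.
public class NormSketch {
    // Length normalization: shorter fields score higher
    // (DefaultSimilarity uses 1/sqrt(numTerms)).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The stored norm is boost * lengthNorm.
    static float norm(float boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // An unboosted 4-term field: norm = 1/sqrt(4) = 0.5
        System.out.println(norm(1.0f, 4));
    }
}
```

To skip this entirely for a field, Lucene 2.0 offers `Field.setOmitNorms(true)` or indexing with `Field.Index.NO_NORMS` (both named later in this thread), which trades boosting and length normalization for the memory saved.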
  • Eric Jain at Dec 12, 2006 at 2:34 am

    Yonik Seeley wrote:
    On 12/11/06, Eric Jain wrote:
    I've noticed that after stress-testing my application (uses Lucene
    2.0) for
a while, I have almost 200mb of byte[]s hanging around, the top two
    culprits being:

    24 x SegmentReader.Norm.bytes = 112mb
    2 x SegmentReader.ones = 16mb
Each indexed field has a norm array that is the product of its
    index-time boost and the length normalization factor. If you don't
    need either, you can omit the norms (as it looks like you already have
    on some fields given that "ones" is the fake norms used in place of
    the "real" norms).
    Thanks for the explanation.

    Not sure where the fields without norms come from: I use neither
    Field.setOmitNorms nor Index.NO_NORMS anywhere!

    I do want to use document boosting... Is that independent from field
    boosting? The length normalization on the other hand may not be necessary.

  • Yonik Seeley at Dec 12, 2006 at 2:44 am

    On 12/11/06, Eric Jain wrote:
    I do want to use document boosting... Is that independent from field
    boosting? The length normalization on the other hand may not be necessary.
    There is no real document boost at the index level... it is simply
    multiplied into the boost for every field of that document. So it
    comes down to what fields you want that index-time boost to take
    effect on (as well as length normalization).

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

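The folding Yonik describes can be sketched in a few lines (illustrative, not Lucene source): the document boost never exists separately in the index; it is multiplied into each field's boost before the norm is computed.

```java
// Sketch: at indexing time Document.setBoost() is simply folded into
// every field's boost; only the product survives into the norm bytes.
public class BoostFolding {
    static float effectiveBoost(float docBoost, float fieldBoost) {
        return docBoost * fieldBoost;
    }

    public static void main(String[] args) {
        // doc.setBoost(2.0f) plus a field boost of 1.5f behaves exactly
        // like a single field-level boost of 3.0f.
        System.out.println(effectiveBoost(2.0f, 1.5f)); // prints 3.0
    }
}
```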
  • Eric Jain at Dec 12, 2006 at 4:24 am

    Yonik Seeley wrote:
    There is no real document boost at the index level... it is simply
    multiplied into the boost for every field of that document. So it
    comes down to what fields you want that index-time boost to take
    effect on (as well as length normalization).
    Come to think of it, I do have two large indexes that don't really need any
    document boosting, could perhaps save some memory there...

    But what I still don't understand is why the amount of memory that is used
    by SegmentReader.Norm.bytes keeps growing -- at first quite fast to about
    150mb, then slower.

    After startup:

    11 x SegmentReader.Norm.bytes = 17mb


After searching some indexes once or twice:

    78 x SegmentReader.Norm.bytes = 79mb
    5 x SegmentReader.ones = 16mb


    After a few dozen queries on one of the indexes:

    58 x SegmentReader.Norm.bytes = 158mb
    5 x SegmentReader.ones = 16mb

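The reported numbers are consistent with one byte per document per loaded norm array. A back-of-the-envelope check (the ~4.9M docs-per-array figure is inferred from the 112mb/24-array report above, not stated in the thread):

```java
// Rough accounting: each SegmentReader.Norm.bytes array holds one byte
// per document in its segment, so total norm memory is approximately
// (number of loaded field/segment arrays) x maxDoc.
public class NormMemoryEstimate {
    static long normBytes(int arrays, int maxDocPerSegment) {
        return (long) arrays * maxDocPerSegment;
    }

    public static void main(String[] args) {
        // 24 arrays over segments of ~4.9M docs each lands near the
        // reported 112mb.
        long bytes = normBytes(24, 4_900_000);
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints: 112 MB
    }
}
```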
  • Yonik Seeley at Dec 12, 2006 at 4:27 am

    On 12/11/06, Eric Jain wrote:
    Yonik Seeley wrote:
    There is no real document boost at the index level... it is simply
    multiplied into the boost for every field of that document. So it
    comes down to what fields you want that index-time boost to take
    effect on (as well as length normalization).
    Come to think of it, I do have two large indexes that don't really need any
    document boosting, could perhaps save some memory there...

    But what I still don't understand is why the amount of memory that is used
    by SegmentReader.Norm.bytes keeps growing -- at first quite fast to about
    150mb, then slower.
    It's read on demand, per indexed field.
    So assuming your index is optimized (a single segment), then it
    increases by one byte[] each time you search on a new field.

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

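The lazy, on-demand loading Yonik describes can be modeled like this (an illustrative sketch of the caching behavior, not Lucene's actual SegmentReader code):

```java
import java.util.HashMap;
import java.util.Map;

// Model of per-field lazy norm loading: a reader keeps a map from field
// name to its byte[] of norms and only materializes an array the first
// time that field is used in scoring. This is why memory grows as new
// fields are searched, then plateaus.
public class LazyNorms {
    private final int maxDoc;
    private final Map<String, byte[]> norms = new HashMap<String, byte[]>();

    LazyNorms(int maxDoc) { this.maxDoc = maxDoc; }

    // Called on demand when a field is first scored against.
    byte[] norms(String field) {
        byte[] bytes = norms.get(field);
        if (bytes == null) {
            bytes = new byte[maxDoc]; // one byte per document
            norms.put(field, bytes);
        }
        return bytes;
    }

    int loadedArrays() { return norms.size(); }

    public static void main(String[] args) {
        LazyNorms reader = new LazyNorms(1000);
        reader.norms("title");
        reader.norms("body");
        reader.norms("title"); // already cached, no new array
        System.out.println(reader.loadedArrays()); // prints 2
    }
}
```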
  • Eric Jain at Dec 12, 2006 at 4:39 am

    Yonik Seeley wrote:
    It's read on demand, per indexed field.
    So assuming your index is optimized (a single segment), then it
    increases by one byte[] each time you search on a new field.
    OK, makes sense then. Thanks!

  • Doron Cohen at Dec 12, 2006 at 3:06 am

    I do want to use document boosting... Is that independent from field
    boosting? The length normalization on the other hand may not be
    necessary.

    They "go together" - see "Score Boosting" in
    http://lucene.apache.org/java/docs/scoring.html


  • Otis Gospodnetic at Dec 12, 2006 at 3:42 am
    Eric, you said you aren't using any Field.Index.NO_NORMS fields, but SegmentReader.ones should only be used if you do use NO_NORMS, so things don't add up here.

    Otis

    ----- Original Message ----
    From: Yonik Seeley <yonik@apache.org>
    To: java-user@lucene.apache.org
    Sent: Monday, December 11, 2006 8:53:15 PM
    Subject: Re: SegmentReader using too much memory?
    On 12/11/06, Eric Jain wrote:
    I've noticed that after stress-testing my application (uses Lucene 2.0) for
a while, I have almost 200mb of byte[]s hanging around, the top two
    culprits being:

    24 x SegmentReader.Norm.bytes = 112mb
    2 x SegmentReader.ones = 16mb
Each indexed field has a norm array that is the product of its
    index-time boost and the length normalization factor. If you don't
    need either, you can omit the norms (as it looks like you already have
    on some fields given that "ones" is the fake norms used in place of
    the "real" norms).

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

  • Yonik Seeley at Dec 12, 2006 at 4:00 am

    On 12/11/06, Otis Gospodnetic wrote:
    Eric, you said you aren't using any Field.Index.NO_NORMS fields, but SegmentReader.ones should only be used if you do use NO_NORMS, so things don't add up here.
    norms(fieldThatDoesntExist) will also return fakeNorms (ones)

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

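Yonik's point resolves the apparent contradiction: fake norms are a single shared all-ones array handed back for any field that has no real norms, whether omitted or simply absent from the segment. A sketch of that sharing (illustrative, not Lucene source; real Lucene fills the array with the encoded byte form of 1.0f rather than the literal 1 used here):

```java
import java.util.Arrays;

// Model of SegmentReader's "ones" fake norms: every normless field
// (omitted norms, or a field that doesn't exist in the segment) shares
// one lazily created array, so only a couple of "ones" instances appear
// even when many fields lack norms.
public class FakeNorms {
    private static byte[] ones;

    static synchronized byte[] fakeNorms(int maxDoc) {
        if (ones == null) {
            ones = new byte[maxDoc];
            Arrays.fill(ones, (byte) 1); // stand-in neutral norm value
        }
        return ones; // same array for every caller
    }

    public static void main(String[] args) {
        // Two different missing fields resolve to the identical array.
        System.out.println(fakeNorms(100) == fakeNorms(100)); // prints true
    }
}
```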

Discussion Overview
group: java-user
categories: lucene
posted: Dec 11, '06 at 11:03p
active: Dec 12, '06 at 4:39a
posts: 10
users: 4
website: lucene.apache.org