FAQ
I have just joined this user group, and I will probably be asking questions and contributing for a while, as I am starting work on a product that will use Lucene exclusively.

We are still in the design phase, and I see that we need to manage several user- and application-specific configurations. I am exploring the idea of storing the configuration information in the index as well, maybe in a separate index just for the configuration, because each module of the application will have access to the Lucene classes.

I know this can technically be done, but are there any best practices that discourage it?

Thanks in advance.
-Pradeep

  • Daniel Noll at Feb 1, 2006 at 2:43 am

    This would make sense only if you're planning to do some kind of text
    search over the configuration. Otherwise, you're better off just
    keeping configuration somewhere else.

    Updating a text index when a configuration element changes is a less
    than pretty operation (the stored document has to be deleted and re-added),
    whereas using the java.util.prefs Preferences API is reasonably sane.
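
    As a rough illustration of the Preferences route, a minimal sketch might look like the following. The class name, the node path and the key names are all hypothetical, chosen only for this example; nothing here comes from the original application.

```java
import java.util.prefs.Preferences;

// Minimal sketch: keep application settings in java.util.prefs instead of an index.
// The node path and the key names are hypothetical, chosen only for illustration.
final class AppSettings {
    private static final String KEY_INDEX_DIR = "indexDir";
    private static final String KEY_MAX_RESULTS = "maxResults";

    private final Preferences node;

    AppSettings(Preferences node) {
        this.node = node;
    }

    // Changing one setting is a single call; no index rewrite is involved.
    void setIndexDir(String path) { node.put(KEY_INDEX_DIR, path); }
    void setMaxResults(int n) { node.putInt(KEY_MAX_RESULTS, n); }

    // The second argument is the default returned when the key has never been set.
    String indexDir() { return node.get(KEY_INDEX_DIR, "index"); }
    int maxResults() { return node.getInt(KEY_MAX_RESULTS, 10); }

    public static void main(String[] args) {
        AppSettings settings =
                new AppSettings(Preferences.userRoot().node("/com/example/searchapp"));
        settings.setMaxResults(50);
        System.out.println(settings.indexDir() + " " + settings.maxResults());
    }
}
```

    Per-user settings would go under Preferences.userRoot() and machine-wide ones under Preferences.systemRoot(); which of the two suits a given setting is a design choice for the application.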

    Daniel

    --
    Daniel Noll

    Nuix Australia Pty Ltd

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Leon Chaddock at Feb 1, 2006 at 10:22 am
    Hi All,

    We have a Lucene index of over 10 000 000 docs at this time.
    When we try to run a search we get
    java.lang.OutOfMemoryError: Java heap space

    We have tried setting -Xmx to 1 GB, but to no avail (the box has
    4 GB of memory available). Is there any guidance on handling memory, or has
    anyone had similar problems before who could help?

    Many thanks

    Leon

    ----- Original Message -----
    From: "Pradeep Sharma" <pradeep@danicorp.com>
    To: <java-user@lucene.apache.org>
    Sent: Wednesday, February 01, 2006 2:03 AM
    Subject: Greetings and my first question - Is it a good practise to store
    application configuration in Lucene




  • Volodymyr Bychkoviak at Feb 1, 2006 at 12:16 pm
    When there are many documents in an index, there can be many unique terms
    in the index.
    Every 128th term (by default) is written to the term info index for faster
    term lookup.
    This term info index is loaded entirely into memory when searching, so it
    can increase memory usage.
    Note that this does not depend on the number of documents in the index; it
    depends on the number of unique terms in the index.

    This can be changed by setting a higher value with
    indexWriter.setTermIndexInterval();
    Be careful not to set this value too high, because search performance will
    degrade.
    NOTE: this option is available only in Lucene 1.9.
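
    To get a feel for the trade-off, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate that assumes one in-memory entry per interval of unique terms, as described above; the class name and the 50 million term count are hypothetical, and the actual bytes per entry depend on the terms themselves.

```java
// Back-of-the-envelope estimate of how many terms the in-memory term index holds.
// Assumption: one entry is kept for every `interval`-th unique term, as described above.
final class TermIndexEstimate {
    static long termsHeldInMemory(long uniqueTerms, int interval) {
        if (uniqueTerms < 0 || interval < 1) {
            throw new IllegalArgumentException("uniqueTerms must be >= 0 and interval >= 1");
        }
        return (uniqueTerms + interval - 1) / interval; // ceiling division
    }

    public static void main(String[] args) {
        long uniqueTerms = 50_000_000L; // hypothetical figure, not taken from the thread
        for (int interval : new int[] {128, 256, 512}) {
            System.out.println(interval + " -> " + termsHeldInMemory(uniqueTerms, interval));
        }
    }
}
```

    Under these assumptions, quadrupling the interval from 128 to 512 cuts the number of in-memory entries to about a quarter, at the cost of scanning more terms on each lookup.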

    Memory usage can also depend on the number of fields in a document and the
    way you process them (store, index, tokenize, etc.).


    --
    regards,
    Volodymyr Bychkoviak


  • Chris Hostetter at Feb 1, 2006 at 6:03 pm
    It seems like there are a few common things that bite people over and over
    again that you should check first and foremost...


    1) Don't use more searchers/readers than you need.

    Every time you open an IndexSearcher/IndexReader, resources are used which
    take up memory. For an application pointed at a static index, you only
    ever need one IndexReader/IndexSearcher, which can be shared among multiple
    threads issuing queries. If your index is being incrementally updated,
    you should never need more than two searcher/reader pairs open at a time
    -- one in use, and one that you open/warm up when you detect changes.
    Swap it in for the "in use" instance when ready, and close the old "in
    use" instance as soon as all clients that were using it are done.
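
    One way to picture the swap is with simple reference counting, sketched below. This is an illustration only, not code from Lucene: SearcherHolder and Ref are hypothetical names, and the type parameter T stands in for whatever searcher/reader pair the application shares.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of the "one in use, one warming up" pattern with simple reference counting.
// T stands in for a searcher/reader pair; no Lucene classes are used here.
final class SearcherHolder<T extends Closeable> {

    // A resource plus the number of clients currently using it.
    static final class Ref<T> {
        private final T resource;
        private int users;
        private boolean retired;

        private Ref(T resource) { this.resource = resource; }

        T get() { return resource; }
    }

    private Ref<T> current;

    SearcherHolder(T initial) {
        current = new Ref<>(initial);
    }

    // Borrow the "in use" instance; every acquire() must be paired with release().
    synchronized Ref<T> acquire() {
        current.users++;
        return current;
    }

    // Return a borrowed instance; a retired instance is closed by its last user.
    synchronized void release(Ref<T> ref) {
        ref.users--;
        if (ref.retired && ref.users == 0) {
            close(ref.resource);
        }
    }

    // Swap in an already warmed-up instance and retire the old one.
    synchronized void swap(T warmed) {
        Ref<T> old = current;
        current = new Ref<>(warmed);
        old.retired = true;
        if (old.users == 0) {
            close(old.resource);
        }
    }

    private static void close(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Query threads would call acquire(), search, and call release() in a finally block, while the thread that detects index changes warms a new instance and passes it to swap(). At most two instances are open at any moment, provided a new swap is not started before the previous old instance has drained.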

    2) Close your resources when you are finished with them.

    The most common waste of memory I've seen is people who don't close
    instances of IndexSearcher or IndexReader when they are done with them.
    It's not enough to rely on them going out of scope and being garbage
    collected; you have to explicitly close them to ensure that things like the
    CachingWrapperFilter and the FieldCache aren't caching large amounts of
    data for an IndexReader that can never be used again.

    A big part of this is making sure you know when your IndexSearcher is
    going to close your IndexReader for you -- read the javadocs carefully.
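
    The ownership question can be illustrated with stand-in classes. The sketch below does not use Lucene at all: StubReader and StubSearcher are hypothetical stubs that only record when close() is called, and the ownsReader flag models whether closing the searcher also closes the reader it was given.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Stand-ins for a reader and a searcher; they only record when close() is called.
final class CloseOrderDemo {
    static final class StubReader implements Closeable {
        private final List<String> log;

        StubReader(List<String> log) { this.log = log; }

        @Override public void close() { log.add("reader closed"); }
    }

    static final class StubSearcher implements Closeable {
        private final StubReader reader;
        private final boolean ownsReader; // does closing the searcher close the reader too?
        private final List<String> log;

        StubSearcher(StubReader reader, boolean ownsReader, List<String> log) {
            this.reader = reader;
            this.ownsReader = ownsReader;
            this.log = log;
        }

        void search() { log.add("query ran"); }

        @Override public void close() {
            log.add("searcher closed");
            if (ownsReader) {
                reader.close();
            }
        }
    }

    // The searcher does not own the reader here, so both are closed explicitly,
    // in reverse order of creation, even if the query throws.
    static List<String> searchOnce() {
        List<String> log = new ArrayList<>();
        try (StubReader reader = new StubReader(log);
             StubSearcher searcher = new StubSearcher(reader, false, log)) {
            searcher.search();
        }
        return log;
    }
}
```

    If the searcher did own the reader, listing the reader as a separate resource would close it twice, which is exactly the kind of detail the javadocs need to settle for the real classes.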

    3) Don't sort on more fields than you can afford.

    Every time you sort on a field, a FieldCache array is constructed for that
    field. If you need to save some RAM, and you currently let your clients
    sort on 30 different fields, try limiting their sort options -- those
    arrays can take up a lot of space.
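
    For a sense of scale, here is a rough estimate, assuming each sort field caches one 4-byte entry per document (as an int or float sort would; string sorts also hold the term values and cost more). The class and method names are hypothetical, and the 10 000 000 document count is the one mentioned in this thread.

```java
// Rough estimate of the memory held by per-field sort caches.
// Assumption: each sort field keeps one fixed-size entry per document
// (4 bytes for an int or float sort); string sorts need more than this.
final class SortCacheEstimate {
    static long bytes(long maxDoc, int sortFields, int bytesPerEntry) {
        return maxDoc * sortFields * bytesPerEntry;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // index size mentioned in this thread
        long oneField = bytes(maxDoc, 1, 4);
        long thirtyFields = bytes(maxDoc, 30, 4);
        System.out.println(oneField / 1_000_000 + " MB for 1 field");
        System.out.println(thirtyFields / 1_000_000 + " MB for 30 fields");
    }
}
```

    Under these assumptions one sort field costs about 40 MB and thirty cost about 1.2 GB, which on its own would not fit in a 1 GB heap.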

    4) RangeQuery, PrefixQuery and WildcardQuery cost RAM.

    If you use RangeQuery, PrefixQuery and WildcardQuery, be prepared for them
    to eat up a lot of RAM doing query expansion -- especially if you increase
    BooleanQuery.maxClauseCount to prevent TooManyClauses exceptions. The
    trade-off you make by doing that is that now a prefix query like "f:a*"
    will expand into a boolean query containing every term in the field f that
    starts with an "a" ... if you've got a lot of terms, that can be a very
    big query, and it can take up a lot of RAM.

    Consider using ConstantScoreRangeQuery, etc. instead.
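
    The expansion described above can be mimicked with a sorted set of terms. This is a simplified model, not Lucene's implementation: PrefixExpansion is a hypothetical class, the returned list stands in for the clauses of the rewritten boolean query, and an IllegalStateException plays the role of TooManyClauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Simplified model of prefix-query expansion: one clause per matching term.
final class PrefixExpansion {
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // In sorted order, all terms sharing the prefix are contiguous,
        // starting at the first term that is >= the prefix itself.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException("more than " + maxClauses + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }
}
```

    Raising maxClauses only moves the limit: the list still grows with the number of matching terms, which is the memory cost being warned about.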

    5) Don't use field norms if you don't need them.

    This is only an option if you are using 1.9, and it's only a big issue if
    you have many indexed fields. Field norms take up one byte per doc per
    indexed field -- even if a doc doesn't have a value for that field, it
    still gets a norm for that field. There are options when indexing to
    prevent norms from being calculated, which can save a lot of space.
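
    The one-byte-per-doc-per-field rule makes the cost easy to estimate. In the sketch below the class name and the field counts (50 and 5) are hypothetical; only the 10 000 000 document count comes from this thread.

```java
// Estimate of the memory used by norms: one byte per document per indexed field,
// whether or not a given document has a value for that field.
final class NormsEstimate {
    static long bytes(long maxDoc, int indexedFieldsWithNorms) {
        return maxDoc * indexedFieldsWithNorms;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // index size mentioned in this thread
        int fields = 50;           // hypothetical number of indexed fields
        System.out.println(bytes(maxDoc, fields) / 1_000_000 + " MB with norms on all fields");
        System.out.println(bytes(maxDoc, 5) / 1_000_000 + " MB if only 5 fields keep norms");
    }
}
```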




    : Date: Wed, 1 Feb 2006 10:21:55 -0000
    : From: Leon Chaddock <leonchaddock@macranet.co.uk>
    : Reply-To: java-user@lucene.apache.org
    : To: java-user@lucene.apache.org
    : Subject: Memory problem



    -Hoss


  • Leon Chaddock at Feb 2, 2006 at 12:24 pm
    With reference to the below: we are planning to have two indexes, one that
    we index into and optimize, and a mirror index that we query against.

    Once a day we would update the mirror index. Does this seem like a viable
    approach to people? We have a lot of data that is constantly updating, so
    querying the index while optimizing just didn't seem to work.

    Thanks



    Every time you open an IndexSearcher/IndexReader, resources are used which
    take up memory. For an application pointed at a static index, you only
    ever need one IndexReader/IndexSearcher, which can be shared among multiple
    threads issuing queries. If your index is being incrementally updated,
    you should never need more than two searcher/reader pairs open at a time
    -- one in use, and one that you open/warm up when you detect changes.
    Swap it in for the "in use" instance when ready, and close the old "in
    use" instance as soon as all clients that were using it are done.

  • Nick Vincent at Feb 1, 2006 at 11:18 am
    Hi Leon,

    I had a similar problem when doing a test import which I believe was actually down to object churn in parsing the data to create the Documents. I achieved a quick fix by calling System.gc() every thousand documents.

    Cheers,

    Nick

  • Leon Chaddock at Feb 1, 2006 at 11:23 am
    Hi Nick,
    We didn't get the error on importing; it was actually when conducting a
    search. Would this still help?
    Thanks
    Leon

Discussion Overview
group: java-user
categories: lucene
posted: Feb 1, '06 at 2:04a
active: Feb 2, '06 at 12:24p
posts: 8
users: 6
website: lucene.apache.org
