FAQ
Hi all,


I understand Lucene indexes to be at their optimum only up to a certain
size - said to be around several GBs. I haven't found a good discussion
of this, but it's my understanding that at some point it's better to
split an index into parts (a la sharding) than to continue searching
one huge index. I assume this has to do with OS and I/O configuration.
Can anyone point me to more info on this?
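
To make the question concrete, by splitting into parts I mean searching
several smaller FSDirectory indexes through one searcher, roughly like
this - just a sketch against the Lucene 3.x Java API, with made-up
paths and field names:

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearch {
        public static void main(String[] args) throws Exception {
            // Open each shard as its own FSDirectory-backed reader.
            IndexReader shard1 = IndexReader.open(FSDirectory.open(new File("/data/index-shard1")));
            IndexReader shard2 = IndexReader.open(FSDirectory.open(new File("/data/index-shard2")));

            // MultiReader exposes the shards as one logical index to the searcher.
            IndexReader all = new MultiReader(shard1, shard2);
            IndexSearcher searcher = new IndexSearcher(all);

            TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
            System.out.println("total hits: " + hits.totalHits);

            searcher.close();
            all.close(); // also closes the sub-readers with this constructor
        }
    }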


We have a product that uses Lucene for various searches, and at the
moment each type of search uses its own Lucene index. We plan to
refactor the way it works and combine all the indexes into one -
making the whole system more robust and giving it a smaller memory
footprint, among other things.
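
The rough idea is to tag every document with the search type it belongs
to, so the single combined index can still serve each search
separately. A minimal sketch of what we have in mind (Lucene 3.x API
again; the field and type names are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class CombinedIndex {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory(); // FSDirectory.open(...) in production
            IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31,
                    new StandardAnalyzer(Version.LUCENE_31));
            IndexWriter writer = new IndexWriter(dir, cfg);

            // Every document carries a "searchType" field naming which of the
            // old per-search indexes it would have belonged to.
            Document doc = new Document();
            doc.add(new Field("searchType", "products", Field.Store.NO, Field.Index.NOT_ANALYZED));
            doc.add(new Field("body", "some product text", Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
            writer.close();

            // At search time, restrict any user query to one search type.
            Query userQuery = new TermQuery(new Term("body", "product"));
            BooleanQuery q = new BooleanQuery();
            q.add(userQuery, BooleanClause.Occur.MUST);
            q.add(new TermQuery(new Term("searchType", "products")), BooleanClause.Occur.MUST);
        }
    }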


Assuming the above is true, we are interested in knowing how to do this
correctly. Initially all our indexes will be merged into one big index,
but if at some index size there is severe performance degradation, we
would like to handle that correctly - either by starting a new
FSDirectory index to flush into, or by re-indexing and moving the large
data sets into their own Lucene indexes.
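
By "starting a new FSDirectory index to flush into" I mean a check
along these lines at indexing time - a rough sketch that assumes the
on-disk size of the directory is a reasonable trigger, with an
arbitrary 10 GB threshold:

    import java.io.File;

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class IndexRollover {
        private static final long MAX_BYTES = 10L * 1024 * 1024 * 1024; // arbitrary 10 GB threshold
        private int generation = 0;
        private Directory current;

        public IndexRollover() throws Exception {
            current = FSDirectory.open(new File("/data/index-" + generation));
        }

        // Sum the sizes of all files in the index directory.
        private static long sizeOf(Directory dir) throws Exception {
            long total = 0;
            for (String name : dir.listAll()) {
                total += dir.fileLength(name);
            }
            return total;
        }

        // Called periodically (e.g. after each commit); switches new writes
        // to a fresh FSDirectory once the current one crosses the threshold.
        public Directory directoryToFlushInto() throws Exception {
            if (sizeOf(current) > MAX_BYTES) {
                current.close();
                generation++;
                current = FSDirectory.open(new File("/data/index-" + generation));
            }
            return current;
        }
    }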

Are there any guidelines for measuring or estimating this correctly?
What should we be aware of while considering all this? We can't assume
anything about the machines this will run on, so testing on our own
hardware won't really tell us much...
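
Part of what I mean by "measuring" is recording search latency on the
deployed machine itself, since we can't reproduce the customer's
hardware - e.g. with a trivial timing wrapper like this sketch:

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    public class TimedSearch {
        // Runs the query and logs wall-clock latency, so we can watch how it
        // grows as the index grows on the machine that actually runs it.
        public static TopDocs searchAndTime(IndexSearcher searcher, Query query, int n)
                throws Exception {
            long start = System.nanoTime();
            TopDocs result = searcher.search(query, n);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println("query=[" + query + "] hits=" + result.totalHits
                    + " took=" + elapsedMs + "ms");
            return result;
        }
    }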

Thanks in advance for any input on this,

Itamar.

  • Itamar Syn-Hershko at Jun 11, 2011 at 7:39 pm
    Sorry, I intended to post it to java-user. Did so now...
