FAQ
Hi,

I'm rephrasing a previous performance question, in light of new data...
I have a Lucene index of about 0.5 GB.
Currently performance is good: up to 200 milliseconds per search (with
complex boolean queries, but never retrieving more than the top 200 results).

The question: how much can the index grow before there's noticeable
performance degradation?

1) Does anyone have production experience with, say, a 5 GB index? 10 GB?
If so, are there recommendations about merge policy, file-size
configuration, etc.?
If it degrades, I have other solutions (involving a change in logic), but I
don't want to get into them unless necessary.
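
(For concreteness, here's roughly how I set those knobs today; a minimal
sketch, assuming the Lucene 3.1 API and a hypothetical index path:)

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeTuning {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location.
        FSDirectory dir = FSDirectory.open(new File("/var/data/myindex"));

        // Size-based merging: fewer, larger segments mean fewer files
        // to open per search, at some indexing-time cost.
        LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        mergePolicy.setMergeFactor(10);   // the default; lower it for fewer segments
        mergePolicy.setMaxMergeMB(2048);  // don't merge segments beyond ~2 GB

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31));
        config.setMergePolicy(mergePolicy);

        IndexWriter writer = new IndexWriter(dir, config);
        // ... add documents ...
        writer.close();
    }
}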

2) Also, about 5% of my documents are editable (i.e., the application
occasionally deletes them and adds a modified document instead).
The other 95% are "immutable" (never deleted/edited).
Can Lucene take advantage of this? E.g., will it be smart enough to keep
the changes in a single small file (which needs to be optimized), while the
other files remain unchanged?
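
(The delete-then-add I describe currently goes through
IndexWriter.updateDocument, keyed on a unique ID field; a minimal sketch,
with hypothetical field names:)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DocumentReplacer {
    // Hypothetical helper: assumes each document carries a unique,
    // non-analyzed "id" field.
    static void replace(IndexWriter writer, String id, String newBody)
            throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("body", newBody, Field.Store.NO, Field.Index.ANALYZED));
        // Atomically deletes any existing document with this id,
        // then adds the replacement.
        writer.updateDocument(new Term("id", id), doc);
    }
}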

Thanks :)

  • Erick Erickson at Mar 31, 2011 at 12:41 pm
    5-10 GB indexes are pretty small by Lucene/Solr standards, so
    given reasonable hardware resources this should be no problem.
    That said, only measurement will nail this down. But an
    often-used rule of thumb is that you should start considering
    better strategies somewhere in the 40 GB range.

    CAUTION: you haven't specified what hardware you're running on.
    64- or 32-bit? Memory available? Other things running on your
    machine?

    You might want to review:
    http://wiki.apache.org/solr/UsingMailingLists

    As to your second question, why do you care? If you simply
    optimize every so often (say, daily or weekly) you'll reclaim all
    the space anyway and avoid operational/programmatic
    complexity. I'd venture that you won't notice any performance
    difference unless you're changing your documents at a
    furious rate.
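
    Something like this from a nightly cron job is all it takes; a sketch,
    assuming a Lucene 3.x IndexWriter where optimize() is still current:

    import org.apache.lucene.index.IndexWriter;

    public class NightlyMaintenance {
        // Merges segments down and reclaims space held by deleted documents.
        static void run(IndexWriter writer) throws Exception {
            writer.optimize();
            // or writer.expungeDeletes() if a full optimize is too heavy
        }
    }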

    Best
    Erick
