Just a quick question regarding using the Optimize function in Lucene.NET.

Is it more time efficient to call Optimize occasionally while adding
documents to an index, or is it better to call it at the end of adding
documents only?

The index we are creating may have 2-3 million records added at a
time, and we currently optimize every 100,000 records.

Thanks in advance.

Trevor Watson

  • Kevin Miller at Jan 24, 2011 at 7:26 pm
    I battled with how to optimize Lucene for a bit. Make sure you understand why
    you are calling Optimize. You only need to optimize to improve search
    performance (by limiting the number of index files that get traversed when
    collecting hits). The index writer will "optimize" automatically based on
    your MergeFactor settings. IMHO your in-flight optimizations are likely
    unnecessary. I also found that it was best to use the Optimize(int) overload,
    which lets you choose how many index files (sorry, I forgot the correct
    term) the index is merged down to. That sped up optimization quite a lot, as
    getting a large index into a single index file can be quite time-consuming.

    I found this post very educational regarding this subject:
    http://tim.oreilly.com/pub/a/onjava/2003/03/05/lucene.html?page=1

  • Trevor Watson at Jan 24, 2011 at 8:20 pm
    Thanks to all for the speedy and useful replies!
  • Jean-Francois Beaulac at Jan 24, 2011 at 7:40 pm
    From the Lucene FAQ:
    http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_index_optimization_and_when_should_I_use_it.3F
    jf
    Date: Mon, 24 Jan 2011 13:58:20 -0500
    From: twatson@datassimilate.com
    To: lucene-net-user@lucene.apache.org
    Subject: Optimization times

  • Digy at Jan 24, 2011 at 7:44 pm
    With 2.9.2 you don't have to optimize at all.
    DIGY

  • K a r n a v at Jan 25, 2011 at 4:41 am
    How can I partially update the index files?
    I mean, I need partial indexing logic.
    Could anyone please help me?

    --
    *Thanks & Regards*,
    *Karunaker Reddy V*
  • Frank Yu at Jan 28, 2011 at 4:16 am
    Karnav,

    It would be better for you to open a new thread for your own question. I am
    not sure if I got what you meant by partially updating the index files.

    Thanks,

    Frank

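
The trade-off raised in this thread (calling Optimize every 100,000 documents versus once at the end, and Kevin Miller's point about Optimize(int)) can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not Lucene.NET code: it counts how many documents get rewritten by merges instead of measuring time, it approximates MergeFactor behaviour by merging whenever the newest `merge_factor` segments have equal size, and the names `add_batch`, `optimize`, `simulate`, and the 10,000-document flush size are hypothetical.

```python
def add_batch(segments, batch_size, merge_factor):
    """Flush one new segment, then cascade merges of equal-sized segments.

    Simplified stand-in for merge-factor based merging.
    Returns the number of documents rewritten by those merges.
    """
    segments.append(batch_size)
    rewritten = 0
    while (len(segments) >= merge_factor
           and len(set(segments[-merge_factor:])) == 1):
        merged = sum(segments[-merge_factor:])
        del segments[-merge_factor:]
        segments.append(merged)
        rewritten += merged
    return rewritten


def optimize(segments, max_segments=1):
    """Merge the smallest segments until at most max_segments remain.

    Returns the number of documents rewritten (0 if nothing to do).
    """
    if len(segments) <= max_segments:
        return 0
    segments.sort(reverse=True)          # largest first, smallest last
    keep = segments[:max_segments - 1]   # segments left untouched
    merged = sum(segments[max_segments - 1:])
    segments[:] = keep + [merged]
    return merged


def simulate(total_docs, batch_size=10_000, merge_factor=10,
             optimize_every=None, final_max_segments=1):
    """Return (documents rewritten, final segment count) for one indexing run."""
    segments, rewritten, added = [], 0, 0
    while added < total_docs:
        rewritten += add_batch(segments, batch_size, merge_factor)
        added += batch_size
        if optimize_every and added % optimize_every == 0:
            rewritten += optimize(segments, 1)   # in-flight full optimize
    rewritten += optimize(segments, final_max_segments)
    return rewritten, len(segments)


if __name__ == "__main__":
    docs = 2_500_000  # hypothetical size within the 2-3 million range
    print("optimize once at end :", simulate(docs))
    print("optimize every 100k  :", simulate(docs, optimize_every=100_000))
    print("final optimize to 5  :", simulate(docs, final_max_segments=5))
```

In this model, every in-flight optimize rewrites the whole index built so far, so the work grows with each call, while a single final optimize rewrites it once. Stopping at a handful of segments instead of one only touches the smallest segments. Real timings depend on I/O, document size, and the merge policy actually configured, so treat the numbers as relative, not as measurements.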

Discussion Overview
group: lucene-net-user @
categories: lucene
posted: Jan 24, '11 at 7:10p
active: Jan 28, '11 at 4:16a
posts: 7
users: 6
website: lucene.apache.org
