Hello,

In my application, I need to flush data each time a modification is made, so
each time an entry is added to the Lucene index we call IndexWriter.flush.
This way all data is secured on the file system.

We noticed that this operation takes more and more time as the size of the
index grows.

How can I tune the index so that the flush operation takes a constant amount
of time?

Before changing our logic, for example to flush every "n" entries, I'd like to
be sure that there isn't something I'm missing.

I saw that the IndexWriter API changed in the latest version, but our version
is 2.3.1, which doesn't have, for example, the commit operation (and the flush
method is deprecated).

Thank you for your suggestions.
--
View this message in context: http://www.nabble.com/IndexWriter.flush-performance-tp20880541p20880541.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Jokin Cuadrado at Dec 7, 2008 at 1:28 pm
    Every time you flush the index, you are writing a small index to
    disk. There is a configured value (mergeFactor) that decides when it has
    to merge all of those small indexes into a bigger one, so as the index
    grows the merges get bigger. First you merge 10 indexes of 1
    document, then 10 indexes of 10 documents, then 10 indexes of 100
    documents, and so on. The flush (which becomes commit in later versions)
    is a costly operation, so I suggest refactoring your logic so that you
    flush every n documents. However, you can change the merge policy and do
    the merges in the background, so the flush operation exits quickly, but be
    aware that the index keeps working behind the scenes, so performance
    will be very bad.
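
The cascade described above (10 segments of 1 document, then 10 of 10, then 10 of 100) can be sketched with a small simulation. This is a simplified model and not Lucene's actual merge code: it assumes one single-document segment per flush and merges exactly `mergeFactor` equal-sized segments at a time. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MergeCascadeSketch {

    /**
     * Simulates flushing after every added document.
     * Returns, for each flush, how many documents the merges it triggers rewrite.
     */
    static long[] mergeCostPerFlush(int totalDocs, int mergeFactor) {
        long[] cost = new long[totalDocs];
        // segmentsAtLevel.get(k) = number of segments holding mergeFactor^k documents
        List<Integer> segmentsAtLevel = new ArrayList<>();
        for (int doc = 0; doc < totalDocs; doc++) {
            int level = 0;
            long mergedSize = 1;
            // The flush writes one new single-document segment.
            increment(segmentsAtLevel, level);
            // Whenever a level fills up, its segments are merged into one
            // segment on the next level, which may cascade further.
            while (segmentsAtLevel.get(level) == mergeFactor) {
                segmentsAtLevel.set(level, 0);
                mergedSize *= mergeFactor;
                cost[doc] += mergedSize; // documents rewritten by this merge
                level++;
                increment(segmentsAtLevel, level);
            }
        }
        return cost;
    }

    private static void increment(List<Integer> counts, int level) {
        while (counts.size() <= level) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
    }

    public static void main(String[] args) {
        long[] cost = mergeCostPerFlush(1000, 10);
        for (int flush : new int[] {1, 10, 100, 1000}) {
            System.out.println("flush #" + flush + " rewrites " + cost[flush - 1] + " docs");
        }
    }
}
```

In this model most flushes trigger no merge at all, while every 10th, 100th, 1000th flush pays for progressively larger merges, which matches the impression that flushing gets slower as the index grows.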
  • Mimounl at Dec 8, 2008 at 12:10 am

    Jokin Cuadrado wrote:

    > Every time you flush the index, you are writing a small index to
    > disk. There is a configured value (mergeFactor) that decides when it has
    > to merge all of those small indexes into a bigger one, so as the index
    > grows the merges get bigger.

    Don't you think I should migrate to Lucene 2.4? In that version, it
    sounds like writing documents to the index files is independent of the
    merge operation.
    I mean, in the latest version, merging is performed by default by a
    ConcurrentMergeScheduler, which should make the commit operation take
    approximately constant time whatever the size of the index. Is that true?
  • Michael McCandless at Dec 8, 2008 at 11:25 am
    Flushing is still done "synchronously" with an addDocument call. The
    time spent is proportional to how large the RAM buffer is and how
    fast your IO system accepts writes.

    So, you'll be happily adding documents, until IW decides a flush is
    needed, and then it will flush (blocking) using your current thread.

    But, as you noted, previously that flush would also synchronously
    merge when needed, but with ConcurrentMergeScheduler that merging is
    now done in the background.

    The new commit() method is quite a bit more costly than a flush
    because it must sync the files (ensure they are persisted to stable
    storage) before continuing.
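
At the operating-system level, the "sync" step mentioned above corresponds to forcing buffered writes out to the storage device. The following is a minimal sketch using plain `java.nio`, not Lucene's own code; the class, the helper names and the temporary file are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SyncSketch {

    /** Writes text to an existing file; optionally waits until it is on stable storage. */
    static long write(Path file, String text, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf); // may only reach the OS cache (cheap, like a flush)
            }
            if (sync) {
                ch.force(true); // blocks until data and metadata are persisted (costly, like a commit)
            }
            return ch.size();
        }
    }

    /** Runs write() against a temporary file and returns the resulting file size. */
    static long demo(boolean sync) {
        try {
            Path tmp = Files.createTempFile("sync-sketch", ".bin");
            try {
                return write(tmp, "hello", sync);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written without sync: " + demo(false));
        System.out.println("bytes written with sync:    " + demo(true));
    }
}
```

Both calls leave the same bytes in the file; the difference is only whether the caller waits for the device, which is where the extra commit time goes.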

    There is a nice analogy to mountain climbing: every so often, you must
    hammer a new anchor into the rock, which is your safety in case you
    fall. You spend a lot of time finding a safe spot and hammering
    thoroughly, so that the anchor will hold you if you fall, just as Lucene's
    commit spends a lot of time waiting for all the "anchors" to be on
    stable storage in case the machine crashes. In between hammering
    anchors you can climb fairly quickly, simply using hands & feet to
    "temporarily" hold on, just like Lucene writes new segment files as
    "temporary" files (in that they won't survive a crash) during flush.

    So you should use commit sparingly, and open your IndexWriter with
    autoCommit=false.

    Mike
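
One way to "use commit sparingly" is the approach already suggested earlier in the thread: commit once every n added documents. The sketch below is only an illustration under assumptions: `BatchingCommitter` and its methods are hypothetical names, and `commitAction` stands in for whatever durable operation your Lucene version offers (`commit()` where it exists, `flush()` on 2.3.1), wrapped so that it handles its own `IOException`.

```java
class BatchingCommitter {

    private final Runnable commitAction; // hypothetical hook, e.g. wraps writer.commit()
    private final int batchSize;
    private int pending = 0;

    BatchingCommitter(Runnable commitAction, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.commitAction = commitAction;
        this.batchSize = batchSize;
    }

    /** Call once after each document has been added to the index. */
    synchronized void documentAdded() {
        pending++;
        if (pending >= batchSize) {
            commitNow();
        }
    }

    /** Commits any uncommitted documents; does nothing if there are none. */
    synchronized void commitNow() {
        if (pending == 0) {
            return;
        }
        commitAction.run();
        pending = 0;
    }

    synchronized int pending() {
        return pending;
    }

    public static void main(String[] args) {
        BatchingCommitter committer =
                new BatchingCommitter(() -> System.out.println("commit"), 3);
        for (int i = 0; i < 7; i++) {
            committer.documentAdded(); // triggers a commit after documents 3 and 6
        }
        committer.commitNow(); // commits the one remaining document, e.g. at shutdown
    }
}
```

The trade-off is durability: with this scheme up to `batchSize - 1` recent documents can be lost if the machine crashes between commits.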

  • Laurent Mimoun at Dec 8, 2008 at 5:43 pm

    Michael McCandless-2 wrote:

    > So you should use commit sparingly, and, open your IndexWriter with
    > autoCommit=false.

    Thank you for your response.

    But I would be astonished if the Lucene API provided no code to do the
    job of regularly committing modifications: do I really have to code this
    by hand?

  • Michael McCandless at Dec 8, 2008 at 5:46 pm
    IndexWriter.close() does a commit.

    Otherwise you will (in 3.0) need to do it by hand.

    Mike
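
Doing it "by hand" can be fairly small. The sketch below commits on a timer, and only when something has changed since the last commit. It is an assumption-laden illustration rather than anything shipped with Lucene: `PeriodicCommitter`, `markDirty` and `commitAction` are hypothetical names, and `commitAction` is expected to wrap the writer's commit call and handle its own `IOException`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class PeriodicCommitter implements AutoCloseable {

    private final Runnable commitAction; // hypothetical hook, e.g. wraps writer.commit()
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicCommitter(Runnable commitAction, long period, TimeUnit unit) {
        this.commitAction = commitAction;
        // Note: if commitAction throws, the executor cancels all further runs.
        scheduler.scheduleWithFixedDelay(this::commitIfDirty, period, period, unit);
    }

    /** Call after every modification of the index. */
    void markDirty() {
        dirty.set(true);
    }

    /** Commits only if something changed since the last commit. */
    synchronized void commitIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            commitAction.run();
        }
    }

    /** Stops the timer and performs a final commit of any outstanding changes. */
    @Override
    public void close() {
        scheduler.shutdown();
        commitIfDirty();
    }

    public static void main(String[] args) throws InterruptedException {
        try (PeriodicCommitter committer = new PeriodicCommitter(
                () -> System.out.println("commit"), 100, TimeUnit.MILLISECONDS)) {
            committer.markDirty();
            Thread.sleep(250); // the timer should fire while we wait
            committer.markDirty(); // picked up by the timer or by close()
        }
    }
}
```

Compared with committing every n documents, a timer bounds how long a change can stay uncommitted rather than how many changes can be lost.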

