FAQ
Hi,

I know Lucene does not have transaction support at this stage.
However, I want to know what will happen if the operating system
crashes during the indexing process: will the Lucene index get
corrupted?

Thanks,

Jian

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Otis Gospodnetic at Jul 16, 2005 at 7:32 am
    The only corruption that I've seen mentioned on this list so far was
    corruption of the segments file, and even that people have been
    able to repair manually with a hex editor.

    Otis


  • Jian chen at Jul 16, 2005 at 8:27 pm
    Hi, Otis,

    Thanks for your email. As this is very important for using Lucene in
    our production system, I looked at the code to try to understand. Here
    is my observation why the index won't be corrupted during a system
    crash.

    In the IndexWriter.java mergeSegments(...) method, there are two lines:
    segmentInfos.write(directory); // commit before deleting
    deleteSegments(segmentsToDelete);//delete unused segments

    The segmentInfos.write(...) call writes the new segments file as
    "segments.new"; once the write is complete, it renames "segments.new"
    to "segments".

    I guess the rename operation is atomic, as guaranteed by the operating
    system. Otherwise, the "segments" file could be left in an inconsistent
    state by a system crash.

    It also appears to me that the "segments" file is the single point of
    switching from the old set of index segments to the new one. If the
    system fails before the rename, the old "segments" file will be used
    anyway, so there is no corruption.

    Is this understanding correct and thorough?

    Thanks a lot,

    Jian
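
The write-then-rename step Jian describes can be sketched as follows. This is a minimal illustration using the modern java.nio.file API, not the Lucene code of the time; the class and method names are hypothetical, and only the file names "segments.new" and "segments" come from the post.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a single-commit-point update; not Lucene's implementation.
final class AtomicCommitSketch {

    // Writes the new contents to "segments.new", then renames it over "segments".
    static void commit(Path indexDir, byte[] newContents) throws IOException {
        Path tmp = indexDir.resolve("segments.new");
        Path live = indexDir.resolve("segments");

        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(newContents);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to flush the data to the device before the rename.
            ch.force(true);
        }

        // Readers see either the old file or the new one, never a partial write.
        // Whether an existing target is replaced is implementation specific;
        // on POSIX file systems the rename replaces it.
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the process dies before the move, the old "segments" file is untouched and a leftover "segments.new" can simply be ignored or deleted.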
  • Nikhil Goel at Jul 16, 2005 at 9:59 pm
    hey Jian,

    That's a very good thread to start. We faced a similar situation
    in our production system, where the Lucene index actually got corrupted
    because writing the index is not atomic.

    Your observation is correct, and the only problem that could happen is
    that there will be zombie segments in your index, since they were not
    yet listed in the segments file when the system crashed. But I am giving
    one warning here: we have seen a case where an entry for one segment
    file (an .fdx file) was present in the "segments" file, and that .fdx
    file was supposed to have a size of 200, but in fact there was nothing
    in the file, and hence we read past EOF. After lots of inspection, we
    still couldn't figure out why that happened. I tried to post that query
    to this newsgroup, but unfortunately I got no reply, and it made us stop
    indexing for a while.

    The approach we are following now is to write the index to a database
    inside a transaction: we commit the transaction only when the segments
    file and the delete file have been updated; otherwise we roll back.
    This solution has been working well for us. It is slower, but that is
    better than losing the entire index.

    I would be glad if someone could give a better explanation of the
    corruption. I have seen lots of posts on this group about it, but no
    one really responds to this important question.

    Please let me know if you have something more to add to my explanation.
    Thanks.
    Nikhil
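
The commit-or-roll-back idea Nikhil mentions can be sketched with plain JDBC. The thread does not say how the index was actually stored in the database, so the class, the IndexStep interface, and the method below are hypothetical; only the standard java.sql.Connection calls are real.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch: run one index update inside a database transaction.
final class TransactionalUpdateSketch {

    // Stand-in for whatever writes the index data through the connection.
    interface IndexStep {
        void run(Connection conn) throws Exception;
    }

    // Returns true if the step completed and was committed, false if it was rolled back.
    static boolean runInTransaction(Connection conn, IndexStep step) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            step.run(conn);
            conn.commit();      // the update becomes visible only here
            return true;
        } catch (Exception e) {
            conn.rollback();    // discard the partial update
            return false;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The trade-off is the one the post describes: every update pays the cost of a database transaction, in exchange for never exposing a half-written index.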

  • Otis Gospodnetic at Jul 16, 2005 at 10:22 pm
    Hi Jian,

    Yes, I think what you describe is correct. You may end up with some
    "junk index segments" in the index directory, but as long as they are
    not recorded in the segments file, they are irrelevant.

    Otis
    P.S.
    Did you ask about locking in Lucene the other day?


  • Jian chen at Jul 17, 2005 at 12:10 am
    Thanks Otis and Nikhil for your confirmation. I am more confident
    about the Lucene index integrity.

    Nikhil, I don't see the reason why there would be a corrupted .fdx file.
    Could it be caused by multi-threaded access to the index?

    Otis, I don't remember asking about locking the other day.
    I think it must have been someone else.

    Thanks all,

    Jian
  • Harini Raghavan at Jul 17, 2005 at 2:13 pm
    Hi All,
    I am quite new to Lucene and I have a problem with locking. I have a
    MessageDrivenBean that sends messages to my Lucene indexer whenever there is
    a new database update. The indexer updates the index incrementally. Below
    is the code fragment in the indexer method that gets invoked by the MDB
    listener.

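    public void addDocument(Document doc) {
        String indexLoc = luceneConfig.getIndexDir();
        // Opening the writer acquires write.lock for this index.
        IndexWriter writer = getIndexWriter(indexLoc, false);
        try {
            writer.addDocument(doc);
        } catch (IOException e) {
            // Pass the exception so the stack trace is not lost.
            logger.error("IOException occurred in addDocument()", e);
        } catch (Exception e) {
            logger.error("Exception occurred in addDocument()", e);
        } finally {
            try {
                writer.close();
            } catch (IOException e) {
                // A failed close() may leave write.lock behind, so log it.
                logger.error("IOException occurred while closing IndexWriter", e);
            }
        }
    }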

    The incremental update works fine twice, and the third time it throws the
    following exception:

    java.io.IOException: Index locked for write: Lock@C:\tmpIndex\write.lock
    at org.apache.lucene.index.IndexWriter.<init>(Unknown Source)
    at org.apache.lucene.index.IndexWriter.<init>(Unknown Source)
    at lucene.LuceneActions.getIndexWriter(LuceneActions.java:151)
    at lucene.LuceneActions.addDocument(LuceneActions.java:43)
    at index.IndexServiceImpl.addData(IndexServiceImpl.java:63)
    at index.IndexServiceImpl.addToIndex(IndexServiceImpl.java:28)

    The Index Writer is created every time and also closed in the finally block.
    Should I be doing something else?
    Any help would be appreciated.
    Thanks,
    Harini
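
The thread ends without a reply to this question. Lucene allows only one open IndexWriter per index at a time, which is what write.lock enforces. If the cause here is overlapping message deliveries (an assumption; the post does not say), one option is to serialize writer access within the JVM. A minimal sketch, with hypothetical names and a stand-in interface in place of the real Lucene calls:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: let only one thread at a time open, use, and close the writer.
final class SerializedIndexAccess {

    // Stand-in for "open IndexWriter, add document, close IndexWriter".
    interface WriterTask {
        void run() throws Exception;
    }

    private static final ReentrantLock INDEX_LOCK = new ReentrantLock();

    static void withExclusiveWriter(WriterTask task) throws Exception {
        INDEX_LOCK.lock();
        try {
            task.run();
        } finally {
            // Always release, even if the task throws.
            INDEX_LOCK.unlock();
        }
    }
}
```

This only helps when all writers live in the same JVM; it does nothing about a write.lock file left behind by a process that exited without closing its writer.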



Discussion Overview
group: java-user
categories: lucene
posted: Jul 16, '05 at 12:42a
active: Jul 17, '05 at 2:13p
posts: 7
users: 4
website: lucene.apache.org
