Hi,

I'm using Lucene 2.4.1 and am seeing occasional index corruption. It shows
up when I call MultiSearcher.search(). MultiSearcher.search() throws the
following exception:

ArrayIndexOutOfBoundsException. The error is: "Array index out of range: ###",
where ### is an index into the deletedDocs BitVector in SegmentTermDocs. The
stack trace is as follows:

2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.util.BitVector.get(BitVector.java:91)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.next(MultiSegmentReader.java:554)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:384)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:415)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:206)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:167)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:55)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.TopFieldDocCollector.<init>(TopFieldDocCollector.java:43)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:122)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:232)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
org.apache.lucene.search.Searcher.search(Searcher.java:86)
2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
com.acquiremedia.opens.index.searcher.HybridIndexSearcher.search(HybridIndexSearcher.java:311)
.
.
.

That makes sense, but I'm trying to understand what could be causing the
corruption.

Here's what I'm doing:

1) I have an IndexWriter created using a RAMDirectory.

2) I have a single thread processing index adds and index deletes. This
thread is rather simple and calls IndexWriter.addDocument() followed by
IndexWriter.commit(), or IndexWriter.deleteDocuments() followed by
IndexWriter.commit(). The commits are done at this point because we want
the documents available for searching immediately.

3) I also have 4 search threads running at the same time as the index write
thread. Each time a search thread executes a search, it calls
IndexReader.reopen() on the existing IndexReader for the index created
using the RAMDirectory above, gets an existing IndexReader for another
on-disk index, and then calls MultiSearcher.search() (this brings together
the RAMDirectory index and the on-disk index) to execute the search.
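
In sketch form, the setup looks roughly like this (Lucene 2.4 API; the
analyzer, the "id" field, and the thread coordination are placeholders here,
not our exact code):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class HybridSearchSketch {
    private final RAMDirectory ramDir = new RAMDirectory();
    private final IndexWriter ramWriter;       // step 1
    private IndexReader ramReader;             // reopened on every search
    private final IndexReader diskReader;      // long-lived reader on the disk index

    public HybridSearchSketch(IndexReader diskReader) throws IOException {
        ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(),
            true, IndexWriter.MaxFieldLength.UNLIMITED);
        ramWriter.commit();                    // so a reader can open the empty index
        ramReader = IndexReader.open(ramDir);
        this.diskReader = diskReader;
    }

    // Step 2: one indexing thread; commit after every change so the
    // change is searchable immediately.
    public void add(Document doc) throws IOException {
        ramWriter.addDocument(doc);
        ramWriter.commit();
    }

    public void delete(Term idTerm) throws IOException {
        ramWriter.deleteDocuments(idTerm);
        ramWriter.commit();
    }

    // Step 3: a search thread reopens the RAM reader, reuses the disk
    // reader, and searches both via a MultiSearcher.  (The real code has
    // 4 such threads; synchronization and old-reader closing are elided.)
    public TopDocs search(Query query) throws IOException {
        IndexReader newReader = ramReader.reopen();
        if (newReader != ramReader) {
            ramReader = newReader;
        }
        MultiSearcher searcher = new MultiSearcher(new Searchable[] {
            new IndexSearcher(ramReader), new IndexSearcher(diskReader) });
        return searcher.search(query, null, 10);
    }
}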

It generally works fine, but every few days or so I get the above
ArrayIndexOutOfBoundsException. My suspicion is that when the
IndexWriter.commit() call for a delete happens at the exact same time as
the IndexReader.reopen() call, the IndexReader gets a deletedDocs
BitVector which reflects the delete, but something else internal to the
IndexReader does not reflect the delete.

So, I implemented a semaphore mechanism to prevent IndexWriter.commit() from
happening at the same time as IndexReader.reopen(). I only implemented the
semaphores in the delete case because my guess was that an add would be
unlikely to affect the deletedDocs BitVector. Unfortunately, the problem
continues to happen.
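
The gate amounts to roughly this (a sketch; the class and method names are
mine, and note that only the delete path takes the gate):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class CommitReopenGate {
    private final Object gate = new Object();

    // Indexing thread: only the delete-then-commit path takes the gate.
    public void deleteAndCommit(IndexWriter writer, Term idTerm)
            throws IOException {
        synchronized (gate) {
            writer.deleteDocuments(idTerm);
            writer.commit();
        }
    }

    // Search threads: reopen inside the same gate, so a reopen can never
    // overlap a delete-commit.  Adds (and their commits) stay outside.
    public IndexReader reopen(IndexReader current) throws IOException {
        synchronized (gate) {
            return current.reopen();
        }
    }
}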

I believe I read somewhere that a similar thread safety issue had been fixed
in Lucene 2.4, but I suspect there may still be an issue in 2.4.1.

Does anyone have any knowledge or insight into what I may be doing wrong or
how I can correct/avoid the problem? Upgrading to Lucene 3.0 or using Solr
are not really options for me at least in the short term.

Thanks for any information you can provide!

Frank

  • Michael McCandless at Jan 12, 2010 at 9:01 am
    Not good! Can you post the ###'s from the exception? How far out of
    bounds is the access?

    Your usage sounds fine. Reopen during commit is fine.

    Are you sure the exception comes from the reader on the RAMDir and not
    your main dir?

    How do you periodically move your in-RAM changes to the disk index?

    Have you run CheckIndex on your index? Also, try running with asserts
    enabled... it may catch whatever is happening, sooner.
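
    Eg, with the core jar on the classpath, something along the lines of:

      java -ea:org.apache.lucene... -cp lucene-core-2.4.1.jar \
          org.apache.lucene.index.CheckIndex /path/to/index

    (that same -ea:org.apache.lucene... flag is how you enable Lucene's
    asserts in your own JVM; /path/to/index is of course a placeholder).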

    Do you hold open a single IW on the RAMDir, and re-use it? Or open
    and then close? What about the disk dir?

    Mike
  • Frank Geary at Jan 12, 2010 at 4:02 pm
    Thanks for the reply Mike. Your questions were good ones because I realize
    now I should have probably used "Corrupt IndexReader" as the subject for
    this thread. Here's my answers:

    The number stays the same until the corrupted IndexReader is reopened (if
    nothing changes in the IndexReader - and thus we get the same IndexReader
    back from reopen - the problem persists). Then the next time the problem
    occurs, after we've gotten at least one non-problematic IndexReader and thus
    a few successful searches, the number is different. For what it's worth, my
    latest ### was 176. My assumption has always been that it is most likely to
    be 1 beyond the end of the BitVector because each commit() is only changing
    the index by adding or deleting one document. But I don't know for sure.

    Yes, I had always expected that reopen during commit would be fine, and my
    semaphore code seems to confirm that (unless adds have something very subtle
    to do with it). Any other theories would be very much appreciated.

    At first I wasn't sure whether it was the RAMDirectory or the main dir. But
    since I now have many examples where the RAMDirectory IndexReader has
    changed and the main dir IndexReader has not, I feel that the
    RAMDirectory IndexReader is the problem. We reopen the IndexReader every
    time we do a search on a RAMDirectory. For the main dir, we rarely reopen
    an IndexReader. We only reopen the main dir IndexReader when a RAM
    Directory is gotten rid of, which happens once the RAM directory indexes
    10000 documents. But here's a typical example where the main dir
    IndexReader stays the same throughout the problem:

    1) both the RAMDirectory IndexReader and the main dir IndexReader are set
    after the last time we got rid of the RAMDirectory, which was full with 10000
    documents.
    2) then we receive about 1077 new documents and add them to the RAMDirectory
    as well as to the main dir indexes (with a number of deletes scattered
    throughout as well). The RAM directory is committed after every add or
    delete and the main dir is not committed at all.
    3) then a search comes in; we reopen the RAMDirectory IndexReader, use the
    existing main dir IndexReader (which does not reflect any index changes
    since step 1), do the search, and get the ArrayIndexOutOfBoundsException, so
    the search fails (in this case the ### was 208)
    4) then we simply go on and get another 4000 adds and deletes in the RAM
    directory and main directory. Again the RAM directory is committed after
    every add or delete and the main dir is not committed at all.
    5) then the next search comes in, reopens the RAMDirectory IndexReader, uses
    the existing main dir IndexReader (which does not reflect any index changes
    since step 1) and the search works fine!
    6) during all that time the main dir IndexReader never got reopened, and the
    RAMDirectory IndexReader only got reopened twice - once it was bad, and the
    next time it was OK.

    To answer your question about when we periodically move our in-RAM changes
    to the disk index (my apologies if this is redundant):
    We never move them. They are duplicated in both the RAM directory and the
    on-disk directory as they come in. Then, when the RAM directory determines
    that it has 10000 documents, we begin writing to a second RAM directory,
    reopen and warm up the on-disk IndexReader, and once that new on-disk
    IndexReader is ready, we throw away the RAM directory which had 10000
    documents (to do this we actually clean out the RAM directory by deleting
    all the files it contains and we create a new IndexWriter using that same
    RAMDirectory object). Then we continue writing to that second RAM directory
    and the on-disk index until the second RAM directory has 10000 documents.
    And repeat.
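
    In sketch form, the rollover looks roughly like this (not our production
    code; the names are simplified and the warmup is elided):

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    class RamRollover {
        Analyzer analyzer;
        RAMDirectory activeDir, retiringDir;      // the two RAM directories
        IndexWriter activeWriter, retiringWriter;
        IndexReader diskReader;

        // Called once the active RAM index reaches 10000 documents.
        void rollover() throws IOException {
            // Begin writing to the second RAM directory.
            RAMDirectory d = activeDir; activeDir = retiringDir; retiringDir = d;
            IndexWriter w = activeWriter; activeWriter = retiringWriter; retiringWriter = w;

            // Reopen the on-disk reader, which now covers everything the
            // retiring RAM index held.
            IndexReader newDiskReader = diskReader.reopen();
            if (newDiskReader != diskReader) {
                diskReader.close();
                diskReader = newDiskReader;
            }

            // Clean out the retiring RAM directory: close its writer, delete
            // every file, and open a fresh IndexWriter on the same
            // RAMDirectory object.
            retiringWriter.close();
            String[] files = retiringDir.list();
            for (int i = 0; i < files.length; i++) {
                retiringDir.deleteFile(files[i]);
            }
            retiringWriter = new IndexWriter(retiringDir, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        }
    }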

    We have NOT run CheckIndex on the RAM directory, mainly because this mostly
    happens in our production environment, which has very high traffic, and I'd
    have to write some special code to set aside a bad RAMDirectory index, etc.,
    to run CheckIndex on it. Just not an easy thing to do.

    I can look into enabling asserts. That may help.

    Finally, for the RAM directory, we must create a new IndexWriter every time
    we "clean out" the RAM directory after we reach the 10000 document limit. We
    clean out the RAM directory by deleting all the files it contains and we
    then create a new IndexWriter using that same RAMDirectory object. If there
    were a reopen for an IndexWriter, we'd use that instead. For the on-disk
    directory, the IndexWriter never changes and we reopen the IndexReader each
    time a RAM directory reaches the 10000 document limit.

    Any further thoughts or ideas? Thanks again for your help Mike!

    Frank


  • Michael McCandless at Jan 12, 2010 at 7:05 pm
    Is it possible that you're not closing the old IW on the RAMDir before
    deleting files / re-using it? Or, any other possible way that two
    open writers could accidentally share the same RAMDir? Do you
    override the LockFactory of the RAMDir?

    EG with ConcurrentMergeScheduler, it can continue to write files into
    the RAMDir. It's remotely possible (though I think rather unlikely)
    that this could lead to the corruption you're seeing.

    If you can turn on setInfoStream for all writers you create and
    capture & post all output leading to the exception, that could give us
    a clue...
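
    Eg, something like this (the file name is arbitrary; autoflush so the
    tail of the log survives a crash):

    import java.io.FileOutputStream;
    import java.io.PrintStream;
    import org.apache.lucene.index.IndexWriter;

    class InfoStreamUtil {
        // Route each writer's infoStream to its own log file.
        static void logTo(IndexWriter writer, String path) throws Exception {
            writer.setInfoStream(new PrintStream(new FileOutputStream(path, true), true));
        }
    }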

    You should be able to use IW's deleteAll method to remove all docs
    without closing/reopening the writer (oh -- but this is only available
    as of 2.9).

    You shouldn't have to remove files yourself -- just open a new IW with
    create=true?
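
    Ie, roughly (a sketch; the analyzer and names are placeholders):

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    class Recreate {
        // 2.4: close the old writer, then let create=true wipe the index;
        // no need to delete the files by hand.
        static IndexWriter recreate(RAMDirectory ramDir, Analyzer analyzer,
                IndexWriter oldWriter) throws IOException {
            oldWriter.close();
            return new IndexWriter(ramDir, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
            // On 2.9+ you could instead keep one writer and call
            // writer.deleteAll() followed by writer.commit().
        }
    }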

    Mike
  • Frank Geary at Jan 21, 2010 at 9:46 pm
    Nope. When it's time to inactivate a RAMDir IndexWriter, I get that
    directory, close that writer, then clear out the directory. Then, after
    clearing out the directory, I create a new IW passing in the directory that
    was used previously. No, I do not override the LockFactory - not familiar
    with it.

    Hmm...for what it's worth, we are using ConcurrentMergeScheduler.
    We do actually have setInfoStream turned on for all our indexers. I'll have
    to hunt down one of our logs that corresponds to one of the exceptions,
    though. Thanks for that.

    We delete the files explicitly because what we found was that, under high
    index loads (i.e. rapid rolling of RAMDir IndexWriters), leaving the RAMDir
    around led to garbage collection issues as old RAMDirs accumulated waiting
    for garbage collection to take place. It turned out that deleting the files
    explicitly avoided those garbage collection problems.

    Thanks again.

    Frank


  • Michael McCandless at Jan 22, 2010 at 8:14 am

    On Thu, Jan 21, 2010 at 4:46 PM, Frank Geary wrote:

    > Nope.  When it's time to inactivate a RAMDir IndexWriter, I get that
    > directory, close that writer, then clear out the directory.  Then, after
    > clearing out the directory, I create a new IW passing in the directory that
    > was used previously.  No, I do not override the LockFactory - not familiar
    > with it.

    Hmm, curious.
    > Hmm...for what it's worth, we are using ConcurrentMergeScheduler.
    > We do actually have setInfoStream turned on for all our indexers.  I'll have
    > to hunt down one of our logs that corresponds to one of the exceptions,
    > though.  Thanks for that.

    That could be helpful... eg it'd let us confirm every open IW is
    closed before a new one is opened on the same RAMDir (assuming we get
    the full output from all opened IWs).
    > We delete the files explicitly because what we found was that, under high
    > index loads (i.e. rapid rolling of RAMDir IndexWriters), leaving the RAMDir
    > around led to garbage collection issues as old RAMDirs accumulated waiting
    > for garbage collection to take place.  It turned out that deleting the files
    > explicitly avoided those garbage collection problems.

    That's very odd -- RAMDir should be "easy" to GC once it's closed.  It
    (and RAMFile) have no finalizers.

    Were you calling RAMDir.close()? That clears its fileMap, which will
    make all RAM consumed GCable.

    The fact that you hit GC issues makes me suspect something, somewhere
    is still hanging on to the RAMDir (ie, the IW isn't always being
    closed).

    If you go back to using a new RAMDir instance per IW, does the problem go away?
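
    Ie, something like (a sketch; how you carry the analyzer etc is up to you):

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    class FreshRamDir {
        // Retire the old RAM index outright instead of re-using the directory.
        static RAMDirectory retire(IndexWriter oldWriter, RAMDirectory oldDir)
                throws IOException {
            oldWriter.close();
            oldDir.close();              // clears its fileMap; the RAM is GCable
            return new RAMDirectory();   // open the next IndexWriter on this one
        }
    }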

    Mike

  • Frank Geary at Jan 22, 2010 at 5:02 pm
    Mike,

    Below are the portions of the merge log files (captured via
    setInfoStream()) covering the time our exception happens.  The ..._1...
    log file is from one RAMDir index and the ..._0... log file is from the
    other RAMDir index.  The time period when our ArrayIndexOutOfBoundsException
    happens is 2009-12-27 19:59:51,244 (when the IndexReader.reopen() is about
    to be called) to 2009-12-27 19:59:51,259 (when the exception is thrown from
    the MultiSearcher.search() call).  As you can see from the log files below,
    our "_1" RAMDir index is the only one being used during the time the
    exception is thrown - not new information to me. Also a merge has finished
    at 2009-12-27 19:59:50,561 and thus it appears that nothing at all is
    happening (with respect to merges or commits) in either RAMDir index at the
    time the problem occurs. But one thing that is interesting to me - perhaps
    you can clarify - is there is a very short commit at the very top of the
    "_1" log file. Why is that commit different from the commit right below it?
    What does "DW: apply 1 buffered deleted terms and 0 deleted docIDs" mean?
    What does "commit: pendingCommit == null; skip" mean? Are there different
    kinds of commit?

    Here's the log file info:


    indexmergesched_1_20091227161034212.log


    [2009-12-27 19:59:50,534Z]: IW 48 [pool-2-thread-1]: commit: start
    [2009-12-27 19:59:50,534Z]: IW 48 [pool-2-thread-1]: commit: now prepare
    [2009-12-27 19:59:50,534Z]: IW 48 [pool-2-thread-1]: prepareCommit: flush
    [2009-12-27 19:59:50,534Z]: IW 48 [pool-2-thread-1]: flush: segment=null
    docStoreSegment=null docStoreOffset=0 flushDocs=false flushDeletes=true
    flushDocStores=false numDocs=0 numBufDelTerms=1
    [2009-12-27 19:59:50,534Z]: IW 48 [pool-2-thread-1]: index before flush
    _6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1 _yy:C1
    _yz:C1 _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1
    [2009-12-27 19:59:50,534Z]: IW 48 [pool-2-thread-1]: DW: apply 1 buffered
    deleted terms and 0 deleted docIDs and 0 deleted queries on 15 segments.
    [2009-12-27 19:59:50,536Z]: IW 48 [pool-2-thread-1]: startCommit(): start
    sizeInBytes=0
    [2009-12-27 19:59:50,536Z]: IW 48 [pool-2-thread-1]: skip startCommit():
    no changes pending
    [2009-12-27 19:59:50,536Z]: IW 48 [pool-2-thread-1]: commit: pendingCommit
    == null; skip
    [2009-12-27 19:59:50,536Z]: IW 48 [pool-2-thread-1]: commit: done
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: commit: start
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: commit: now prepare
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: prepareCommit: flush
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: flush: segment=_z6
    docStoreSegment=_z6 docStoreOffset=0 flushDocs=true flushDeletes=true
    flushDocStores=true numDocs=1 numBufDelTerms=0
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: index before flush
    _6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1 _yy:C1
    _yz:C1 _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: DW: flush postings as
    segment _z6 numDocs=1
    [2009-12-27 19:59:50,543Z]: IW 48 [pool-2-thread-1]: DW: closeDocStore: 2
    files to flush to segment _z6 numDocs=1
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: DW: oldRAMSize=59392
    newFlushedSize=6213 docs/MB=168.771 new/old=10.461%
    [2009-12-27 19:59:50,544Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_xv" [16 segments ; isCommit = false]
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: LMP: findMerges: 16
    segments
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: LMP: level 6.2247195
    to 6.2465463: 5 segments
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: LMP: level -1.0 to
    5.8754334: 11 segments
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: LMP: 5 to 15: add
    this merge
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: alwaysMerge=true
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: CMS: now merge
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: CMS: index: _6o:C206
    _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1 _yy:C1 _yz:C1
    _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 _z6:C1
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: CMS: no more merges
    pending; now return
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: startCommit(): start
    sizeInBytes=0
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: startCommit
    index=_6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1
    _yy:C1 _yz:C1
    _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 _z6:C1 changeCount=2598
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.fnm
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.frq
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.prx
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.tis
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.tii
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.nrm
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.fdx
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: now sync _z6.fdt
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: done all syncs
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: commit: pendingCommit
    != null
    [2009-12-27 19:59:50,544Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_xw" [16 segments ; isCommit = true]
    [2009-12-27 19:59:50,544Z]: IFD [pool-2-thread-1]: deleteCommits: now decRef
    commit "segments_xv"
    [2009-12-27 19:59:50,544Z]: IFD [pool-2-thread-1]: delete "segments_xv"
    [2009-12-27 19:59:50,544Z]: IW 48 [pool-2-thread-1]: commit: done
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: commit: start
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: commit: now prepare
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: prepareCommit: flush
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: flush: segment=_z7
    docStoreSegment=_z7 docStoreOffset=0 flushDocs=true flushDeletes=true
    flushDocStores=true numDocs=1 numBufDelTerms=0
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: index before flush
    _6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1 _yy:C1
    _yz:C1 _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 _z6:C1
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: DW: flush postings as
    segment _z7 numDocs=1
    [2009-12-27 19:59:50,556Z]: IW 48 [pool-2-thread-1]: DW: closeDocStore: 2
    files to flush to segment _z7 numDocs=1
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: DW: oldRAMSize=59392
    newFlushedSize=6156 docs/MB=170.334 new/old=10.365%
    [2009-12-27 19:59:50,557Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_xw" [17 segments ; isCommit = false]
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: LMP: findMerges: 17
    segments
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: LMP: level 6.2247195
    to 6.2465463: 5 segments
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: LMP: level -1.0 to
    5.8754334: 12 segments
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: LMP: 5 to 15: add
    this merge
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: alwaysMerge=true
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: CMS: now merge
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: CMS: index: _6o:C206
    _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1 _yy:C1 _yz:C1
    _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 _z6:C1 _z7:C1
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: CMS: no more merges
    pending; now return
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: startCommit(): start
    sizeInBytes=0
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: startCommit
    index=_6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1
    _yy:C1 _yz:C1
    _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 _z6:C1 _z7:C1 changeCount=2600
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.fnm
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.frq
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.prx
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.tis
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.tii
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.nrm
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.fdx
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: now sync _z7.fdt
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: done all syncs
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: commit: pendingCommit
    != null
    [2009-12-27 19:59:50,557Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_xx" [17 segments ; isCommit = true]
    [2009-12-27 19:59:50,557Z]: IFD [pool-2-thread-1]: deleteCommits: now decRef
    commit "segments_xw"
    [2009-12-27 19:59:50,557Z]: IFD [pool-2-thread-1]: delete "segments_xw"
    [2009-12-27 19:59:50,557Z]: IW 48 [pool-2-thread-1]: commit: done
    [2009-12-27 19:59:50,561Z]: IW 48 [Lucene Merge Thread #125]: commitMerge:
    _yv:C71 _yw:C1 _yx:C1 _yy:C1 _yz:C1 _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 into
    _z5 index=_6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _yv:C71 _yw:C1 _yx:C1
    _yy:C1 _yz:C1 _z0:C1 _z1:C1 _z2:C1 _z3:C1 _z4:C1 _z6:C1 _z7:C1
    [2009-12-27 19:59:50,561Z]: IW 48 [Lucene Merge Thread #125]:
    commitMergeDeletes _yv:C71 _yw:C1 _yx:C1 _yy:C1 _yz:C1 _z0:C1 _z1:C1 _z2:C1
    _z3:C1 _z4:C1 into _z5
    [2009-12-27 19:59:50,561Z]: IFD [Lucene Merge Thread #125]: now checkpoint
    "segments_xx" [8 segments ; isCommit = false]
    [2009-12-27 19:59:50,561Z]: IW 48 [Lucene Merge Thread #125]: LMP:
    findMerges: 8 segments
    [2009-12-27 19:59:50,561Z]: IW 48 [Lucene Merge Thread #125]: LMP: level
    6.2247195 to 6.2465463: 5 segments
    [2009-12-27 19:59:50,561Z]: IW 48 [Lucene Merge Thread #125]: LMP: level
    -1.0 to 5.910075: 3 segments
    [2009-12-27 19:59:50,561Z]: IW 48 [Lucene Merge Thread #125]: CMS: merge
    thread: done
    [2009-12-27 19:59:50,561Z]: IW 48 [Thread-108842]:
    AMConcurrentMergeScheduler: merge is done
    [2009-12-27 20:00:05,494Z]: IW 48 [pool-2-thread-1]: commit: start
    [2009-12-27 20:00:05,494Z]: IW 48 [pool-2-thread-1]: commit: now prepare
    [2009-12-27 20:00:05,494Z]: IW 48 [pool-2-thread-1]: prepareCommit: flush
    [2009-12-27 20:00:05,494Z]: IW 48 [pool-2-thread-1]: flush: segment=null
    docStoreSegment=null docStoreOffset=0 flushDocs=false flushDeletes=true
    flushDocStores=false numDocs=0 numBufDelTerms=1
    [2009-12-27 20:00:05,494Z]: IW 48 [pool-2-thread-1]: index before flush
    _6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _z5:C80 _z6:C1 _z7:C1
    [2009-12-27 20:00:05,495Z]: IW 48 [pool-2-thread-1]: DW: apply 1 buffered
    deleted terms and 0 deleted docIDs and 0 deleted queries on 8 segments.
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_xx" [8 segments ; isCommit = false]
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: startCommit(): start
    sizeInBytes=0
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: startCommit
    index=_6o:C206 _dd:C207 _iy:C181 _q7:C225 _wm:C203 _z5:C80 _z6:C1 _z7:C1
    changeCount=2602
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.fnm
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.frq
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.prx
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.tis
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.tii
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.nrm
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.fdx
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5.fdt
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: now sync _z5_1.del
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: done all syncs
    [2009-12-27 20:00:05,496Z]: IW 48 [pool-2-thread-1]: commit: pendingCommit
    != null
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_xy" [8 segments ; isCommit = true]
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: deleteCommits: now decRef
    commit "segments_xx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "segments_xx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.fnm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.frq"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.prx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.tis"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.tii"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.nrm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.fdx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yv.fdt"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.fnm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.frq"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.prx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.tis"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.tii"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.nrm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.fdx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yw.fdt"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.fnm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.frq"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.prx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.tis"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.tii"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.nrm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.fdx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yx.fdt"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yy.fnm"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yy.frq"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yy.prx"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yy.tis"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yy.tii"
    [2009-12-27 20:00:05,496Z]: IFD [pool-2-thread-1]: delete "_yy.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yy.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yy.fdt"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.fnm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.frq"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.prx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.tis"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.tii"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_yz.fdt"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.fnm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.frq"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.prx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.tis"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.tii"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z0.fdt"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.fnm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.frq"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.prx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.tis"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.tii"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z1.fdt"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.fnm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.frq"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.prx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.tis"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.tii"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z2.fdt"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.fnm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.frq"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.prx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.tis"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.tii"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z3.fdt"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.fnm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.frq"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.prx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.tis"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.tii"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.nrm"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.fdx"
    [2009-12-27 20:00:05,497Z]: IFD [pool-2-thread-1]: delete "_z4.fdt"
    [2009-12-27 20:00:05,497Z]: IW 48 [pool-2-thread-1]: commit: done

    indexmergesched_0_20091227193729876.log

    [2009-12-27 19:37:29,876Z]: IFD [ArchiveIndexSearcherFactoryCreator]:
    setInfoStream
    deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@624e2c55
    [2009-12-27 19:37:29,876Z]: IW 49 [ArchiveIndexSearcherFactoryCreator]:
    setInfoStream: dir=org.apache.lucene.store.RAMDirectory@1bc74f37
    autoCommit=false
    mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@7a40b2de
    mergeScheduler=org.apache.lucene.index.AMConcurrentMergeScheduler@2d9f4b86
    ramBufferSizeMB=16.0 maxBufferedDocs=60 maxBuffereDeleteTerms=-1
    maxFieldLength=100000 index=
    [2009-12-27 23:07:17,197Z]: IW 49 [pool-2-thread-1]: commit: start
    [2009-12-27 23:07:17,200Z]: IW 49 [pool-2-thread-1]: commit: now prepare
    [2009-12-27 23:07:17,200Z]: IW 49 [pool-2-thread-1]: prepareCommit: flush
    [2009-12-27 23:07:17,200Z]: IW 49 [pool-2-thread-1]: flush: segment=_0
    docStoreSegment=_0 docStoreOffset=0 flushDocs=true flushDeletes=true
    flushDocStores=true numDocs=1 numBufDelTerms=0
    [2009-12-27 23:07:17,200Z]: IW 49 [pool-2-thread-1]: index before flush
    [2009-12-27 23:07:17,200Z]: IW 49 [pool-2-thread-1]: DW: flush postings as
    segment _0 numDocs=1
    [2009-12-27 23:07:17,200Z]: IW 49 [pool-2-thread-1]: DW: closeDocStore: 2
    files to flush to segment _0 numDocs=1
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: DW: oldRAMSize=151552
    newFlushedSize=11654 docs/MB=89.976 new/old=7.69%
    [2009-12-27 23:07:17,201Z]: IFD [pool-2-thread-1]: now checkpoint
    "segments_1" [1 segments ; isCommit = false]
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: LMP: findMerges: 1
    segments
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: LMP: level -1.0 to
    4.1824718: 1 segments
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: alwaysMerge=true
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: CMS: now merge
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: CMS: index: _0:C1
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: CMS: no more merges
    pending; now return
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: startCommit(): start
    sizeInBytes=0
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: startCommit index=_0:C1
    changeCount=2
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.fnm
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.frq
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.prx
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.tis
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.tii
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.nrm
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.fdx
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: now sync _0.fdt
    [2009-12-27 23:07:17,201Z]: IW 49 [pool-2-thread-1]: done all syncs

    Frank


    Michael McCandless-2 wrote:
    On Thu, Jan 21, 2010 at 4:46 PM, Frank Geary wrote:

    Nope.  When it's time to inactivate a RAMDir indexWriter, I get that
    directory,  close that writer, then clear out the directory.  Then after
    clearing out the directory, I create a new IW passing in the directory
    that
    was used previously.  No, I do not override the LockFactory - not
    familiar
    with it.
    Hmm, curious.
    Hmm...for what it's worth we are using ConcurrentMergeScheduler.
    We do actually have setInfoStream turned on for all our indexers.  I'll
    have to hunt down one of our logs that corresponds to one of the
    Exceptions though.  Thanks for that.
    That could be helpful... eg it'd let us confirm every open IW is
    closed before a new one is opened on the same directory (assuming we get
    the full output from all opened IWs).
    We delete the files explicitly because we found that under high index
    loads (i.e. rapid rolling of RAMDir indexWriters), leaving the RAMDir
    around led to garbage collection issues as old RAMDirs accumulated while
    waiting to be collected.  It turned out that deleting the files
    explicitly avoided those garbage collection problems.
    That's very odd -- RAMDir should be "easy" to GC once it's closed. It
    (and RAMFile) have no finalizers.

    Were you calling RAMDir.close()? That clears its fileMap, which will
    make all RAM consumed GCable.

    The fact that you hit GC issues makes me suspect something, somewhere
    is still hanging on to the RAMDir (ie, the IW isn't always being
    closed).

    If you go back to using a new RAMDir instance per IW, does the problem go
    away?
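
    (For illustration, a minimal sketch of that suggestion -- oldReader,
    oldWriter, oldDir, analyzer and newWriter are placeholder names, and the
    Lucene 2.4.x IndexWriter constructor is assumed:)

        // Tear down everything tied to the old RAMDirectory, in order:
        oldReader.close();   // readers first
        oldWriter.close();   // then the writer
        oldDir.close();      // RAMDirectory.close() clears its fileMap,
                             // so the RAM it held becomes GCable

        // Then start over with a brand-new RAMDirectory per writer,
        // instead of clearing and reusing the old instance:
        RAMDirectory newDir = new RAMDirectory();
        IndexWriter newWriter = new IndexWriter(newDir, analyzer, true,
                                                IndexWriter.MaxFieldLength.UNLIMITED);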

    Mike

  • Michael McCandless at Jan 22, 2010 at 9:08 pm
    It looks like each IW writes to a private log file -- could you zip up
    all such files and attach as a zip file (CC me directly because the
    list strips attachments)? It can give me a bigger picture than just
    these fragments...
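
    (A sketch of such per-writer logging -- the file-name scheme and writerId
    are just illustrative:)

        import java.io.FileOutputStream;
        import java.io.PrintStream;

        // Route each IndexWriter's infoStream to its own log file, so every
        // open/flush/commit/merge/close can be matched to one writer instance.
        PrintStream infoLog = new PrintStream(
            new FileOutputStream("indexmergesched_" + writerId + "_"
                                 + System.currentTimeMillis() + ".log"), true);
        writer.setInfoStream(infoLog);
        // ... and close infoLog after writer.close()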

    Mike
    On Fri, Jan 22, 2010 at 12:02 PM, Frank Geary wrote:

    Mike,

    Below are the portions of the merge log files related to the setInfoStream()
    calls during the time our exception happens.  The ..._1... log file is from
    one RAMDir index and the ..._0... log file is from the other RAMDir index.
    OK
    The time period when our ArrayIndexOutOfBoundsException happens is
    2009-12-27 19:59:51,244 (when the IndexReader.reopen() is about to be
    called) to 2009-12-27 19:59:51,259 (when the exception is thrown from the
    MultiSearcher.search() call).
    I suspect the corruption happened before that, ie, somehow the
    deletions file for a segment got overwritten. To bound the time I
    guess we have to look back to the reopen before the corrupt IndexReader
    (which didn't hit the corruption).
    As you can see from the log files below,
    our "_1" RAMDir index is the only one being used during the time the
    exception is thrown - not new information to me.
    Right but we need to see logs from earlier...
    Also a merge has finished
    at 2009-12-27 19:59:50,561 and thus it appears that nothing at all is
    happening (with respect to merges or commits) in either RAMDir index at the
    time the problem occurs.
    Need to see all the infoStream log files to really be sure...
    But one thing that is interesting to me - perhaps you can clarify - is
    that there is a very short commit at the very top of the "_1" log file.
    Why is that commit different from the commit right below it?
    What does "DW: apply 1 buffered deleted terms and 0 deleted docIDs" mean?
    There was one delete-by-Term buffered (and no added docs) for the
    first commit, and the reverse for the 2nd case.
    What does "commit: pendingCommit == null; skip" mean?  Are there different
    kinds of commit?
    It looks like the deleted Term did not in fact match a document and so
    the flush & commit were skipped.
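
    (In code the two cases would look roughly like this -- writer, doc and the
    "id" term are placeholders, and the comments paraphrase the log lines
    quoted above:)

        // Case 1: one buffered delete-by-Term that matches no document; the
        // deletes are applied but nothing changes, so the commit is skipped
        // ("apply 1 buffered deleted terms ... skip startCommit(): no changes
        // pending"):
        writer.deleteDocuments(new Term("id", "someMissingId"));
        writer.commit();

        // Case 2: one added document and no buffered deletes; this commit
        // flushes a real segment ("flush: segment=_z6 ... numDocs=1"):
        writer.addDocument(doc);
        writer.commit();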

    You should also try making new RAMDirs instead of reusing the same one
    -- if that still hits GC problems then something else is wrong: once a
    RAMDir & all readers/writers using it are closed, it should be easy to
    GC.

    Or, another thing you could try is to swap in MockRAMDirectory (from
    Lucene's tests): it will throw an exception if you attempt to close it
    when the readers/writers didn't in fact close all their files. If you
    hit that exception you know something isn't being closed... it's a
    good way to catch bugs...
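
    (Roughly, in a test -- MockRAMDirectory lives in Lucene's test sources, so
    this is a sketch rather than drop-in production code:)

        // MockRAMDirectory tracks every file its readers/writers open.
        MockRAMDirectory dir = new MockRAMDirectory();
        IndexWriter w = new IndexWriter(dir, analyzer, true,
                                        IndexWriter.MaxFieldLength.UNLIMITED);
        // ... the usual add/delete/search cycle ...
        w.close();
        dir.close();   // throws if some reader or writer never closed its files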

    Mike

  • Frank Geary at Feb 3, 2010 at 6:39 pm
    For the record - I haven't proven this yet - but here's my current theory of
    what is causing the problem:

    1) We start with a new RAMDir IW[0] and do some deletes and adds.
    2) We create at least one IndexReader based on that IW. The last of which
    we'll call IndexReader[A].
    3) Then we switch to using the other RAMDir IW[1].
    4) While using IW[1], we clear out IW[0] and create a new IW[0]. However,
    the last IndexReader, IndexReader[A], is left around.
    5) Eventually, we switch to using the other newly re-created RAMDir IW[0]
    again.
    6) The first search that comes in against RAMDir IW[0] tries to reopen
    IndexReader[A]. But IndexReader[A] has nothing to do with the newly
    re-created RAMDir index[0].

    So my theory is that this confuses the IndexReader.reopen() code.
    Therefore, we can perhaps fix the problem by deleting the last IndexReader
    used whenever we re-create any RAMDir IW (see the sketch after the list
    below).  Furthermore, this problem may be so infrequent because:
    a) It can only happen the first time after two RAMDir swaps.
    b) It may depend on what state IndexReader[A] was in when it was created.
    c) It may also depend on what state the new RAMDir IW is in when that first
    reopen() call is made using the old IndexReader[A].
    d) If the first reopen for a new IW succeeds, we're fine until the next
    swap.
    e) If it fails, it could fail repeatedly or only once.
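
    (A minimal sketch of that fix -- swapLock, cachedReader, dir, writer and
    analyzer are placeholder names, not our actual code:)

        synchronized (swapLock) {
            // Discard the reader opened against the old incarnation of this
            // RAMDir, so no later search can reopen() a stale reader:
            if (cachedReader != null) {
                cachedReader.close();
                cachedReader = null;
            }
            writer.close();
            dir = new RAMDirectory();   // or clear and reuse, as we do today
            writer = new IndexWriter(dir, analyzer, true,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        }

        // Search path: open fresh when there is no cached reader, else reopen.
        IndexReader r = (cachedReader == null)
                        ? IndexReader.open(dir)
                        : cachedReader.reopen();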

    I'll let you know when I determine if this theory is correct or not.

    Frank



  • Frank Geary at Jul 15, 2010 at 5:09 pm
    Just to conclude this issue: it seems my theory above was correct.  At the
    time of my previous posting I implemented code to "delete the last
    IndexReader used whenever we re-create a new RAMDir IW", and we have not
    seen the ArrayIndexOutOfBoundsException since.  Thus, the lesson here is
    never to reopen an IndexReader from one RAMDir index for use against a
    different RAMDir index.  It may sometimes work, but it will fail some of
    the time.
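
    (As a guard, something like this sketch -- lastReader and currentDir are
    placeholder names:)

        // Only reopen() a reader that was opened on the Directory we are
        // about to search; if the RAMDir was swapped, open a fresh reader.
        IndexReader r;
        if (lastReader != null && lastReader.directory() == currentDir) {
            IndexReader reopened = lastReader.reopen();
            if (reopened != lastReader) {
                lastReader.close();    // reopen() returned a new reader
            }
            r = reopened;
        } else {
            if (lastReader != null) {
                lastReader.close();    // stale reader from a previous RAMDir
            }
            r = IndexReader.open(currentDir);
        }
        lastReader = r;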

    Frank Geary


