Hi,

I am seeing some strange behaviour in Lucene. The scenario is as follows:
while adding documents to my index (every doc is pretty small; the doc
count is about 12000) I have implemented custom flush-and-commit
behaviour. Before adding documents to the index I check whether the
ramDocCount has reached a certain number or whether the last commit was a
while ago. If so, I flush the buffered documents and reopen the
IndexWriter. So far, so good: indexing works very well.
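
Roughly, my check looks like this (just a sketch; the class, field names
and thresholds here are my own, not Lucene API):

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

class FlushingIndexer {
    private static final int MAX_RAM_DOCS = 500;          // illustrative threshold
    private static final long COMMIT_PERIOD_MS = 60000L;  // illustrative period

    private final String indexDir;
    private final Analyzer analyzer;
    private IndexWriter indexWriter;
    private int ramDocCount;                              // docs buffered since last flush
    private long lastCommit = System.currentTimeMillis();

    FlushingIndexer(String indexDir, Analyzer analyzer) throws IOException {
        this.indexDir = indexDir;
        this.analyzer = analyzer;
        this.indexWriter = new IndexWriter(indexDir, analyzer, false); // false = append
    }

    void add(Document doc) throws IOException {
        maybeFlush();
        indexWriter.addDocument(doc);
        ramDocCount++;
    }

    // Flush and reopen the writer when enough docs are buffered in RAM
    // or the last commit was too long ago.
    private void maybeFlush() throws IOException {
        long now = System.currentTimeMillis();
        if (ramDocCount >= MAX_RAM_DOCS || now - lastCommit > COMMIT_PERIOD_MS) {
            indexWriter.close();  // flushes the buffered documents to disk
            indexWriter = new IndexWriter(indexDir, analyzer, false);
            ramDocCount = 0;
            lastCommit = now;
        }
    }
}
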
The problem: while writing documents with the IndexWriter I send requests
through the IndexReader (around 10,000 requests in total), and I reopen
the IndexReader every 100 requests (only for testing) if it is not
current. The first roughly 4,000 requests work fine, but after that I
always get the following exception:

java.lang.ArrayIndexOutOfBoundsException: 37389
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:126)
    at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:112)
    at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:172)
    at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:146)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
    at org.apache.lucene.search.Hits.<init>(Hits.java:67)
    at org.apache.lucene.search.Searcher.search(Searcher.java:46)
    at org.apache.lucene.search.Searcher.search(Searcher.java:38)

This seems to be a transient problem, because if I open a new IndexReader
after all documents have been added, everything is fine again and all
10,000 requests succeed.

So what could be the problem here?

reg,
sascha

  • Michael McCandless at Jun 30, 2008 at 4:35 pm
    This is spooky: that exception means you have some sort of index
    corruption. The TermScorer thinks it found a doc ID 37389, which is
    out of bounds.

    Reopening IndexReader while IndexWriter is writing should be
    completely fine.

    Is this easily reproduced? If so, if you could narrow it down to the
    sequence of added documents, that'd be awesome.

    It's very strange that you see the corruption go away. Can you run
    CheckIndex (java org.apache.lucene.index.CheckIndex <indexDir>) to see
    if it detects any corruption? In fact, if you could run CheckIndex
    after each session of IndexWriter, to isolate which batch of added
    documents causes the corruption, that could help us narrow it down.
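
    If it's easier, you can invoke the same entry point from code between
    batches (the index path here is just a placeholder; note CheckIndex's
    main may call System.exit when it finishes, so run it standalone):

    import org.apache.lucene.index.CheckIndex;

    public class CheckAfterBatch {
        public static void main(String[] args) throws Exception {
            // Equivalent to: java org.apache.lucene.index.CheckIndex <indexDir>
            CheckIndex.main(new String[] { "/path/to/index" });
        }
    }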

    Are you changing any of the settings in IndexWriter? Are you using
    multiple threads? Which exact JRE version and OS are you using? Are
    you creating a new index at the start of each run?

    Mike

  • Michael McCandless at Jul 1, 2008 at 8:48 am
    OK thanks for the answers below.

    One thing to realize: with this specific corruption, you will only hit
    the exception if the one term that has the corruption is queried on.
    I.e., only a certain term in a query will hit the corruption.

    That's great news that it's easily reproduced -- can you post the code
    you're using that hits it? It's easily reproduced when starting from
    a newly created index, right?

    Mike

    Sascha Fahl wrote:
    It is easily reproduced. The strange thing is that when I check the
    IndexReader for currentness, some IndexReaders seem to get the
    corrupted version of the index and some do not (the IndexReader gets
    reopened around 10 times while the documents are added to the index
    and the 10,000 requests are sent). So maybe something goes wrong when
    the IndexReader fetches the index while the IndexWriter flushes data
    to it (I did not change the default MergePolicy)?
    I will do the CheckIndex thing ASAP.
    I do not change any of the IndexWriter settings. This is how I
    initialize a new IndexWriter: this.indexWriter = new
    IndexWriter(index_dir, new LiveAnalyzer(), false);
    I am working with a singleton (so only one thread adds documents to
    the index).
    This is what java -version says: java version "1.5.0_13"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
    Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
    Currently I am developing on Mac OS X Leopard, but the production
    system will run on Gentoo Linux.
    New indices are only created when there was no previous index in the
    index directory.

    Sascha

  • Michael McCandless at Jul 1, 2008 at 8:53 am
    By "does not help" do you mean CheckIndex never detects this
    corruption, yet you then hit that exception when searching?

    By "reopening fails" what do you mean? I thought reopen works fine,
    but then it's only the search that fails?

    Mike

    Sascha Fahl wrote:
    Checking the index after adding documents and before reopening the
    IndexReader does not help. After adding documents nothing bad happens
    and CheckIndex says the index is all right. And when I check the index
    before reopening it, CheckIndex again does not detect any corruption
    and says the index is OK, and yet the reopening fails.

    Sascha

  • Michael McCandless at Jul 1, 2008 at 10:15 am
    That's interesting. So you are using IndexReader.reopen() to get a
    new reader? Are you closing the previous reader?

    The exception goes away if you create a new IndexSearcher on the
    reopened IndexReader?
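
    I.e., something like this (just a sketch):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    class Refresher {
        private IndexReader reader;
        private IndexSearcher searcher;

        Refresher(IndexReader reader) {
            this.reader = reader;
            this.searcher = new IndexSearcher(reader);
        }

        // reopen() may return the same instance when nothing has changed;
        // only close the old reader when a new one actually came back, and
        // recreate the IndexSearcher on top of the new reader.
        void refresh() throws IOException {
            IndexReader newReader = reader.reopen();
            if (newReader != reader) {
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }
        }
    }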

    I don't yet see how that could explain the exception, though. If you
    reopen() the underlying IndexReader in an IndexSearcher, the original
    IndexReader should still be intact and still searching the
    point-in-time snapshot that it had been opened on. IndexSearcher
    itself doesn't hold any "state" about the index (I think); it relies
    on IndexReader for that.

    Mike

    Sascha Fahl wrote:
    I think I have solved the "problem". It was not a Lucene-specific
    problem. What I did was reopen the IndexReader without creating a new
    IndexSearcher object. But of course, since Java always passes
    parameters by value (no matter which parameter), the old IndexSearcher
    object did not see the updated IndexReader object: the IndexSearcher
    keeps working with the IndexReader reference it was constructed with,
    not with my reassigned variable. So what caused the problem was that
    the requests were always sent to the same instance of IndexSearcher,
    and when that IndexSearcher had to physically access the index (the
    hard disk), the changes made by the IndexWriter were of course visible
    only to the reopened IndexReader, not to the IndexSearcher.
    Is that the explanation, Mike?

    Sascha

  • Sascha Fahl at Jul 1, 2008 at 10:23 am
    Yes, I am using IndexReader.reopen(). Here is my code doing this:
    public void refreshIndeces() throws CorruptIndexException, IOException {
        if ((System.currentTimeMillis() - this.lastRefresh) > this.REFRESH_PERIOD) {
            this.lastRefresh = System.currentTimeMillis();
            boolean refreshFlag = false;
            for (int i = 0; i < this.indeces.length; i++) {
                IndexReader newIR = this.indeces[i].reopen();
                if (newIR != this.indeces[i]) {
                    this.indeces[i].close();
                    refreshFlag = true;
                }
                this.indeces[i] = newIR;
            }
            if (refreshFlag) {
                this.multiReader = new MultiReader(this.indeces);
                this.multiSearcher = new IndexSearcher(this.multiReader);
            }
        }
    }
    As you can see, I am using a MultiReader. After creating a new
    MultiReader + a new IndexSearcher, the exception goes away. I tested
    it by updating the index with 50,000 documents while sending 60,000
    requests, and nothing bad happened.

    Sascha


    Sascha Fahl
    Software Development

    evenity GmbH
    Zu den Mühlen 19
    D-35390 Gießen

    Mail: sascha@evenity.net
  • Michael McCandless at Jul 1, 2008 at 12:29 pm
    Aha! OK now I see how that led to your exception.

    When you create a MultiReader, passing in the array of IndexReaders,
    MultiReader simply holds onto your array. It also computes & caches
    norms() the first time it's called, based on the total # of docs it
    sees in all the readers in that array.

    But when you then re-opened individual readers in that array, without
    creating a new MultiReader, the cached norms array became "stale", and
    so it's easily possible to encounter a docID that's out of bounds.

    I think a good fix for this sort of trap would be for MultiReader to
    make a private copy of the array that's passed in. I'll open an issue.
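
    In code, the trap looks roughly like this (a sketch; the directories
    are placeholders):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    class StaleMultiReaderDemo {
        static IndexSearcher open(Directory d1, Directory d2) throws IOException {
            IndexReader[] subs = { IndexReader.open(d1), IndexReader.open(d2) };
            // MultiReader keeps a reference to 'subs' and caches norms/doc
            // counts computed from the readers it was constructed with.
            MultiReader multi = new MultiReader(subs);
            // Later doing: subs[0] = subs[0].reopen(); mutates the array the
            // MultiReader holds, leaving its cached state stale -- instead,
            // rebuild both the MultiReader and the IndexSearcher after any
            // reopen, as in the refreshIndeces() method above.
            return new IndexSearcher(multi);
        }
    }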

    Mike

  • Michael McCandless at Jul 1, 2008 at 12:37 pm
    OK I've opened:

    https://issues.apache.org/jira/browse/LUCENE-1323

    I'll commit the fix (to trunk, to be included in 2.4) soon.

    Mike

