Lucene 2.2, NFS, Lock obtain timed out
Hi,

We are sharing a Lucene index in a Linux cluster over an NFS share. We have
multiple servers reading and writing to the index.

I am getting regular lock exceptions, e.g.:
Lock obtain timed out:
NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock

- We are using Lucene 2.2.0.
- We are using kernel NFS, and lockd is running.
- We are using a modified version of the ExpirationTimeDeletionPolicy found in the Lucene test suite:
  http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
  I have set the expiration time to 600 seconds (10 minutes). (A sketch of such a policy follows this list.)
- We are using the NativeFSLockFactory, with the lock folder inside the index folder: /mnt/nfstest/repository/lucene/lock/
- I have implemented a handler which pauses and retries an update or delete operation if a LockObtainFailedException or StaleReaderException is caught. The handler retries the update or delete once every second for one minute before re-throwing the exception and aborting. (A sketch of such a handler also follows.)
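
For reference, here is a minimal sketch of an expiration-time deletion policy in the spirit of the one in the Lucene test suite, assuming the Lucene 2.2 IndexDeletionPolicy/IndexCommitPoint API (our real class is a modified version of the test one):

import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.IndexCommitPoint;
import org.apache.lucene.index.IndexDeletionPolicy;
import org.apache.lucene.store.Directory;

// Sketch only: keep every commit until its segments_N file is older
// than expirationTimeSeconds; the most recent commit is never deleted.
public class ExpirationTimeDeletionPolicy implements IndexDeletionPolicy {

    private final Directory dir;
    private final double expirationTimeSeconds;

    public ExpirationTimeDeletionPolicy(Directory dir, double expirationTimeSeconds) {
        this.dir = dir;
        this.expirationTimeSeconds = expirationTimeSeconds;
    }

    public void onInit(List commits) throws IOException {
        onCommit(commits);
    }

    public void onCommit(List commits) throws IOException {
        long expireBefore = System.currentTimeMillis()
                - (long) (1000.0 * expirationTimeSeconds);
        // Every commit except the most recent one is a candidate.
        for (int i = 0; i < commits.size() - 1; i++) {
            IndexCommitPoint commit = (IndexCommitPoint) commits.get(i);
            if (dir.fileModified(commit.getSegmentsFileName()) < expireBefore) {
                commit.delete();
            }
        }
    }
}

And a sketch of the retry handler (the names and the IndexOperation callback are hypothetical, just to illustrate the pause-and-retry loop):

import java.io.IOException;

import org.apache.lucene.index.StaleReaderException;
import org.apache.lucene.store.LockObtainFailedException;

// Sketch only: retry a failed update/delete once per second for up to
// a minute, then rethrow the last exception and abort.
public class RetryingUpdateHandler {

    private static final int MAX_ATTEMPTS = 60;

    public interface IndexOperation {
        void run() throws IOException;
    }

    public void execute(IndexOperation op)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                op.run();
                return;
            } catch (StaleReaderException e) {
                if (attempt >= MAX_ATTEMPTS) throw e; // abort after ~1 minute
            } catch (LockObtainFailedException e) {
                if (attempt >= MAX_ATTEMPTS) throw e;
            }
            Thread.sleep(1000); // pause one second between retries
        }
    }
}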

The issue appears to be caused by a lock file which is not deleted. The handlers keep retrying... the process holding the lock eventually aborts... this deletes the lock file, and any applications still running then continue normally.

The application does not throw these exceptions when it is run on a standard Linux file system or on a Windows workstation.

I would really appreciate some help with this issue. The chances are I am doing something stupid... but I cannot think what to try next.

Thanks for your help

Patrick

  • Doron Cohen at Jun 29, 2007 at 9:37 am
    Hi Patrick,

    Mike is the expert in this, but until he gets in, can you add details on
    the update pattern? Note that the DeletionPolicy you describe below is not
    (afaik) related to the write-lock time-out issues you are facing. The
    DeletionPolicy provides better management of the interaction between an
    IndexWriter that deletes old files and an IndexReader that might still be
    using those files. The write lock, on the other hand, just synchronizes
    between multiple IndexWriter objects attempting to open the same index for
    write. So, do you have multiple writers? Can you print/describe the
    writers' timing scenario when this time-out problem occurs, e.g. something
    like this:

    w1.open
    w1.modify
    w1.close
    w2.open
    w2.modify
    w2.close
    w3.open
    w3.modify
    w3.close
    w2.open ..... time-out... but w3 closed the index... so the lock file was
    supposed to be removed; why wasn't it?

    Can write attempts come from different nodes in the cluster? Can you make
    sure that when "the" writer gets the lock time-out there is indeed no
    other active writer?
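
    To make the scenario concrete, here is a minimal sketch, assuming the
    Lucene 2.2 API (the /tmp/test-index path is just an example), of how a
    second writer hits the write lock while a first writer is still open:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TwoWriters {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.getDirectory("/tmp/test-index");
            IndexWriter w1 = new IndexWriter(dir, new StandardAnalyzer(), true);
            try {
                // w1 still holds write.lock, so this second open waits and
                // then fails with "Lock obtain timed out"
                // (LockObtainFailedException).
                IndexWriter w2 = new IndexWriter(dir, new StandardAnalyzer(), false);
                w2.close();
            } finally {
                w1.close(); // closing w1 releases the write lock
            }
        }
    }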

    Doron

    "Patrick Kimber" <mailing.patrick.kimber@gmail.com> wrote on 29/06/2007
    02:01:08:
    Hi,

    We are sharing a Lucene index in a Linux cluster over an NFS
    share. We have
    multiple servers reading and writing to the index.

    I am getting regular lock exceptions e.g.
    Lock obtain timed out:
    NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-
    n-write.lock

    - We are using Lucene 2.2.0
    - We are using kernel NFS and lockd is running.
    - We are using a modified version of the ExpirationTimeDeletionPolicy
    found in the
    Lucene test suite:
    http://svn.apache.
    org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.
    java
    I have set the expiration time to 600 seconds (10 minutes).
    - We are using the NativeFSLockFactory with the lock folder being
    within the index
    folder:
    /mnt/nfstest/repository/lucene/lock/
    - I have implemented a handler which will pause and retry an
    update or delete
    operation if a LockObtainFailedException or StaleReaderException is
    caught. The
    handler will retry the update or delete once every second for
    1 minute before
    re-throwing the exception and aborting.

    The issue appears to be caused by a lock file which is not deleted.
    The handlers
    keep retrying... the process holding the lock eventually aborts...
    this deletes the
    lock file - any applications still running then continue normally.

    The application does not throw these exceptions when it is run on a
    standard Linux
    file system or Windows workstation.

    I would really appreciate some help with this issue. The
    chances are I am doing
    something stupid... but I cannot think what to try next.

    Thanks for your help

    Patrick

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Patrick Kimber at Jun 29, 2007 at 11:04 am
    Hi Doron

    Thanks for your reply.

    I am working on the details of the update pattern. It will take me
    some time as I cannot reproduce the issue on demand.

    To answer your other questions, yes, we do have multiple writers. One
    writer per node in the cluster.

    I will post the results of my investigations as soon as possible.

    Thanks for your help

    Patrick


  • Patrick Kimber at Jun 29, 2007 at 12:33 pm
    Hi

    As requested, I have been trying to improve the logging in the
    application so I can give you more details of the update pattern.

    I am using the Lucene Index Accessor contribution to co-ordinate the
    readers and writers:
    http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

    If the close method in the IndexAccessProvider fails, the exception is
    logged but not re-thrown:

    public void close(IndexReader reader) {
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                // the exception is swallowed here, so callers never see it
                log.error("", e);
            }
        }
    }
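
    For comparison, a variant that still logs but propagates the failure
    would at least make the problem visible to callers (just a sketch;
    whether it is right to rethrow here is exactly what I am unsure about):

    public void close(IndexReader reader) throws IOException {
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                // still log, but let callers see that the close failed,
                // since the reader may not have released its locks
                log.error("failed to close reader", e);
                throw e;
            }
        }
    }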

    I have been checking the application log. Just before the time when
    the lock file errors occur I found this log entry:
    [11:28:59] [ERROR] IndexAccessProvider
    java.io.FileNotFoundException: /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No such file or directory)
            at java.io.RandomAccessFile.open(Native Method)

    - I guess the missing segments file could result in the lock file not being removed?
    - Is it safe to ignore this exception (probably not)?
    - Why would the segments file be missing? Could this be connected to the NFS issues in some way?

    Thanks for your help

    Patrick

  • Yonik Seeley at Jun 29, 2007 at 1:48 pm
    Note that some Solr users have reported a similar issue.
    https://issues.apache.org/jira/browse/SOLR-240

    -Yonik
  • Doron Cohen at Jun 29, 2007 at 11:50 pm

    Yonik wrote:

    Note that some Solr users have reported a similar issue.
    https://issues.apache.org/jira/browse/SOLR-240
    Seems the scenario there is without using native locks? -
    "i get the stacktrace below ... with useNativeLocks turned off"


  • Yonik Seeley at Jun 30, 2007 at 1:06 am

    On 6/29/07, Doron Cohen wrote:
    Note that some Solr users have reported a similar issue.
    https://issues.apache.org/jira/browse/SOLR-240
    Seems the scenario there is without using native locks? -
    "i get the stacktrace below ... with useNativeLocks turned off"
    Yes... but that doesn't mean there isn't some sort of race condition
    or issue with Lucene locking in general, though. If it were only
    Windows, I'd perhaps chalk it up to a virus scanner or something.

    -Yonik

  • Michael McCandless at Jun 30, 2007 at 12:05 pm

    Patrick Kimber wrote:

    I have been checking the application log. Just before the time when
    the lock file errors occur I found this log entry:
    [11:28:59] [ERROR] IndexAccessProvider
    java.io.FileNotFoundException: /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No such file or directory)
            at java.io.RandomAccessFile.open(Native Method)
    I think this exception is the root cause. Hitting this IOException in
    reader.close() means the reader has not released its write lock. Is it
    possible to see the full stack trace?

    Having the wrong deletion policy, or even a buggy deletion policy (if
    indeed file.lastModified() varies by too much across machines), can't
    cause this (I think). At worst, the wrong deletion policy should cause
    other already-open readers to hit "Stale NFS handle" IOExceptions during
    searching. So, you should use your ExpirationTimeDeletionPolicy when
    opening your readers if they will be doing deletes, but I don't think it
    explains this root-cause exception during close().

    It's a rather spooky exception ... in close(), the reader initializes
    an IndexFileDeleter which lists the directory and opens any segments_N
    files that it finds.

    Do you have a writer on one machine closing, and then very soon
    thereafter this reader on a different machine does deletes and tries
    to close?

    My best guess is the exception is happening inside that initialization
    because the directory listing said that "segments_XXX" exists but then
    when it tries to open that file, it does not in fact exist. Since NFS
    client-side caching (especially directory listing cache) is not
    generally guaranteed to be "correct", it could explain this. But let's
    see the full stack trace to make sure this is it...
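
    To illustrate the race (a hypothetical standalone snippet, not actual
    Lucene code): the listing may be served from the NFS client's cache
    while the open goes to the server, so the two can disagree:

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.RandomAccessFile;

    public class StaleListingDemo {
        public static void main(String[] args) throws Exception {
            File indexDir = new File("/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
            String[] names = indexDir.list(); // may be served from a stale NFS cache
            for (int i = 0; names != null && i < names.length; i++) {
                if (!names[i].startsWith("segments_")) {
                    continue;
                }
                try {
                    new RandomAccessFile(new File(indexDir, names[i]), "r").close();
                } catch (FileNotFoundException e) {
                    // The listing said this file exists, but another node
                    // has already deleted it: the same race a writer's
                    // initialization can hit.
                    System.err.println("stale listing entry: " + names[i]);
                }
            }
        }
    }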

    Mike

  • Patrick Kimber at Jul 3, 2007 at 10:18 am
    Hi

    I have added more logging to my test application. I have two servers
    writing to a shared Lucene index on an NFS partition...

    Here is the logging from one server...

    [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
    [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete [segments_n]

    and the other server (at the same time):

    [10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
    [10:49:18] [DEBUG] IndexAccessProvider getWriter()
    [10:49:18] [ERROR] DocumentCollection update(DocumentData)
    com.company.lucene.LuceneIcmException: I/O Error: Cannot add the document to the index. [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such file or directory)]
            at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)

    I think the exception is being thrown when the IndexWriter is created:
    new IndexWriter(directory, false, analyzer, false, deletionPolicy);

    I am confused... segments_n should not have been touched for 3 minutes,
    so why would a new IndexWriter want to read it?

    Here is the whole of the stack trace:

    com.company.lucene.LuceneIcmException: I/O Error: Cannot add the document to the index. [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such file or directory)]
            at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
            at com.company.lucene.IndexUpdate.addDocument(IndexUpdate.java:364)
            at com.company.lucene.IndexUpdate.addDocument(IndexUpdate.java:342)
            at com.company.lucene.IndexUpdate.update(IndexUpdate.java:67)
            at com.company.lucene.icm.DocumentCollection.update(DocumentCollection.java:390)
            at lucene.icm.test.Write.add(Write.java:105)
            at lucene.icm.test.Write.run(Write.java:79)
            at lucene.icm.test.Write.main(Write.java:43)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:324)
            at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:271)
            at java.lang.Thread.run(Thread.java:534)
    Caused by: java.io.FileNotFoundException: /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such file or directory)
            at java.io.RandomAccessFile.open(Native Method)
            at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
            at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
            at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
            at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
            at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
            at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
            at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
            at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:626)
            at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
            at com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
            at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
            at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
            ... 13 more

    Thank you very much for your previous comments and emails.

    Any help solving this issue would be appreciated.

    Patrick

  • Neeraj Gupta at Jul 3, 2007 at 10:43 am
    Hi

    This is the case where an index created by one server is updated by
    another server, resulting in index corruption. The exception occurs
    while creating the IndexWriter instance: when the writer is created
    (and you are not creating a new index), it checks whether the index
    exists and keeps that information with it. But by the time you go to
    add a document, the index has been modified by the other server. The
    previous and current states no longer match, and that results in the
    exception.

    What kind of locking are you using? I think you should follow some kind
    of locking algorithm so that while one server is updating the index,
    the other server does not interfere. Once a server finishes its updates
    to the index, it should close all writers and readers to release all
    the locks.

    An alternative solution to this problem is to create a separate index
    for each server. This will help because only one writer will be
    updating each index, so there won't be any contention. A minimal sketch
    follows.
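
    A sketch of that alternative, assuming the Lucene 2.2 API (the paths
    and class are hypothetical):

    import java.net.InetAddress;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class PerNodeIndex {
        public static void main(String[] args) throws Exception {
            // Each node writes only to its own index directory, so a given
            // index is never updated by more than one server.
            String node = InetAddress.getLocalHost().getHostName();
            String path = "/mnt/nfstest/repository/lucene/index-" + node;
            IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
            try {
                // ... add this node's documents ...
            } finally {
                writer.close();
            }
        }
    }

    Searches would then need to combine the per-node indexes, e.g. with a
    MultiReader or MultiSearcher over all of them.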

    Cheers,
    Neeraj
  • Patrick Kimber at Jul 3, 2007 at 10:47 am
    Hi

    I am using the NativeFSLockFactory. I was hoping this would have
    stopped these errors.

    Patrick
  • Neeraj Gupta at Jul 3, 2007 at 11:16 am
    I think you should be getting the "Lock obtain timed out" exception
    (the one you mention in the subject line) rather than a
    java.io.FileNotFoundException, because if one server is holding the
    lock on the directory, the other server will wait until the default
    lock timeout expires and throw a timeout exception after that. Try
    setting the lock timeout for the writer; it might help (a sketch
    follows below).

    If you are getting a FileNotFoundException, it might be that you are
    unlocking the index directory manually (from code, or by deleting the
    lock file by hand); in that case the second server will not be able to
    read the indexes the first server has created.
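
    Something like this, assuming the Lucene 2.2 API (the 60-second value
    is just an example; the default should be raised before the writer is
    constructed, since the lock is acquired in the constructor):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class LongerLockTimeout {
        public static void main(String[] args) throws Exception {
            // Raise the write-lock timeout (default 1000 ms in Lucene 2.2)
            // before opening the writer: the lock is acquired inside the
            // IndexWriter constructor.
            IndexWriter.setDefaultWriteLockTimeout(60 * 1000L);
            IndexWriter writer = new IndexWriter(
                    "/mnt/nfstest/repository/lucene/lucene-icm-test-1-0",
                    new StandardAnalyzer(), false);
            writer.close();
        }
    }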

    Cheers,
    Neeraj
  • Michael McCandless at Jul 3, 2007 at 11:21 am

    "Patrick Kimber" wrote:

    > I am using the NativeFSLockFactory. I was hoping this would have
    > stopped these errors.

    I believe this is not a locking issue and NativeFSLockFactory should
    be working correctly over NFS.

    > Here is the whole of the stack trace:
    >
    > Caused by: java.io.FileNotFoundException:
    > /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
    > file or directory)
    > at java.io.RandomAccessFile.open(Native Method)
    > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
    > at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    > at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    > at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
    > at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
    > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
    > at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
    > at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:626)
    > at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
    > at com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
    > at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
    > at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
    > ... 13 more

    OK, indeed the exception is inside IndexFileDeleter's initialization
    (this is what I had guessed might be happening).

    > I have added more logging to my test application. I have two servers
    > writing to a shared Lucene index on an NFS partition...
    >
    > Here is the logging from one server...
    >
    > [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
    > [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
    > [segments_n]
    >
    > and the other server (at the same time):
    >
    > [10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
    > [10:49:18] [DEBUG] IndexAccessProvider getWriter()
    > [10:49:18] [ERROR] DocumentCollection update(DocumentData)
    > com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
    > document to the index.
    > [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
    > such file or directory)]
    > at
    > com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
    >
    > I think the exception is being thrown when the IndexWriter is created:
    > new IndexWriter(directory, false, analyzer, false, deletionPolicy);
    >
    > I am confused... segments_n should not have been touched for 3 minutes
    > so why would a new IndexWriter want to read it?

    Whenever a writer is opened, it initializes the deleter
    (IndexFileDeleter). During that initialization, we list all files in
    the index directory, and for every segments_N file we find, we open it
    and "incref" all index files that it's using. We then call the
    deletion policy's "onInit" to give it a chance to remove any of these
    commit points.

    What's happening here is the NFS directory listing is "stale" and is
    reporting that segments_n exists when in fact it doesn't. This is
    almost certainly due to the NFS client's caching (directory listing
    caches are in general not coherent for NFS clients, ie, they can "lie"
    for a short period of time, especially in cases like this).

    I think this fix is fairly simple: we should catch the
    FileNotFoundException and handle that as if the file did not exist. I
    will open a Jira issue & get a patch.

    Mike
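
    (In code, the guard Mike describes amounts to something like the
    sketch below. This is an illustration of the idea only, not the actual
    patch that followed; "dir" and "segmentsFileName" stand in for state
    that IndexFileDeleter already has on hand.)

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;

    class SkipStaleCommits {
        // A commit point named by a stale NFS directory listing may already
        // be gone; treat FileNotFoundException as "file does not exist" and
        // skip that commit point instead of aborting writer initialization.
        static SegmentInfos readCommitOrNull(Directory dir, String segmentsFileName)
                throws IOException {
            SegmentInfos sis = new SegmentInfos();
            try {
                sis.read(dir, segmentsFileName);
            } catch (FileNotFoundException e) {
                return null; // stale listing: caller skips this commit point
            }
            return sis;
        }
    }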

  • Patrick Kimber at Jul 3, 2007 at 11:33 am
    Hi Michael

    I am really pleased we have a potential fix. I will look out for the patch.

    Thanks for your help.

    Patrick
    On 03/07/07, Michael McCandless wrote:
    > [...]
  • Michael McCandless at Jul 3, 2007 at 12:20 pm
    OK I opened issue LUCENE-948, and attached a patch & new 2.2.0 JAR.
    Please make sure you use the "take2" versions (they have added
    instrumentation to help us debug):

    https://issues.apache.org/jira/browse/LUCENE-948

    Patrick, could you please test the above "take2" JAR? Could you also call
    IndexWriter.setDefaultInfoStream(...) and capture all output from both
    machines (it will produce quite a bit of output).
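
    (For anyone following along, wiring that up is a one-liner run before
    any writers are opened; the log path here is illustrative:)

    import java.io.PrintStream;
    import org.apache.lucene.index.IndexWriter;

    class EnableInfoStream {
        public static void main(String[] args) throws Exception {
            // Capture to a separate file on each machine; the default info
            // stream applies to IndexWriters opened after this call.
            PrintStream log = new PrintStream("/tmp/lucene-infostream.log");
            IndexWriter.setDefaultInfoStream(log);
        }
    }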

    However: I'm now concerned about another potential impact of stale
    directory listing caches, specifically that the writer on the 2nd
    machine will not see the current segments_N file written by the first
    machine and will incorrectly remove the newly created files.

    I think that "take2" JAR should at least resolve this
    FileNotFoundException, but I think it's likely you are about to hit
    this new issue.

    Mike

    "Patrick Kimber" wrote:
    > [...]
  • Patrick Kimber at Jul 3, 2007 at 1:51 pm
    Hi Michael

    I am setting up the test with the "take2" jar and will let you know
    the results as soon as I have them.

    Thanks for your help

    Patrick
    On 03/07/07, Michael McCandless wrote:
    > [...]
  • Patrick Kimber at Jul 3, 2007 at 3:13 pm
    Hi Michael

    I have been running the test for over an hour without any problem.
    The index writer log file is getting rather large so I cannot leave
    the test running overnight. I will run the test again tomorrow
    morning and let you know how it goes.

    Thanks again...

    Patrick
    On 03/07/07, Patrick Kimber wrote:
    > [...]
  • Michael McCandless at Jul 3, 2007 at 3:27 pm

    "Patrick Kimber" wrote:

    > I have been running the test for over an hour without any problem.
    > The index writer log file is getting rather large so I cannot leave
    > the test running overnight. I will run the test again tomorrow
    > morning and let you know how it goes.

    Ahhh, that's good news, I'm glad to hear that!

    You should go ahead and turn off the logging and make sure things are
    still fine (just in case logging is changing timing of events since
    timing is a factor here).

    In your logs, do you see lines like this?:

    ... hit FileNotFoundException when loading commit "segment_X"; skipping this commit point

    That would confirm the new code (to catch the FileNotFoundException)
    is indeed being hit.

    Actually, could you also check the logs and try to verify that each
    time one machine closed its writer and a 2nd machine opened a new
    writer that the 2nd machine indeed loaded the newest segments_N file
    and not segments_N-1? (This is the possible new issue I was referring
    to). I fear that this new issue could silently lose documents added
    by another machine and possibly not throw an exception.

    Mike

  • Patrick Kimber at Jul 4, 2007 at 7:29 am
    Hi Michael

    Yes, there are many lines in the logs saying:
    hit FileNotFoundException when loading commit "segment_X"; skipping
    this commit point
    ...so it looks like the new code is working perfectly.

    I am sorry to be vague... but how do I check which segments file is
    opened when a new writer is created?

    I will add a check to my test to see if all documents are added. This
    should tell us if any documents are being silently lost.
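
    (A check along these lines would do it -- a sketch, not Patrick's
    actual test code; the index path is the one from the thread:)

    import org.apache.lucene.index.IndexReader;

    class CountDocs {
        public static void main(String[] args) throws Exception {
            IndexReader reader =
                IndexReader.open("/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
            try {
                // Compare against the number of documents the test believes
                // it has added so far.
                System.out.println("numDocs=" + reader.numDocs()
                    + " maxDoc=" + reader.maxDoc());
            } finally {
                reader.close();
            }
        }
    }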

    Thanks

    Patrick
    On 03/07/07, Michael McCandless wrote:
    > [...]
  • Michael McCandless at Jul 4, 2007 at 10:14 am

    "Patrick Kimber" wrote:

    > Yes, there are many lines in the logs saying:
    > hit FileNotFoundException when loading commit "segment_X"; skipping
    > this commit point
    > ...so it looks like the new code is working perfectly.

    Super!

    > I am sorry to be vague... but how do I check which segments file is
    > opened when a new writer is created?

    Oh, sorry, it's not exactly obvious. Here's what to look for:

    On machine #1 (the machine that added docs & then closed its writer)
    you should see lines like this, which are printed every time the
    writer flushes its docs:

    checkpoint: wrote segments file "segments_X"

    Find the last such line on machine #1 before it closes the writer, and
    that's the "current" segments_X in the index.

    Then on machine #2 (the machine that immediately opens a new writer
    after machine #1 closed its writer) you should see a line like this:

    org.apache.lucene.index.IndexFileDeleter@XXXXXXX main: init: current segments file is "segments_Y"

    which indicates which segments file was loaded by this writer. The
    thing to verify is that X is always equal to Y whenever a writer
    quickly moves from machine #1 to machine #2.
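
    (Independent of the logs, each node can also ask the directory which
    commit its listing currently resolves to -- a sketch against the 2.2
    API, path as in the thread:)

    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.FSDirectory;

    class ShowCurrentCommit {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.getDirectory(
                "/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
            // Compute the generation N of the current segments_N from the
            // directory listing; N prints base-36, matching the file name.
            long gen = SegmentInfos.getCurrentSegmentGeneration(dir.list());
            System.out.println("segments_" + Long.toString(gen, Character.MAX_RADIX));
        }
    }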
    > I will add a check to my test to see if all documents are added. This
    > should tell us if any documents are being silently lost.

    Very good! Keep us posted, and good luck,

    Mike

  • Patrick Kimber at Jul 5, 2007 at 2:20 pm
    Hi Michael

    Just to let you know, I am on holiday for one week so will not be able
    to send a progress report until I return.

    I have deployed the new code to a test site so I will be informed if
    the users notice any issues.

    Thanks for your help

    Patrick

    On 04/07/07, Michael McCandless wrote:
    > [...]
  • Pkimber at Sep 7, 2007 at 2:46 pm
    Hi

    We are still getting various issues on our Lucene indexes running on an NFS
    share. It has taken me some time to find some useful information to report
    to the mailing list.

    I have created a test application which is running on two Linux servers.
    The Lucene index is on an NFS share. After running for some time, both
    instances throw this exception:

    Caused by: java.io.FileNotFoundException:
    /tmp/nfstest/repository/lucene/lucene-test/_zr.cfs (No such file or
    directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:445)
    at org.apache.lucene.index.CompoundFileReader.<init>(SegmentReader.java:211)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:132)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:201)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:614)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:180)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:162)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexAccessProvider.java:110)
    at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getSearcher(LuceneIndexAccessor.java:291)
    at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getSearcher(LuceneIndexAccessor.java:256)
    at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getSearcher(LuceneIndexAccessor.java:249)
    at com.thecompany.lucene.index.LuceneIndexManager.getSearcher(LuceneIndexManager.java:196)
    ... 15 more

    I have enabled the info stream on the IndexWriter object using
    IndexWriter.setDefaultInfoStream(). The output from the two servers is as
    follows:

    Server 1:
    $ cat index-writer-info-stream.out | grep _zr.cfs
    org.apache.lucene.index.IndexFileDeleter@1984a9d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 0
    org.apache.lucene.index.IndexFileDeleter@1984a9d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 1
    org.apache.lucene.index.IndexFileDeleter@1984a9d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 2
    org.apache.lucene.index.IndexFileDeleter@1984a9d lucene.icm.test.Write.main(): DecRef "_zr.cfs": pre-decr count is 3
    org.apache.lucene.index.IndexFileDeleter@1984a9d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 2

    Server 2:
    $ cat index-writer-info-stream.out | grep _zr.cfs
    org.apache.lucene.index.IndexFileDeleter@20a83c2a lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 0
    org.apache.lucene.index.IndexFileDeleter@20a83c2a lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 1
    org.apache.lucene.index.IndexFileDeleter@20a83c2a lucene.icm.test.Write.main(): DecRef "_zr.cfs": pre-decr count is 2
    org.apache.lucene.index.IndexFileDeleter@4b69d75d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 0
    org.apache.lucene.index.IndexFileDeleter@4b69d75d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 1
    org.apache.lucene.index.IndexFileDeleter@4b69d75d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 2
    org.apache.lucene.index.IndexFileDeleter@4b69d75d lucene.icm.test.Write.main(): DecRef "_zr.cfs": pre-decr count is 3
    org.apache.lucene.index.IndexFileDeleter@4b69d75d lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 2
    org.apache.lucene.index.IndexFileDeleter@4b69d75d lucene.icm.test.Write.main(): DecRef "_zr.cfs": pre-decr count is 3
    org.apache.lucene.index.IndexFileDeleter@4ecd51ad lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 0
    org.apache.lucene.index.IndexFileDeleter@4ecd51ad lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 1
    org.apache.lucene.index.IndexFileDeleter@4ecd51ad lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 2
    org.apache.lucene.index.IndexFileDeleter@4ecd51ad lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 3
    org.apache.lucene.index.IndexFileDeleter@4ecd51ad lucene.icm.test.Write.main(): DecRef "_zr.cfs": pre-decr count is 4
    org.apache.lucene.index.IndexFileDeleter@4ecd51ad lucene.icm.test.Write.main(): IncRef "_zr.cfs": pre-incr count is 3

    I have added logging to our ExpirationTimeDeletionPolicy and I don't think
    it is deleting the "_zr.cfs" file.

    Once again, I would really appreciate your help solving this issue,

    Thanks for your help,

    Patrick


    Michael McCandless-2 wrote:
    > Very good! Keep us posted, and good luck,
    >
    > Mike
  • Michael McCandless at Sep 7, 2007 at 4:06 pm

    "pkimber" wrote:

    > We are still getting various issues on our Lucene indexes running on
    > an NFS share. It has taken me some time to find some useful
    > information to report to the mailing list.

    Bummer!

    Can you zip up your test application that shows the issue, as well as
    the full logs from both servers? I can look at them & try to
    reproduce the error.

    Mike

  • Patrick Kimber at Sep 7, 2007 at 4:27 pm

    "pkimber" wrote:
    > > We are still getting various issues on our Lucene indexes running on
    > > an NFS share. It has taken me some time to find some useful
    > > information to report to the mailing list.
    >
    > Bummer!
    >
    > Can you zip up your test application that shows the issue, as well as
    > the full logs from both servers? I can look at them & try to
    > reproduce the error.
    >
    > Mike

    Yeh, I know!

    I cannot send you the source code without speaking to my manager
    first. I guess he would want me to change the code before sending it
    to you. You could have the log files now, but I expect you want to
    wait until the test application is ready to send?

    Thanks for your help,

    Patrick

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at Sep 7, 2007 at 6:07 pm

    "Patrick Kimber" wrote:

    > I cannot send you the source code without speaking to my manager
    > first. I guess he would want me to change the code before sending it
    > to you. You could have the log files now, but I expect you want to
    > wait until the test application is ready to send?

    I can try to start with the full logs.

    Mike

  • Mark Miller at Jun 29, 2007 at 2:14 pm
    This is an interesting choice. Perhaps you have modified
    LuceneIndexAccessor, but it seems to me (without knowing much about your
    setup) that you would have odd reader behavior. On a 3 node system, if you
    add docs with node 1 and 2 but not 3, and you're doing searches against all 3
    nodes, node 3 will have old readers opened until you add a doc to node 3.
    This is an odd consistency issue (node 1 and 2 have current views because
    you are adding docs to them, but node 3 will be stale until it gets a doc),
    but also if you keep adding docs to node 1 and 2, or just plain add no docs
    to node 3, won't node 3's readers' index files be pulled out from under it
    after 10 minutes? Node 3 (or 1 and 2 for that matter) will not give up its
    cached readers *until* you add a doc with that particular node.

    Perhaps I am all wet on this (I haven't used NFS with Lucene), but I think
    you may need to somehow coordinate the delete policy with the
    LuceneIndexAccessor on each node.

    This may be unrelated to your problem, and perhaps you get around the issue
    somehow, but just to throw it out there...

    - Mark
    On 6/29/07, Patrick Kimber wrote:
    > I am using the Lucene Index Accessor contribution to co-ordinate the
    > readers and writers:
    > http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049
  • Patrick Kimber at Jun 29, 2007 at 2:45 pm
    Hi Mark

    Yes, thank you. I can see your point and I think we might have to pay
    some attention to this issue.

    But, we sometimes see this error on an NFS share within 2 minutes of
    starting the test so I don't think this is the only problem.

    Once again, thanks for the idea. I will certainly be looking to
    modify the code in the LuceneIndexAccessor to take this into account.

    Patrick
    On 29/06/07, Mark Miller wrote:
    > [...]
  • Mark Miller at Jun 29, 2007 at 3:42 pm
    If you're getting java.io.FileNotFoundException:
    /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 within 2
    minutes, this is very odd indeed. That would seem to imply your deletion
    policy is not working.

    You might try just using one of the nodes as the writer. In Michael's
    comments, he always seems to mention the pattern of one writer, many
    readers on NFS. In this case you could use no LockFactory and perhaps
    gain a little speed there.

    - Mark

    Patrick Kimber wrote:
    > [...]
  • Patrick Kimber at Jun 29, 2007 at 3:50 pm
    Hi Mark

    I just ran my test again... and the error occurred after 10 minutes -
    which is the time when my deletion policy is triggered. So... I think
    you might have found the answer to my problem.

    I will spend more time looking at it on Monday.

    Thank you very much for your help and enjoy your weekend.

    Patrick
    On 29/06/07, Mark Miller wrote:
    > [...]
  • Doron Cohen at Jun 30, 2007 at 12:14 am

    Mark Miller wrote:
    > You might try just using one of the nodes as
    > the writer. In Michael's comments, he always seems
    > to mention the pattern of one writer, many
    > readers on NFS. In this case you could use
    > no LockFactory and perhaps gain a little speed there.
    One thing I would worry about with multiple write nodes
    is system clocks. Note that ExpirationTimeDeletionPolicy
    is based on file.lastModified().

    I am not sure here: do different nodes see the same
    file.lastModified() over NFS? I would assume they do,
    up to the nodes' system clock differences plus some
    NFS delays, in which case, if clocks are at most seconds
    apart, I think we should be okay with that 10-minute
    expiration time.
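
    (For reference, the test-suite policy boils down to something like the
    sketch below -- modeled on the Lucene 2.2 test class, not Patrick's
    modified version. Note that every decision rides on
    Directory.fileModified(), i.e. on exactly those lastModified clocks:)

    import java.io.IOException;
    import java.util.List;
    import org.apache.lucene.index.IndexCommitPoint;
    import org.apache.lucene.index.IndexDeletionPolicy;
    import org.apache.lucene.store.Directory;

    class ExpirationTimeDeletionPolicy implements IndexDeletionPolicy {
        private final Directory dir;
        private final double expirationTimeSeconds;

        ExpirationTimeDeletionPolicy(Directory dir, double seconds) {
            this.dir = dir;
            this.expirationTimeSeconds = seconds;
        }

        public void onInit(List commits) throws IOException {
            onCommit(commits);
        }

        public void onCommit(List commits) throws IOException {
            IndexCommitPoint newest =
                (IndexCommitPoint) commits.get(commits.size() - 1);
            double expireTime =
                dir.fileModified(newest.getSegmentsFileName()) / 1000.0
                    - expirationTimeSeconds;
            // Delete every commit older than the horizon, always keeping
            // the most recent one.
            for (int i = 0; i < commits.size() - 1; i++) {
                IndexCommitPoint commit = (IndexCommitPoint) commits.get(i);
                double modTime =
                    dir.fileModified(commit.getSegmentsFileName()) / 1000.0;
                if (modTime < expireTime) {
                    commit.delete();
                }
            }
        }
    }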


  • Doron Cohen at Jun 29, 2007 at 11:47 pm

    Patrick Kimber wrote:
    > As requested, I have been trying to improve the
    > logging in the application so I can give you more
    > details of the update pattern.
    >
    > I am using the Lucene Index Accessor contribution
    > to co-ordinate the readers and writers:
    > http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049
    Never used the IndexAccessor patch, so I may be
    wrong in the following.
    > If the close method, in the IndexAccessProvider, fails
    > the exception is logged but not re-thrown:
    >
    > public void close(IndexReader reader) {
    >     if (reader != null) {
    >         try {
    >             reader.close();
    >         } catch (IOException e) {
    >             log.error("", e);
    >         }
    >     }
    > }
    >
    > I have been checking the application log. Just before
    > the time when the lock file errors occur I found this
    > log entry:
    >
    > [11:28:59] [ERROR] IndexAccessProvider
    > java.io.FileNotFoundException:
    > /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/
    > segments_h75 (No such file or directory)
    > at java.io.RandomAccessFile.open(Native Method)

    So a random file is opened as you close the reader...?
    I conclude you use IndexReader to delete documents, right?
    If so, these IndexReaders are actually writers, and we
    should see them as such when discussing this.

    If this is the case, just to verify - you should use the
    IndexReader constructor that takes a DeletionPolicy param.
    Otherwise default deletion policy will be used, and
    old files would be removed "too soon".
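
    (A hedged sketch of what Doron describes, reusing the policy class
    sketched a little above; the exact open() overload and the term and
    path are from memory and illustrative only:)

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    class DeleteWithPolicy {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.getDirectory(
                "/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
            // Open the "deleting reader" with the same expiration policy the
            // writers use, so commits age out on one schedule across nodes.
            IndexReader reader = IndexReader.open(dir,
                new ExpirationTimeDeletionPolicy(dir, 600.0));
            try {
                reader.deleteDocuments(new Term("id", "doc-42"));
            } finally {
                reader.close();
            }
        }
    }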

    If you do use the correct IndexReader constructor, it
    might indicate that the 10-minutes-deletion-policy does
    not work as it should, like Mark suggested.
    > - I guess the missing segments file could result
    > in the lock file not being removed?

    Yes.

    > - Is it safe to ignore this exception (probably not)?

    No, let's fix it... /;->

    > - Why would the segments file be missing? Could this
    > be connected to the NFS issues in some way?

    I would think so.


  • Mark Miller at Jun 30, 2007 at 12:45 am

    > Never used the IndexAccessor patch, so I may be
    > wrong in the following.
    >
    > No, let's fix it... /;->
    Don't mean to wade in over my head here, but just to help out those that
    have not used LuceneIndexAccessor.

    I am fairly certain that using the LuceneIndexAccessor could easily
    create the FileNotFoundException on the segments file. I am a lot less
    clear on whether that would then cause a problem with the WriteLock.

    LuceneIndexAccessor manages Readers and Writers (And Searchers, and
    Directories, etc). It keeps track of how many Readers are out and
    ensures a single Writer. You must request and release Readers and
    Writers. All Readers are cached until you release a Writer. Upon
    releasing a Writer, LuceneIndexAccessor waits for all Readers to be
    returned and clears the cache, causing new Readers to be opened on the
    next request.

    This is certain to be a problem due to the unavailability of "delete on
    last close" semantics over NFS. If a certain node in the cluster has not
    released a writer (due to not being used to write to the index) in a
    long time, another node could trigger the deletion of the files that a
    Reader from the first Node was using. LuceneIndexAccessor runs
    independently on each node, and so is not providing coherent access
    across all nodes. The WriteLock is being used to sync the Writer from each
    node and the Readers are not being coordinated at all... each Node counts
    on getting a Writer released to cause its cached Readers to be released
    and reopened (on first access).

    Without this problem solved, it would seem difficult to know the
    FileNotFoundException was caused by something else. What I don't know is
    if or how this would cause a WriteLock timeout. Perhaps there is more
    than one issue at hand.

    A simple way to test the LuceneIndexAccessor problem would be to
    implement a synchronized method that calls
    waitForReadersAndCloseCached() (this method would probably need more
    logic than just the simple call), and then call that method more often
    than every 10 minutes, as sketched below.
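
    (Hypothetical wiring of that suggestion -- the class and method names
    are the ones mentioned in this thread; visibility, error handling and
    the extra logic Mark mentions are left out:)

    import java.util.Timer;
    import java.util.TimerTask;
    import com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor;

    class ReaderRefresher {
        private final LuceneIndexAccessor accessor;

        ReaderRefresher(LuceneIndexAccessor accessor) {
            this.accessor = accessor;
        }

        // Drop this node's cached readers so none of them outlives files
        // that another node's deletion policy is about to remove.
        synchronized void refresh() {
            accessor.waitForReadersAndCloseCached();
        }

        // Schedule the refresh more often than the 10-minute expiration.
        void start(long periodMillis) {
            Timer timer = new Timer(true); // daemon thread
            timer.schedule(new TimerTask() {
                public void run() {
                    refresh();
                }
            }, periodMillis, periodMillis);
        }
    }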


    - Mark

  • Chris Hostetter at Jun 29, 2007 at 8:40 pm
    : We are sharing a Lucene index in a Linux cluster over an NFS share. We have
    : multiple servers reading and writing to the index.
    :
    : I am getting regular lock exceptions e.g.
    : Lock obtain timed out:
    : NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock

    Perhaps i'm missing something, but i thought NativeFSLock was not suitable
    for NFS? ... or is this what "lockd" provides? (my NFS knowledge is
    very out of date)

    : - We are using kernel NFS and lockd is running.
    : - We are using a modified version of the ExpirationTimeDeletionPolicy

    -Hoss


  • Chris Hostetter at Jun 29, 2007 at 8:42 pm
    : Perhaps i'm missing something, but i thought NativeFSLock was not suitable
    : for NFS? ... or is this what "lockd" provides? (my NFS knowledge is
    : very out of date)

    Do'h!

    I just read the docs for NativeFSLockFactory and noticed the "For example,
    for NFS servers there sometimes must be a separate lockd process running,
    and other configuration may be required such as running the server in
    kernel mode." part again ... sorry for the noise.
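
    (For reference, pointing Lucene at that kind of setup looks roughly
    like the sketch below, with the paths borrowed from earlier in the
    thread:)

    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NativeFSLockFactory;

    class OpenSharedDirectory {
        public static void main(String[] args) throws Exception {
            // Lock directory on the same NFS mount as the index; the NFS
            // setup must support native locks (lockd running, kernel mode).
            NativeFSLockFactory locks =
                new NativeFSLockFactory("/mnt/nfstest/repository/lucene/lock");
            FSDirectory dir = FSDirectory.getDirectory(
                "/mnt/nfstest/repository/lucene/lucene-icm-test-1-0", locks);
            System.out.println("lock factory: " + dir.getLockFactory());
        }
    }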




    -Hoss
