FAQ
I took the NFS issue up on the Linux NFS list. Here's the first
suggestion: hard links.

Thoughts?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Begin forwarded message:
From: Trond Myklebust <trond.myklebust at fys.uio.no>
Date: January 24, 2007 7:06:58 PM PST
To: Marvin Humphrey <marvin@rectangular.com>
Cc: nfs@lists.sourceforge.net
Subject: Re: [NFS] Lucene, delete-on-last-close, and flock emulation
On Wed, 2007-01-24 at 14:04 -0800, Marvin Humphrey wrote:
Greetings,

The Apache Lucene search engine library currently suffers from a
design flaw that causes problems when indexes located on NFS volumes
are updated. The same flaw afflicts my Perl/C port of Lucene,
KinoSearch. My goal is to eliminate the problem for both libraries.
I hope that someone subscribed to this list can help by clarifying
one item from the FAQ, and possibly offering further guidance.

Lucene depends upon delete-on-last-close semantics.

Lucene indexes are comprised of many files. Once written, no files
are ever modified -- instead, they are rendered obsolete when more
recent versions arrive. Which files are the most recent can be
determined by examining a base-36 numerical increment embedded in the
file name: "foo_z53" is more recent than "foo_z52".
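
A minimal sketch of that comparison, in Python for brevity. It assumes the increment is the lowercase base-36 text after the last underscore, as in the "foo_z53" example; the helper names `generation` and `newest` are hypothetical, not Lucene API.

```python
import re


def generation(filename):
    """Return the base-36 increment found after the last underscore."""
    match = re.search(r"_([0-9a-z]+)$", filename)
    if match is None:
        raise ValueError(f"no base-36 suffix in {filename!r}")
    # int() with base 36 accepts the digits 0-9 followed by a-z.
    return int(match.group(1), 36)


def newest(filenames):
    """Pick the file name carrying the highest increment."""
    return max(filenames, key=generation)
```

Comparing the decoded integers rather than the raw strings matters once the suffixes differ in length: as plain strings "foo_9" sorts after "foo_10", but as base-36 numbers 9 is smaller than 36.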

When an index-reading app is created, it opens the most recent set of
files available, and never updates. If new files are written, the
reader won't know about them; it stays focused on the snapshot that
was present at its moment of creation.

Index-writing applications, once they have completed writing a set of
updated files, unlink any files which are now "obsolete". If a
reader still happens to be using one of these "obsolete" files,
delete-on-last-close ordinarily prevents the reader from being cut
off from the needed resource.

On NFS, this design breaks, since NFS does not support
delete-on-last-close. When the index-writing application deletes an
"obsolete" file
which is still in use by a reader, the reader crashes with a Stale
NFS Filehandle exception.

You could easily fix this by having the reader create a hard link
to the index file. e.g.

ln foo foo-client.my.org-$$
open("foo-client.my.org-$$");
....
read()
...
close()
rm foo-client.my.org-$$
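
The ln/open/read/close/rm sequence above could be sketched as follows, in Python for brevity. The helper name `pinned` and the "<file>-<host>-<pid>" link name are assumptions made for illustration; the sketch also assumes the reader may create files in the index directory.

```python
import os
import socket
from contextlib import contextmanager


@contextmanager
def pinned(path):
    """Read `path` through a private hard link, so that unlinking the
    original name does not remove the data while we are reading."""
    link = f"{path}-{socket.gethostname()}-{os.getpid()}"
    os.link(path, link)              # ln foo foo-<host>-<pid>
    try:
        with open(link, "rb") as handle:
            yield handle             # caller does its read() calls here
    finally:
        os.unlink(link)              # rm foo-<host>-<pid>
```

A reader would wrap each file access as `with pinned("foo") as handle: data = handle.read()`. The `finally` clause covers ordinary exceptions only; a killed process still leaves its link behind.
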
Lucene does not currently exploit advisory read-locking at all. One
possible solution to this problem is to have readers secure advisory
locks against the files they need, and for index-writing applications
to spare files when such locks are detected. Unfortunately, it is
very difficult for library code to enforce the level of discipline
needed by fcntl() locks. flock() would work much better.
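
As a rough sketch of the flock() variant of this idea (Python on a Unix system, using the standard `fcntl` module): readers hold a shared lock for as long as they use a file, and the writer deletes only if it can take an exclusive lock without blocking. The function names are hypothetical, and nothing here is existing Lucene behaviour.

```python
import fcntl
import os


def open_shared(path):
    """Reader side: open the file and hold a shared flock() on it.
    Closing the returned handle releases the lock."""
    handle = open(path, "rb")
    fcntl.flock(handle, fcntl.LOCK_SH)
    return handle


def delete_if_unused(path):
    """Writer side: unlink `path` unless a reader holds a lock on it.
    Returns True if the file was removed."""
    # Opened read-write: where flock() is emulated with POSIX locks,
    # an exclusive lock needs a descriptor that is open for writing.
    with open(path, "r+b") as handle:
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False             # still locked by a reader: spare it
        os.unlink(path)
        return True
```

This leaves a window open: a reader that opens the file just before the unlink only gets its shared lock after the file is already gone.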
Why do read locking at all?

I read in the manpage for flock(),

flock(2) does not lock files over NFS. Use fcntl(2) instead:
that does work over NFS, given a sufficiently recent version
of Linux and a server which supports locking.

However, apparently this is no longer accurate as of 2.6.12,
according to both the FAQ and this October 2005 post from Trond
Myklebust:
<http://sourceforge.net/mailarchive/message.php?msg_id=17217586>.

What I am confused about is how faithful the flock() emulation is.
The FAQ states:

On local Linux filesystems, POSIX locks and BSD locks are
invisible to one another. Thus, due to this emulation,
applications running on a Linux NFS server will still see
files locked by NFS clients as being locked with a fcntl()/POSIX
lock, whether the application on the client is using a BSD-style
or a POSIX-style lock. If the server application uses flock()/BSD
locks, it will not see the locks the NFS clients use.

This says to me that we must always check for both fcntl and flock
locks before zapping a file. However, I am worried that if an
application opens a file and checks for the existence of an fcntl
lock, it may force an inappropriate lock release if a lock is held
elsewhere within the process. Is that a possibility? Or is the
flock() emulator applying fcntl() locks against some symbolic stand-in?
And if it is locking a stand-in, are there circumstances under which
it is outright impossible for an application to figure out whether
someone, somewhere has secured a lock against a file on an NFS
volume?

I recognize that even in a best-case scenario, flock() over NFS will
only help us with recent systems, which isn't ideal. Fortunately,
the penalty for lock failure is only a crashed searching application,
rather than index corruption (Lucene uses a dot-lock mechanism for
serializing writer access, which I take should be bulletproof on NFS
after kernel 2.6.5 made O_EXCL creates atomic). However, if an
alternative design occurs to you, I'm all ears.

Lastly, we have been scratching our heads as to how we might detect
at index-creation time that the user has specified that the index be
located on an NFS volume. We would like to warn the user in such a
case. A couple hacks have been proposed, but they are decidedly
non-portable. If someone can suggest an algorithm that will determine
whether we can count on delete-on-last-close by failing reliably on
an NFS volume, we would be grateful.

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Doron Cohen at Jan 25, 2007 at 7:33 am

    You could easily fix this by having the reader create a hard link
    to the index file. e.g.

    ln foo foo-client.my.org-$$
    open("foo-client.my.org-$$");
    ....
    read()
    ...
    close()
    rm foo-client.my.org-$$

    What happens when a client exits prematurely (without removing the hard
    link)?


  • Marvin Humphrey at Jan 25, 2007 at 2:17 pm

    On Jan 24, 2007, at 11:31 PM, Doron Cohen wrote:

    You could easily fix this by having the reader create a hard link
    to the index file. e.g.

    ln foo foo-client.my.org-$$
    open("foo-client.my.org-$$");
    ....
    read()
    ...
    close()
    rm foo-client.my.org-$$

    What happens when a client exits prematurely (without removing the
    hard link)?

    The implication of the above is that the pid ($$) is included in the
    hard link filename. We can go farther and add a hostId and an
    increment, too. That way, if a reader reconnects, it can scan file
    names for matching hostIds, and zap them if their pids aren't
    active. If the host never reconnects, you get dead files, though.
    Manageable I think, but complex, fragile, and icky. I don't think
    the fellow who suggested this grokked how many files are involved or
    how often we have to perform open/close ops.
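
The cleanup pass described above might look roughly like this (Python, Unix only). The "<file>-<hostId>-<pid>-<increment>" naming scheme and both function names are assumptions made for illustration; the liveness check is injectable so that the scan can be exercised without real processes.

```python
import os
import re


def pid_is_active(pid):
    """Best-effort check: signal 0 probes a pid without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                  # process exists but is owned by someone else
    return True


def zap_stale_links(index_dir, host_id, is_active=pid_is_active):
    """Remove this host's hard links whose owning pid is gone.
    Returns the sorted list of removed names."""
    pattern = re.compile(
        r"^.+-" + re.escape(host_id) + r"-(?P<pid>\d+)-\d+$")
    removed = []
    for name in sorted(os.listdir(index_dir)):
        match = pattern.match(name)
        if match and not is_active(int(match.group("pid"))):
            os.unlink(os.path.join(index_dir, name))
            removed.append(name)
    return removed
```

Links left by other hosts are never touched, which is exactly the dead-file case mentioned above. Pids are also recycled, so a stale link can be mistaken for a live one when an unrelated process has since taken the same pid.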

    Another drawback of this scheme is that it requires that the reader
    app possess write permissions on the index directory, or at least the
    NFS volume. One limitation of hard links is that you can't create
    one pointing to an NFS file on your local file system.

    Another problem: how would you do this sort of thing from Java? I
    googled '"hard link" Java' and didn't come up with anything. I found
    one forum entry where it was asserted that Java wouldn't support this
    kind of system-specific IO op out of principle. Somebody school me
    on how you'd implement it, please. From C, the POSIX function link()
    is available from <unistd.h> -- would you have to call out to that?
    Using a sequence of shell commands as per the example seems less than
    ideal. :)

    On the plus side, this is exactly the right remedy in theory.
    delete-on-last-close is effectively a reference counting scheme. So is
    this. I'm just waiting for the right variant of it to present itself.

    Marvin Humphrey
    Rectangular Research
    http://www.rectangular.com/



