Hello,

I have a situation where I need to have multiple applications, potentially
located on different servers, and which have no knowledge of each other,
indexing into and searching from the same Lucene index. I anticipate
problems with locks.

Let's say I have two applications and, at any time, either of them may try
to index upwards of 1000 documents (or more!). If, by luck, these
applications do not attempt to write to the index at the same time then
things are fine. However, if both of them try to write to the index at the
same time, one of them will fail due to the index being locked.

My first solution to this problem was to have both applications check to see
if the index is locked and to let them sleep until the index was unlocked.
The problem with this is that if, while indexing, an application is shut
down or killed, the index may not be unlocked. This will block other
applications from indexing and may cause them to hang.
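A minimal sketch of the "sleep until unlocked" idea with an upper bound on the wait, so that a stale lock left by a killed process cannot hang the caller forever. The class name `LockWait`, the method `waitForUnlock`, and the `isLocked` probe are hypothetical names for illustration; how the probe actually checks the Lucene lock is left open here.

```java
import java.util.function.BooleanSupplier;

public class LockWait {
    /**
     * Polls the (hypothetical) isLocked probe until it reports false
     * or the timeout elapses.
     *
     * @return true if the index became free, false if we gave up waiting
     */
    public static boolean waitForUnlock(BooleanSupplier isLocked,
                                        long timeoutMillis,
                                        long pollMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (isLocked.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                // Still locked after the timeout: the lock may be stale.
                // The caller decides whether to retry, alert, or clean up.
                return false;
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }
}
```

A caller would treat a `false` result as "possibly stale lock" rather than blocking indefinitely. This only bounds the wait; it does not by itself tell a stale lock apart from a long-running writer.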

Clearly I have a threading problem. I think I may know a solution to this
problem and I would appreciate verification of the solution or suggestions
on approaches.

I am thinking that I can make all of the applications index into their own
index, not the central shared index. Their own index might be a FSDirectory
or a RAMDirectory. When done indexing, the applications' indexes would be
merged with the central index for consumption by all applications sharing
the index.

From what I understand, the process of merging indexes takes a lot less time
than the process of inserting into or deleting from an index. This seems to
mean that I'm less "likely" to run into locking issues. I can more safely
have processes sleep until the index is unlocked and they can gain access to
merge their index with the central index. If these applications use their own
FSDirectory, I should be able to continue working with that FSDirectory in
the case of an unclean shutdown and should still be able to merge it with
the central index.

Does anyone have any advice to offer on this?

Thank you,

Doug Hughes
dhughes@alagad.com


  • Daniel Naber at Jun 8, 2005 at 6:32 pm

    On Tuesday 07 June 2005 19:36, Doug Hughes wrote:

    I am thinking that I can make all of the applications index into their
    own index, not the central shared index.  Their own index might be a
    FSDirectory or a RAMDirectory.  When done indexing, the applications'
    indexes would be merged with the central index for consumption by all
    applications sharing the index.

    Why not just work on more than one index without merging and search using
    MultiSearcher?

    Regards
    Daniel

    --
    http://www.danielnaber.de

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Otis Gospodnetic at Jun 9, 2005 at 8:28 pm
    I think your setup is right for a centralized IndexQueueManager that is
    subscribed to topics to which your distributed servers push data to
    index via JMS. That way you get an easy way to add more machines to
    the cluster, you get persistence of not-yet-indexed data, and you get a
    queuing mechanism that takes care of locking issues.

    Otis
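The single-writer idea above can be sketched in plain Java. This is an in-process stand-in only: a `BlockingQueue` takes the place of the JMS topic, so it shows the locking benefit (one consumer thread is the only writer) but not the persistence or cross-machine delivery that JMS would add. The class shape and the `indexer` callback are assumptions for illustration, not an existing Lucene or JMS API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class IndexQueueManager implements AutoCloseable {
    // Sentinel object, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("__stop__");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    /** indexer is a hypothetical callback that adds one document to the index. */
    public IndexQueueManager(Consumer<String> indexer) {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();   // blocks until work arrives
                    if (doc == STOP) {
                        return;
                    }
                    indexer.accept(doc);         // only this thread ever writes
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "index-writer");
        writer.start();
    }

    /** Called by any producer; never touches the index directly. */
    public void submit(String document) {
        queue.add(document);
    }

    /** Drains everything submitted so far, then stops the writer thread. */
    @Override
    public void close() throws InterruptedException {
        queue.add(STOP);
        writer.join();
    }
}
```

Because producers only enqueue, they never contend for the index write lock; documents submitted before `close()` are indexed in queue order by the single writer thread.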



Discussion Overview
group: java-user
categories: lucene
posted: Jun 7, '05 at 7:36p
active: Jun 9, '05 at 8:28p
posts: 3
users: 3
website: lucene.apache.org
