I am currently writing a Lucene application and having a huge headache with
concurrency.

My requirements are that each time a file is indexed, a search on its path is
performed to see whether an update (delete, then re-index) is required. If a
document with the same path exists, an IndexReader deletes the doc and then
an IndexWriter re-indexes the file. Sadly, due to requirements, the deletes
and adds cannot be batched, so I am constantly opening and closing the
IndexReader and IndexWriter across multiple threads. Everything has been
working fine and seems thread-safe, apart from this:

If I index a test batch of 10 files and then repeat the operation on the same
files a few minutes later, all 10 are updated correctly. However, when I run
the same test with more than about 10 files, my searches fail to find about
25% of the already existing files and I end up with duplicate entries in the
index. I have managed to fix this by closing the IndexWriter every time an
update search is performed, but this has taken performance to almost
embarrassing levels! My understanding was that you could search a Lucene
index with an IndexSearcher while write operations are taking place; is that
not the case? Is it possible that the search skips segments which are
currently being written to?
--
View this message in context: http://www.nabble.com/Searches-fail-while-indexwriter-is-open-tf3505182.html#a9789072
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Erick Erickson at Apr 2, 2007 at 1:32 pm
    Yes, you can search while index writes are taking place, but...

    When you open an index reader, it essentially takes a snapshot of the
    index, and further modifications to the index are not visible to that
    searcher as long as it's open. You must close and re-open the reader (and
    any associated searchers) to see your changes. How this interacts with
    IndexWriter flushing is... er... well, I don't know exactly, but it could
    well be an issue.

    I wonder if this is what you're seeing. In broad terms, the base
    question is how quickly you need to see changes in the index reflected
    in your search results.
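The snapshot behaviour Erick describes can be illustrated with a toy model. This is a hypothetical sketch in plain Java, not Lucene code: ToyIndex, commit() and openReader() are invented names, and a "reader" here is simply a frozen copy of the document list as it was when the reader was opened.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, not Lucene code: a "reader" is a frozen copy of
// the documents that were committed at the moment it was opened.
class SnapshotDemo {

    static class ToyIndex {
        private final List<String> committed = new ArrayList<>();

        void commit(String path) {
            committed.add(path);
        }

        // Opening a reader copies the current committed state; later
        // commits do not change this copy.
        List<String> openReader() {
            return Collections.unmodifiableList(new ArrayList<>(committed));
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.commit("1.txt");

        List<String> oldReader = index.openReader();
        index.commit("2.txt");

        // The reader opened before "2.txt" was committed still sees one doc.
        System.out.println("old reader: " + oldReader);
        // A reader opened afterwards sees both.
        System.out.println("new reader: " + index.openReader());
    }
}
```

Running main prints "old reader: [1.txt]" followed by "new reader: [1.txt, 2.txt]": the only way for the old snapshot to see the second document is to open a new reader.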

    I suspect that the 10-file thing is a red herring. What are your
    IndexWriter parameters set to? Especially maxBufferedDocs, which has a
    default value, perhaps not coincidentally, of 10.

    Lucene 2.1 has an IndexWriter.flush() method that could help.
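Erick's buffering hypothesis can be sketched the same way. Again this is a hypothetical toy model, not Lucene's actual flushing logic: ToyWriter and its methods are invented for illustration, and the buffer size of 10 only mirrors the maxBufferedDocs default mentioned above. Added documents become visible to a newly opened reader only once the buffer is flushed, either automatically when it fills or by an explicit flush() call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene's implementation: added documents sit
// in an in-memory buffer and only become visible to newly opened readers
// after a flush (automatic when the buffer fills, or explicit).
class BufferedWriterDemo {

    static class ToyWriter {
        private final int maxBufferedDocs;
        private final List<String> buffer = new ArrayList<>();
        private final List<String> flushed = new ArrayList<>();

        ToyWriter(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        void addDocument(String path) {
            buffer.add(path);
            // Auto-flush once the buffer reaches its limit.
            if (buffer.size() >= maxBufferedDocs) {
                flush();
            }
        }

        // Move buffered documents into the part of the index readers can see.
        void flush() {
            flushed.addAll(buffer);
            buffer.clear();
        }

        // What a reader opened right now would find.
        List<String> openReader() {
            return new ArrayList<>(flushed);
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter(10); // 10 mirrors the default Erick cites
        for (int i = 1; i <= 12; i++) {
            writer.addDocument(i + ".txt");
        }
        // Docs 1-10 were flushed automatically; 11 and 12 are still buffered.
        System.out.println("visible before flush: " + writer.openReader().size());
        writer.flush();
        System.out.println("visible after flush: " + writer.openReader().size());
    }
}
```

In this model main prints 10 visible documents before the explicit flush and 12 after it; the two documents left in the buffer are the ones a fresh search would miss.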

    Erick
  • baronDodd at Apr 2, 2007 at 2:57 pm
    Many thanks for your response. You raised some good points which I had not
    thought of, but unfortunately the problem remains.

    To clarify, my indexing sequence in pseudocode is this:

    if (fileExists(filePath)) {
        createIndexReader();   // also closes the open IndexWriter
        deleteDoc(docNumber);
    }
    createIndexWriter();       // also closes the open IndexReader
    indexDoc();
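A toy model of this sequence shows how Erick's buffering hypothesis would turn into duplicates. It is hypothetical plain Java, not Lucene code: fileExists, deleteDoc and indexDoc are stand-ins for the methods in the pseudocode, and deletion is by path rather than by document number for simplicity. The existence check sees only flushed documents, so a path whose earlier copy is still in the open writer's buffer looks new and is indexed again. The delete path flushes first, because opening the reader closes the writer and closing an IndexWriter flushes it. Since the poster reports below that changing maxBufferedDocs made no difference, this illustrates the hypothesis rather than confirming a diagnosis.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of the check/delete/add sequence, not Lucene code.
// Searches see only flushed documents; added documents wait in a buffer.
class UpdateSequenceDemo {

    static class ToyIndex {
        private final int maxBufferedDocs;
        private final List<String> buffer = new ArrayList<>();
        private final List<String> flushed = new ArrayList<>();

        ToyIndex(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        // Stand-in for fileExists(): a fresh search sees flushed docs only.
        boolean fileExists(String path) {
            return flushed.contains(path);
        }

        // Stand-in for createIndexReader() + deleteDoc(): opening the reader
        // closes the writer first, and closing a writer flushes its buffer.
        void deleteDoc(String path) {
            flush();
            flushed.remove(path);
        }

        // Stand-in for the writer-side add; auto-flushes when the buffer fills.
        void indexDoc(String path) {
            buffer.add(path);
            if (buffer.size() >= maxBufferedDocs) {
                flush();
            }
        }

        void flush() {
            flushed.addAll(buffer);
            buffer.clear();
        }

        // Number of stored copies of a path, flushed or still buffered.
        int copiesOf(String path) {
            return Collections.frequency(flushed, path)
                    + Collections.frequency(buffer, path);
        }
    }

    // One update as in the pseudocode: delete if found, then (re-)index.
    static void update(ToyIndex index, String path) {
        if (index.fileExists(path)) {
            index.deleteDoc(path);
        }
        index.indexDoc(path);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(10);
        // First job: 1-10 are auto-flushed, 11 and 12 stay in the buffer.
        for (int i = 1; i <= 12; i++) {
            update(index, i + ".txt");
        }
        // Later updates while the same writer is still open.
        update(index, "12.txt"); // still buffered, so not found: duplicate
        update(index, "1.txt");  // found; the delete path flushes, then re-adds
        index.flush();
        System.out.println("1.txt copies: " + index.copiesOf("1.txt"));
        System.out.println("12.txt copies: " + index.copiesOf("12.txt"));
    }
}
```

In this run 1.txt ends up with one copy, while 12.txt, which was still buffered when it was checked, ends up with two.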

    I am indexing 30 files and then repeating the same indexing job minutes
    later, which should result in 30 updates. My createIndexReader() and
    createIndexWriter() methods close the opposite modifier correctly, and my
    isExisting() method also creates a new IndexSearcher instance. The
    question about how quickly I need to see index modifications is not quite
    relevant here: the documents that are not being found were indexed minutes
    or hours earlier, during the first indexing job, and they are missed
    simply because I have an IndexWriter open at the time.

    Sample from the first job:

    Index Writer Open
    Failed to find: C:\CBIS_test\updatetest\7.txt
    7.txt written to index
    Failed to find: C:\CBIS_test\updatetest\5.txt
    5.txt written to index
    Totals : Indexed 30 Updated: 0 (content analyzed: 30) files in 859 milliseconds
    flushing
    Index Writer Open
    optimizing index

    And then the 2nd job:

    8.txt deleted
    Index Reader closed
    Index Writer Open
    8.txt written to index
    Failed to find: C:\CBIS_test\updatetest\7.txt
    7.txt written to index
    Index Writer closed
    Totals : Indexed 10 Updated: 20 (content analyzed: 30) files in 3606 milliseconds
    flushing
    Index Writer Open
    optimizing index

    I may be missing something obvious; changing the maxBufferedDocs value made
    no difference.



    --
    View this message in context: http://www.nabble.com/Searches-fail-while-indexwriter-is-open-tf3505182.html#a9792429
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Discussion Overview
group: java-user
categories: lucene
posted: Apr 2, '07 at 11:38 am
active: Apr 2, '07 at 2:57 pm
posts: 3
users: 2
website: lucene.apache.org

2 users in discussion

baronDodd: 2 posts, Erick Erickson: 1 post
