My application for Lucene involves updating an existing index with a
mixture of new and revised documents. From what I've been able to
discern from reading I'm going to have to delete the old versions of the
revised documents before indexing them again. Since this indexing will
probably take quite a while due to the number of new/revised documents
I'll be adding and the large number of documents already in the index,
I'm uncomfortable keeping an IndexReader and an IndexWriter open for
long periods of time.

What I'm considering doing is reading the file with multiple documents
twice. One time I test to see if the document is in the index and
delete it if it is with something like:

The "Reference" term is unique.

...
String ref;
while ((ref = getNextDocument()) != null) {
    Term t = new Term("Reference", ref);
    TermDocs td = indexReader.termDocs(t);
    // next() returns false when no document contains the term
    if (td.next()) {
        indexReader.delete(td.doc());
    }
    td.close();
}

Or should I not bother to look for the term at all and do something like
this?

String ref;
while ((ref = getNextDocument()) != null) {
    Term t = new Term("Reference", ref);
    // delete(Term) removes every document containing the term
    indexReader.delete(t);
}

Is either of these more efficient?

Then I would close the indexReader and go back and reread the file,
indexing merrily away.

Should I be concerned about keeping both an indexReader and indexWriter
open at the same time? I'll have other processes probably making
searches during this time. I'm not concerned about the searches not
finding the data I'm currently adding, I'm more concerned about locking
those searches out.

A couple of valid assumptions. The reference term is unique in the
index and there will be only one in the input file.

Thanks,
Jim.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

  • Miles Barr at Jan 10, 2005 at 10:07 am

    On Fri, 2005-01-07 at 14:47 -0500, Jim Lynch wrote:
    > My application for Lucene involves updating an existing index with a
    > mixture of new and revised documents. From what I've been able to
    > discern from reading I'm going to have to delete the old versions of the
    > revised documents before indexing them again. Since this indexing will
    > probably take quite a while due to the number of new/revised documents
    > I'll be adding and the large number of documents already in the index,
    > I'm uncomfortable keeping an IndexReader and an IndexWriter open for
    > long periods of time.

    As I understand it, you can't have an index reader that you do deletes
    on and an index writer open at the same time, since they are both doing
    write operations. I think locking will prevent you from opening an index
    writer once you do a delete on the reader.

    So you're either going to have to open and close the reader and writer
    for each update, or keep a list of duplicate references and a list of
    documents to be updated, then do the deletes like:

    for (Iterator it = toBeDeleted.iterator(); it.hasNext(); ) {
        Term term = new Term("Reference", (String) it.next());
        indexReader.delete(term);
    }

    Close the reader, open the writer, then iterate through your list of new
    docs and write them to the index.
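
The ordering described above (all deletes first, then all adds) can be sketched without Lucene at all. The following is only an illustration under stated assumptions: the class TwoPhaseUpdate, the Doc type, and the methods deletePass and addPass are hypothetical names, and a plain list stands in for the index. Like a real index, the list happily stores duplicates, which is why the delete pass has to run before the add pass.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the batch update; this is NOT Lucene code.
// The "index" is a plain list that, like a real index, allows duplicates.
class TwoPhaseUpdate {

    /** One document: its unique reference and its body text. */
    static final class Doc {
        final String reference;
        final String body;

        Doc(String reference, String body) {
            this.reference = reference;
            this.body = body;
        }
    }

    /** Pass 1 (reader phase): drop indexed docs whose reference is incoming. */
    static int deletePass(List<Doc> index, List<Doc> incoming) {
        Set<String> toBeDeleted = new HashSet<String>();
        for (Doc doc : incoming) {
            toBeDeleted.add(doc.reference);
        }
        int deleted = 0;
        for (Iterator<Doc> it = index.iterator(); it.hasNext(); ) {
            // References that are not in the index simply never match,
            // so new and revised documents take the same path.
            if (toBeDeleted.contains(it.next().reference)) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    /** Pass 2 (writer phase): append every incoming doc after pass 1 is done. */
    static void addPass(List<Doc> index, List<Doc> incoming) {
        index.addAll(incoming);
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<Doc>();
        index.add(new Doc("A-1", "old text"));

        List<Doc> incoming = new ArrayList<Doc>();
        incoming.add(new Doc("A-1", "revised text")); // revised document
        incoming.add(new Doc("B-2", "new text"));     // new document

        int deleted = deletePass(index, incoming);
        addPass(index, incoming);
        System.out.println(deleted + " deleted, " + index.size() + " in index");
    }
}
```

In a real run, the close of the reader and the open of the writer would sit between the two calls; the sketch only shows that the two passes never interleave.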

    > Should I be concerned about keeping both an indexReader and indexWriter
    > open at the same time? I'll have other processes probably making
    > searches during this time. I'm not concerned about the searches not
    > finding the data I'm currently adding, I'm more concerned about locking
    > those searches out.

    Once you close your reader, searches won't be possible. So once you've
    done your deletes, close the reader and open it again to release the
    write lock before opening the writer.
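
To make the lock ordering concrete, here is a toy model of a single write lock shared by a deleting reader and a writer. It is a sketch of the behaviour as described in this reply, not of Lucene's actual locking code: WriteLockModel, tryAcquire, and release are hypothetical names, and the owner strings are just labels.

```java
// Hypothetical model of one index-wide write lock; names are illustrative
// and are not taken from Lucene.
class WriteLockModel {
    private String holder; // null while the lock is free

    /** Take the lock for the named owner; false if it is already held. */
    synchronized boolean tryAcquire(String owner) {
        if (holder != null) {
            return false;
        }
        holder = owner;
        return true;
    }

    /** Release the lock, but only if the named owner actually holds it. */
    synchronized boolean release(String owner) {
        if (!owner.equals(holder)) {
            return false;
        }
        holder = null;
        return true;
    }

    public static void main(String[] args) {
        WriteLockModel lock = new WriteLockModel();
        // The reader's first delete takes the write lock.
        System.out.println("reader deletes:     " + lock.tryAcquire("reader"));
        // Opening the writer now fails because the reader still holds it.
        System.out.println("writer opens early: " + lock.tryAcquire("writer"));
        // Closing the reader releases the lock.
        System.out.println("reader closes:      " + lock.release("reader"));
        // Only now can the writer be opened.
        System.out.println("writer opens:       " + lock.tryAcquire("writer"));
    }
}
```

The model covers only modifications. Whether plain searches are affected while the lock is held is a separate question that it does not try to answer.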


    --
    Miles Barr <[email protected]>
    Runtime Collective Ltd.

  • Jim Lynch at Jan 10, 2005 at 2:43 pm
    Miles,

    Thanks for the tips. I didn't see this response nor did I see my
    original email earlier, so I reposted the question, thinking I had
    forgotten to do so on Friday. My apologies to the group for the double
    post.

    Jim.


Discussion Overview
group: java-user @
categories: lucene
posted: Jan 7, '05 at 7:47p
active: Jan 10, '05 at 2:43p
posts: 3
users: 2
website: lucene.apache.org

2 users in discussion

Jim Lynch: 2 posts Miles Barr: 1 post
