My application for Lucene involves updating an existing index with a
mixture of new and revised documents. From what I've been able to
discern from reading, I'm going to have to delete the old versions of the
revised documents before indexing them again. Since this indexing will
probably take quite a while due to the number of new/revised documents
I'll be adding and the large number of documents already in the index,
I'm uncomfortable keeping an IndexReader and an IndexWriter open for
long periods of time.
What I'm considering doing is reading the file with multiple documents
twice. On the first pass I test whether each document is already in the
index and delete it if it is, with something like:
The "Reference" term is unique.
...
String ref;
while ((ref = getNextDocument()) != null) {
    Term t = new Term("Reference", ref);
    TermDocs td = indexReader.termDocs(t);
    // termDocs() doesn't return null; next() tells us whether the term matched
    if (td.next()) {
        indexReader.delete(td.doc());
    }
    td.close();
}
Or should I not bother to look for the term at all and do something like
this?
String ref;
while ((ref = getNextDocument()) != null) {
    // delete(Term) removes every document containing the term
    indexReader.delete(new Term("Reference", ref));
}
Is either of these more efficient than the other?
Then I would close the indexReader and go back and reread the file,
indexing merrily away.
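Roughly what I have in mind for the second pass (an untested sketch;
getNextDocument() and makeDocument() are placeholders for my own code,
and the index path is made up):

```java
// Second pass: with the IndexReader closed, open an IndexWriter
// and add all the new/revised documents.
IndexWriter writer = new IndexWriter("/path/to/index",
                                     new StandardAnalyzer(),
                                     false); // false = append to existing index
String ref;
while ((ref = getNextDocument()) != null) {
    Document doc = makeDocument(ref);
    // Store the reference as an untokenized Keyword field so the
    // Term("Reference", ref) deletes in the first pass match exactly.
    doc.add(Field.Keyword("Reference", ref));
    writer.addDocument(doc);
}
writer.optimize();
writer.close();
```

The Keyword field is the part I want to get right, since a tokenized
field would break the exact-term deletes above.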
Should I be concerned about keeping both an IndexReader and an IndexWriter
open at the same time? Other processes will probably be running searches
during this period. I'm not worried about those searches missing the data
I'm currently adding; I'm more worried about locking the searches out.
A couple of assumptions you can take as valid: the Reference term is
unique in the index, and each reference appears only once in the input file.
Thanks,
Jim.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]