FAQ
Wouldn't these excluded/filtered documents skew the scores even though they
are supposed to be marked as deleted? Don't the idf values used in scoring
depend on the entire document set and not just the matching hits for a
query?

Thanks,
TCK



On Tue, Mar 16, 2010 at 5:45 AM, Rene Hackl-Sommer wrote:

Hi Daniel,

Unless you have only a few documents and a small index, I don't think never
calling optimize is going to be a means you should rely upon.

What about if you reindexed the documents you are deleting, adding a field
<excludeFromSearch> with the value "true"? This would imply that either

1) all fields are stored, so you may retrieve them from the original doc
and add them to the new one plus the exclusion field
2) or if a lot of fields are only indexed you'd need access to the original
source. (With limitations it is also possible to reconstruct a field from
indexed data only, but not generally recommendable)

During search, just add "NOT excludeFromSearch:true" to the query.

If you need to keep track of which versions belong together, you may need
to think about how you uniquely identify documents, how this changes between
versions, and if the update dates might be of any help.

Cheers
Rene


Am 16.03.2010 05:20, schrieb Daniel Noll:

Hi all.
I'm trying to implement a form of document deletion where the previous
versions are kept around forever ( a primitive form of versioning) but
excluded from the search results.

I notice that after calling IndexWriter.deleteDocuments, even if you
close and reopen the index, the documents are still accessible using
document(int) but are returned from queries, which is exactly the
behaviour I want. However, if I call optimize() they will obviously
be obliterated.

My question is: as long as I never call optimize() -- will the deleted
documents hang around forever, or will a merge due to adding the new
documents eventually cause them to be removed?

If they will be removed then I need some other way to avoid them being
returned. I was thinking of actually *not* deleting them, but
maintaining a giant filter - I could store this filter on disk but
it's going to be pretty large even if I use a BitSet. :-( Is there
any other way to go about it?

Daniel





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 4 of 6 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedMar 16, '10 at 4:20a
activeMar 16, '10 at 11:07p
posts6
users4
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase