On Wed, Jan 07, 2009 at 09:28:40PM -0600, robert engels wrote:
> Why not just write the first byte as 0 for a bit set, and 1 for a
> sparse bit set (compressed), and make the determination when writing
> based on the segment size and/or number of set bits?

Are you offering that as a solution to the problem I described here?

When you make deletions under the BitSet model, you have to rewrite files
whose size scales with the segment size, regardless of how few deletions
you make. Deleting a single document in a large segment may force writing
out a substantial bit vector file.

In contrast, the I/O cost of writing out a tombstone file scales with the
number of tombstones.

Worst-case I/O costs don't improve under such a regime. You could still
end up writing a large, uncompressed bit vector file to accommodate a
single deletion.

I suppose that has to be weighed against the search-time costs of interleaving
the tombstone streams. We can either pay the interleaving penalty at
index-time or search-time. It's annoying to write out a 1 MB uncompressed bit
vector file for a single deleted doc against an 8-million doc segment, but if
there are enough deletions to justify an uncompressed file, iterating through
them via merged-on-the-fly tombstone streams would be annoying too.
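
For a sense of what that merged-on-the-fly interleaving might cost, here
is a minimal sketch, assuming each tombstone file yields its deleted doc
ids as a sorted stream. The class and method names are hypothetical:

    import java.util.Iterator;
    import java.util.PriorityQueue;

    class MergedTombstones {
      // One heap entry per stream: the stream plus its current doc id.
      private static final class Entry implements Comparable<Entry> {
        final Iterator<Integer> stream;
        int doc;
        Entry(Iterator<Integer> s) { stream = s; doc = s.next(); }
        public int compareTo(Entry o) { return Integer.compare(doc, o.doc); }
      }

      private final PriorityQueue<Entry> heap = new PriorityQueue<>();
      private int current = -1;

      MergedTombstones(Iterable<Iterator<Integer>> streams) {
        for (Iterator<Integer> s : streams) {
          if (s.hasNext()) heap.offer(new Entry(s));
        }
      }

      // Advance to the smallest deleted doc id >= target, or
      // Integer.MAX_VALUE once all streams are exhausted.
      int advance(int target) {
        while (current < target) {
          Entry top = heap.poll();
          if (top == null) return current = Integer.MAX_VALUE;
          current = top.doc;
          if (top.stream.hasNext()) {
            top.doc = top.stream.next();
            heap.offer(top);
          }
        }
        return current;
      }

      boolean isDeleted(int doc) { return advance(doc) == doc; }
    }

Each step pays a heap operation, roughly O(log k) for k tombstone streams,
on top of decoding each stream -- per deleted doc, per segment, per query.
That's the interleaving penalty that an uncompressed bit vector's O(1)
per-doc check avoids.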

Marvin Humphrey
