Yes, and I don't think the "worst-case" is correct.

When you go to write that segment and determine that it is a "large"
segment with few deletions (one in this case), it will be written
compressed in probably fewer than 10 bytes (1 byte header, vlong
start, vint length - you only write the ones...)...

You would not write the file uncompressed in this case. The only
time you would write an uncompressed bitset is when you determine it
is large with many deletions, or it is very small (more efficient to
just write the bytes quickly). The very small case should probably
be less than the size of a "standard" disk block (which is probably 32k
these days, meaning 256k documents).
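
To make the size argument concrete, here is a minimal Java sketch of a writer that picks the representation at write time. It is not Lucene's code: the class name, the header values, and the delta-encoded list of deleted doc ids are assumptions made for illustration (the layout described above uses a start and a length rather than deltas), and it simply keeps whichever encoding is smaller instead of applying a disk-block threshold.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; names and layout are illustrative, not Lucene's API.
final class DeletionsFormatSketch {
    static final int DENSE = 0;   // header byte: uncompressed bit set
    static final int SPARSE = 1;  // header byte: compressed list of deleted doc ids

    // Size of an uncompressed bit vector: one bit per document.
    static int denseBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Variable-length int: 7 payload bits per byte, high bit set means "more follows".
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode sorted deleted doc ids, keeping whichever representation is smaller.
    static byte[] encode(int maxDoc, int[] sortedDeletedDocs) {
        ByteArrayOutputStream sparse = new ByteArrayOutputStream();
        sparse.write(SPARSE);
        writeVInt(sparse, sortedDeletedDocs.length);
        int previous = 0;
        for (int doc : sortedDeletedDocs) {
            writeVInt(sparse, doc - previous); // store the gap to the previous deletion
            previous = doc;
        }
        int denseSize = 1 + denseBytes(maxDoc);
        if (sparse.size() <= denseSize) {
            return sparse.toByteArray();
        }
        byte[] dense = new byte[denseSize];
        dense[0] = DENSE;
        for (int doc : sortedDeletedDocs) {
            dense[1 + (doc >> 3)] |= (byte) (1 << (doc & 7)); // set the bit for this doc
        }
        return dense;
    }
}
```

Under these assumptions, a single deletion in an 8-million-doc segment encodes to a handful of bytes, while the uncompressed bit vector for the same segment would be 1,000,000 bytes. A 32k block corresponds to 32,768 * 8 = 262,144 documents, which matches the 256k figure above.
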
On Jan 7, 2009, at 10:28 PM, Marvin Humphrey wrote:
On Wed, Jan 07, 2009 at 09:28:40PM -0600, robert engels wrote:
Why not just write the first byte as 0 for a bitset, and 1 for a
sparse bit set (compressed), and make the determination when writing
based on the segment size and/or number of set bits?
Are you offering that as a solution to the problem I described here?
When you make deletes with the BitSet model, you have to rewrite
files that scale with segment size, regardless of how few deletions
you make. Deletion of a single document in a large segment may
necessitate writing out a substantial bit vector file.

In contrast, i/o throughput for writing out a tombstone file scales
with the number of tombstones.

Worst-case i/o costs don't improve under such a regime. You could still end
up writing a large, uncompressed bit vector file to accommodate a single
deletion.

I suppose that has to be weighed against the search-time costs of interleaving
the tombstone streams. We can either pay the interleaving penalty at
index-time or search-time. It's annoying to write out a 1 MB uncompressed bit
vector file for a single deleted doc against an 8-million doc segment, but if
there are enough deletions to justify an uncompressed file, iterating through
them via merged-on-the-fly tombstone streams would be annoying too.
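
For reference, the search-time interleaving cost comes from something like the following: a k-way merge over several sorted tombstone streams. This is only a sketch under stated assumptions. The class and method names are hypothetical, and each stream is modeled as an in-memory sorted int array standing in for a tombstone file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of merging tombstone streams on the fly; not an actual Lucene or Lucy API.
final class TombstoneMergeSketch {
    // Cursor over one sorted stream of deleted doc ids.
    private static final class Cursor {
        final int[] docs;
        int pos;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int current() {
            return docs[pos];
        }
    }

    // Merge sorted streams into one ascending list of deleted doc ids, dropping duplicates.
    static List<Integer> merge(int[][] streams) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] stream : streams) {
            if (stream.length > 0) {
                queue.add(new Cursor(stream));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll();          // stream with the smallest current doc id
            int doc = cursor.current();
            if (merged.isEmpty() || merged.get(merged.size() - 1) != doc) {
                merged.add(doc);
            }
            if (++cursor.pos < cursor.docs.length) {
                queue.add(cursor);                 // re-queue the stream at its next doc id
            }
        }
        return merged;
    }
}
```

Every tombstone passes through the priority queue once, so the per-search work grows with both the number of deletions and the number of streams. That is the penalty being weighed against rewriting a bit vector at index time.
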

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Discussion Overview
group: java-dev @
categories: lucene
posted: Jan 8, '09 at 12:01a
active: Jan 30, '09 at 10:47p
posts: 91
users: 6
website: lucene.apache.org
