[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661829#action_12661829 ]

Marvin Humphrey commented on LUCENE-1476:

Jason Rutherglen:

> For realtime search where a new transaction may only have a handful of
> deletes the tombstones may not be optimal

The whole tombstone idea arose out of the need for (close to) realtime search! It's intended to improve write speed.

When you make deletes with the BitSet model, you have to rewrite files that scale with segment size, regardless of how few deletions you make. Deletion of a single document in a large segment may necessitate writing out a substantial bit vector file.

In contrast, the I/O needed to write out a tombstone file scales with the number of tombstones, not with the size of the segment.
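
To make the scaling difference concrete, here is a rough back-of-the-envelope sketch. The class name, the one-bit-per-document layout, and the 4-byte-per-tombstone encoding are all assumptions chosen for illustration; they are not Lucene's actual file formats.

```java
// Illustrative sketch only: the sizes below are assumptions, not real Lucene file formats.
final class DeletionWriteCost {

    // Bit-vector model: one bit per document in the segment,
    // and the whole vector is rewritten no matter how few docs were deleted.
    static long bitVectorBytes(long segmentDocCount) {
        return (segmentDocCount + 7) / 8;
    }

    // Tombstone model: assume one fixed-width 4-byte doc ID per deletion.
    static long tombstoneBytes(long deletionCount) {
        return deletionCount * 4L;
    }

    public static void main(String[] args) {
        long segmentDocs = 10_000_000L; // hypothetical large segment
        for (long deletions : new long[] {1, 100, 10_000}) {
            System.out.printf("deletions=%d bitVector=%d bytes tombstones=%d bytes%n",
                    deletions, bitVectorBytes(segmentDocs), tombstoneBytes(deletions));
        }
    }
}
```

Under these assumptions the bit vector cost stays fixed by segment size, while the tombstone cost only grows with the number of deletions in the commit.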

> because too many tombstones would accumulate (I believe).

Say that you make a string of commits that are nothing but deleting a single document -- thus adding a new segment each time that contains nothing but a single tombstone. Those are going to be cheap to merge, so it seems unlikely that we'll end up with an unwieldy number of tombstone streams to interleave at search-time.
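
For a sense of what interleaving tombstone streams at search-time could look like, here is a minimal sketch. It assumes each stream is simply a sorted array of deleted doc IDs; the class and method names are hypothetical and are not part of any existing Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: a k-way merge over sorted tombstone streams.
final class TombstoneInterleaver {

    // Merges sorted arrays of deleted doc IDs into one sorted, duplicate-free list.
    static List<Integer> interleave(List<int[]> streams) {
        // Each heap entry is {docId, streamIndex, positionInStream}, ordered by docId.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < streams.size(); i++) {
            if (streams.get(i).length > 0) {
                heap.add(new int[] {streams.get(i)[0], i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            // Skip a doc ID that another stream has already contributed.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != top[0]) {
                merged.add(top[0]);
            }
            // Advance the stream the entry came from.
            int[] stream = streams.get(top[1]);
            int next = top[2] + 1;
            if (next < stream.length) {
                heap.add(new int[] {stream[next], top[1], next});
            }
        }
        return merged;
    }
}
```

The heap holds one entry per stream, so each step costs on the order of log k for k streams, which is why the number of streams matters more than the size of any one of them.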

The more likely problem is the one McCandless articulated regarding a large segment accumulating a lot of tombstone streams against it. But I agree with him that it only gets truly serious if your merge policy neglects such segments and allows them to deteriorate for too long.

> For this scenario rolling bitsets may be better. Meaning pool bit sets and
> throw away unused readers.

I don't think I understand. Is this the "combination index reader/writer" model, where the writer prepares a data structure that then gets handed off to the reader?

BitVector implement DocIdSet

Key: LUCENE-1476
URL: https://issues.apache.org/jira/browse/LUCENE-1476
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.4
Reporter: Jason Rutherglen
Priority: Trivial
Attachments: LUCENE-1476.patch

Original Estimate: 12h
Remaining Estimate: 12h

BitVector can implement DocIdSet. This is for making SegmentReader.deletedDocs pluggable.

Discussion Overview
group: java-dev
posted: Jan 8, '09 at 12:01a
active: Jan 30, '09 at 10:47p


