The way we've simplified this is that every document has an OID. That
simplifies updates and delete tracking (in the transaction log).
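Here is a minimal sketch of why a per-document OID helps. It assumes a simple append-only transaction log, and the class `OidTransactionLog` and its methods are hypothetical, not part of Lucene. An update is just a newer ADD record for the same OID, and a delete is a record naming the OID. Replaying the log in order leaves exactly one live version per OID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Lucene code): every document carries a stable OID,
// and all changes are appended to a transaction log keyed by that OID.
class OidTransactionLog {
    enum Op { ADD, DELETE }

    static final class Entry {
        final Op op;
        final long oid;
        final String body; // null for DELETE records

        Entry(Op op, long oid, String body) {
            this.op = op;
            this.oid = oid;
            this.body = body;
        }
    }

    private final List<Entry> log = new ArrayList<>();

    void add(long oid, String body) { log.add(new Entry(Op.ADD, oid, body)); }

    // No special record type: a later ADD for the same OID supersedes the earlier one.
    void update(long oid, String body) { add(oid, body); }

    // Deletes are tracked by OID, not by a segment-local document number.
    void delete(long oid) { log.add(new Entry(Op.DELETE, oid, null)); }

    // Replay in order; the last record for each OID decides whether it is live.
    Map<Long, String> replay() {
        Map<Long, String> live = new LinkedHashMap<>();
        for (Entry e : log) {
            if (e.op == Op.ADD) {
                live.put(e.oid, e.body);
            } else {
                live.remove(e.oid);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        OidTransactionLog log = new OidTransactionLog();
        log.add(1L, "first draft");
        log.add(2L, "other doc");
        log.update(1L, "second draft");
        log.delete(2L);
        System.out.println(log.replay()); // {1=second draft}
    }
}
```

Because the key is stable across segments and merges, neither updates nor deletes need to know where the previous version physically lives.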
On Jan 8, 2009, at 2:28 PM, Marvin Humphrey (JIRA) wrote:

[ https://issues.apache.org/jira/browse/LUCENE-1476?
tabpanel&focusedCommentId=12662107#action_12662107 ]

Marvin Humphrey commented on LUCENE-1476:

Mike McCandless:
Commit is for crash recovery, and for knowing when it's OK to delete
prior commits. Simply writing the files (and not syncing them), and
perhaps giving IndexReader.open the SegmentInfos to use directly (and
not writing a segments_N via the filesystem) would allow us to search
added docs without paying the cost of sync'ing all the files.
Mmm. I think I might have given IndexWriter.commit() slightly different
semantics. Specifically, I might have given it a boolean "sync" argument
which defaults to false.
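The idea of a `commit()` that takes a boolean "sync" flag defaulting to false can be sketched with plain NIO. Everything here is hypothetical. `SketchWriter` is not Lucene's `IndexWriter`, and the file naming only loosely imitates `segments_N`. The only real API relied on is `FileChannel.force(boolean)`, which forces the channel's updates (and, with `true`, its metadata) out to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical writer showing commit(boolean sync); NOT Lucene's IndexWriter API.
class SketchWriter {
    private final Path dir;
    private int generation = 0;
    int syncCount = 0; // number of commits that were fsync'd, for illustration

    SketchWriter(Path dir) {
        this.dir = dir;
    }

    // Default is no sync: the bytes reach the OS page cache and the OS
    // decides when to perform the disk I/O. Cheap, but not crash-safe.
    Path commit() throws IOException {
        return commit(false);
    }

    Path commit(boolean sync) throws IOException {
        generation++;
        // Name modeled loosely on Lucene's segments_N commit files.
        Path file = dir.resolve("segments_" + generation);
        byte[] payload = ("generation=" + generation).getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                // Block until file content and metadata are on stable storage.
                ch.force(true);
                syncCount++;
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        SketchWriter w = new SketchWriter(Files.createTempDirectory("commit-sketch"));
        System.out.println("fast commit:    " + w.commit());
        System.out.println("durable commit: " + w.commit(true));
    }
}
```

Skipping `force` keeps the default path cheap. The price is that a crash can lose the newest commit. On POSIX systems, making the creation of a new file durable usually also requires syncing its directory, which this sketch omits.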
Also: brand new, tiny segments should be written into a RAMDirectory
and then merged over time into the real Directory.
Two comments. First, if you don't sync, but rather leave it up to the
OS when it wants to perform the actual disk I/O, how expensive is
flushing? Can we make it cheap enough to meet Jason's absolute change
rate

Second, the multi-index model is very tricky when dealing with
"updates". How do you guarantee that you always see the "current"
version of a given document, and only that version? When do you
expose new deletes in the RAMDirectory, when do you expose new deletes
in the FSDirectory, how do you manage slow merges from the
RAMDirectory to the FSDirectory, how do you manage new adds to the
RAMDirectory that take place during slow merges...
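To make these consistency questions concrete, here is a deliberately simplified sketch. It assumes documents are keyed by OID, as in the approach described at the top of this message, and both tiers are plain in-memory maps standing in for a RAMDirectory and an FSDirectory. `TwoTierIndex` and its methods are hypothetical. It handles "only the current version" by masking the older disk copy in the same critical section that adds the new RAM copy. It sidesteps the slow-merge questions by holding a single lock for the whole merge, which is exactly what a real background merge cannot afford to do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical two-tier index keyed by OID; both tiers are modeled as maps.
class TwoTierIndex {
    private final Map<Long, String> ramTier = new HashMap<>();
    private final Map<Long, String> diskTier = new HashMap<>();
    private final Set<Long> diskDeletes = new HashSet<>(); // disk copies masked out

    // Adding the new version and masking the old one happen under one lock,
    // so a searcher never sees two versions of a document, or none.
    synchronized void update(long oid, String body) {
        if (diskTier.containsKey(oid)) {
            diskDeletes.add(oid);
        }
        ramTier.put(oid, body);
    }

    synchronized void delete(long oid) {
        ramTier.remove(oid);
        if (diskTier.containsKey(oid)) {
            diskDeletes.add(oid);
        }
    }

    // One consistent view: RAM copies plus unmasked disk copies.
    synchronized List<String> search() {
        List<String> hits = new ArrayList<>(ramTier.values());
        for (Map.Entry<Long, String> e : diskTier.entrySet()) {
            if (!diskDeletes.contains(e.getKey())) {
                hits.add(e.getValue());
            }
        }
        return hits;
    }

    // Simplification: the whole merge runs under the lock, so no adds or
    // deletes can arrive mid-merge. A background merge would have to handle them.
    synchronized void mergeRamIntoDisk() {
        for (Long oid : diskDeletes) {
            diskTier.remove(oid);
        }
        diskDeletes.clear();
        diskTier.putAll(ramTier);
        ramTier.clear();
    }

    public static void main(String[] args) {
        TwoTierIndex idx = new TwoTierIndex();
        idx.update(1L, "doc1 v1");
        idx.mergeRamIntoDisk();
        idx.update(1L, "doc1 v2"); // new version in RAM, old disk copy masked
        System.out.println(idx.search()); // [doc1 v2]
    }
}
```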

Building a single-index, two-writer model that could handle fast
updates while performing background merging was one of the main
drivers behind the tombstone
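The thread does not spell out the tombstone design, so the following is only a generic sketch of tombstone-based deletion, not Lucy's or Lucene's implementation. `TombstoneIndex` and its members are hypothetical. Segments are never modified after they are written. A delete appends a tombstone naming (segment, document), and a reader folds the tombstones it can see into a `java.util.BitSet`. Because deleting never rewrites segment files, this is one way a writer that only appends tombstones could coexist with a second writer merging segments in the background.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Generic tombstone-deletion sketch (hypothetical; not Lucy's or Lucene's code).
class TombstoneIndex {
    // A tombstone names one deleted document inside one immutable segment.
    static final class Tombstone {
        final int segment;
        final int doc;

        Tombstone(int segment, int doc) {
            this.segment = segment;
            this.doc = doc;
        }
    }

    private final List<Integer> segmentSizes = new ArrayList<>(); // docs per segment
    private final List<Tombstone> tombstones = new ArrayList<>();  // append-only

    // Register a newly written segment; its contents never change afterwards.
    int addSegment(int numDocs) {
        segmentSizes.add(numDocs);
        return segmentSizes.size() - 1;
    }

    // Deleting appends a tombstone instead of rewriting the segment.
    void delete(int segment, int doc) {
        tombstones.add(new Tombstone(segment, doc));
    }

    // Fold the tombstones present at call time into a per-segment deletion bitset.
    BitSet deletedDocs(int segment) {
        BitSet deleted = new BitSet(segmentSizes.get(segment));
        for (Tombstone t : tombstones) {
            if (t.segment == segment) {
                deleted.set(t.doc);
            }
        }
        return deleted;
    }

    int liveDocs(int segment) {
        return segmentSizes.get(segment) - deletedDocs(segment).cardinality();
    }

    public static void main(String[] args) {
        TombstoneIndex idx = new TombstoneIndex();
        int seg = idx.addSegment(10);
        idx.delete(seg, 3);
        idx.delete(seg, 7);
        System.out.println(idx.liveDocs(seg) + " live of 10"); // 8 live of 10
    }
}
```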

BitVector implement DocIdSet

Key: LUCENE-1476
URL: https://issues.apache.org/jira/browse/
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.4
Reporter: Jason Rutherglen
Priority: Trivial
Attachments: LUCENE-1476.patch

Original Estimate: 12h
Remaining Estimate: 12h

BitVector can implement DocIdSet. This is for making
SegmentReader.deletedDocs pluggable.
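The issue itself is small: let a `BitVector` of deleted documents be iterated as a `DocIdSet`. To keep the sketch self-contained, `SketchDocIdSet` and `SketchDocIdSetIterator` are stand-ins modeled loosely on the Lucene 2.4-era iterator shape (`doc()`, `next()`, `skipTo(int)`). `BitVector` here is a simplified stand-in for `org.apache.lucene.util.BitVector`, not the attached patch.

```java
// Stand-ins modeled loosely on Lucene 2.4's DocIdSet / DocIdSetIterator;
// defined here so the sketch compiles without Lucene. Not the real classes.
abstract class SketchDocIdSetIterator {
    abstract int doc();
    abstract boolean next();
    abstract boolean skipTo(int target);
}

abstract class SketchDocIdSet {
    abstract SketchDocIdSetIterator iterator();
}

// Simplified stand-in for a deleted-docs BitVector that is also a DocIdSet.
class BitVector extends SketchDocIdSet {
    private final byte[] bits;
    private final int size;

    BitVector(int n) {
        size = n;
        bits = new byte[(n >> 3) + 1];
    }

    void set(int bit) { bits[bit >> 3] |= (byte) (1 << (bit & 7)); }

    boolean get(int bit) { return (bits[bit >> 3] & (1 << (bit & 7))) != 0; }

    int size() { return size; }

    @Override
    SketchDocIdSetIterator iterator() {
        return new SketchDocIdSetIterator() {
            private int doc = -1; // -1 means "not yet positioned"

            @Override
            int doc() { return doc; }

            // Advance to the next set bit after the current one, if any.
            @Override
            boolean next() { return skipTo(doc + 1); }

            // Position on the first set bit >= target (linear scan for clarity).
            @Override
            boolean skipTo(int target) {
                for (int i = Math.max(target, 0); i < size; i++) {
                    if (get(i)) {
                        doc = i;
                        return true;
                    }
                }
                doc = size; // exhausted
                return false;
            }
        };
    }

    public static void main(String[] args) {
        BitVector deleted = new BitVector(16);
        deleted.set(2);
        deleted.set(9);
        SketchDocIdSetIterator it = deleted.iterator();
        while (it.next()) {
            System.out.println("deleted doc " + it.doc());
        }
    }
}
```

The linear scan in `skipTo` keeps the sketch short; a production version would skip over zero bytes (or whole words) at a time.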

To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Discussion Overview
group: dev @
posted: Dec 3, '08 at 10:30p
active: Jan 30, '09 at 10:47p