Hi Paul,
not really an answer to your questions, I just thought you may find it useful as a confirmation that this packing of integers into (B or some other) Tree is good one.

I have seen Integer set distributions that can profit hugely from the tree organization on top.

have look at: http://www.iis.uni-stuttgart.de/intset/
not meant for on disk storage, but the idea is quite similar.


From: Paul Elschot <paul.elschot@xs4all.nl>
To: java-dev@lucene.apache.org
Sent: Sunday, 18 January, 2009 23:51:36
Subject: Re: Filesystem based bitset

On Friday 09 January 2009 22:30:14 Marvin Humphrey wrote:
On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote:

SSD is pretty close to RAM when it comes to seeking. Wouldn't that
mean that a bitset stored on an SSD would be more or less as fast as a
bitset in RAM?
Provided that your index can fit in the system i/o cache and stay there, you
get the speed of RAM regardless of the underlying permanent storage type.
There's no reason to wait on SSDs before implementing such a feature.
Since this started by thinking out loud, I'd like to continue doing that.
I've been thinking about how to add a decent skipTo() to something that
compresses better than an (Open)BitSet, and this turns out to be an
integer set implemented as a B plus tree (all leafs on the same level) of
only integers with key/data compression by a frame of reference for
every node (see LUCENE-1410).
I found a java implementation for a B plus tree on sourceforge: BpLusDotNet
in the BplusJ package, see http://bplusdotnet.sourceforge.net/ .
This has nice transaction semantics on a file system and it has a BSD licence,
so it could be used as a starting point, but:
- it only has strings as index values, so it will need quite some simplification
to work on integers as keys and data, and
- it has no built in compression as far as I could see on first inspection.
The questions:
Would someone know of a better starting point for a B plus tree of integers
with node compression?
For example, how close is the current lucene code base to implementing
a b plus tree for the doc ids of a single term?
How valuable are transaction semantics for such an integer set? It is
tempting to try and implement such an integer set by starting from the
ground up, but I don't have any practical programming experience with
transaction semantics, so it may be better to start from something that
has transactions right from the start.
Paul Elschot

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 20 of 22 | next ›
Discussion Overview
groupdev @
postedJan 9, '09 at 7:12p
activeJan 19, '09 at 5:33p



site design / logo © 2021 Grokbase