FAQ
Hi,

is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
better if memory usage is the primary concern ?

Our filters are sparse. So is SortedVIntList better in that case?

Are there any other compressed bitmap index implementations which offer bit
map compression at a decent performance assuming filters are sparse?

I'd appreciate any help on this.Thanks.

- Raavan

Search Discussions

  • Federico Fissore at Jan 7, 2011 at 11:12 pm

    First Last, il 07/01/2011 20:55, ha scritto:
    Hi,

    is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
    better if memory usage is the primary concern ?
    SortedVIntList is compressed, OpenBitSet is not

    Our filters are sparse. So is SortedVIntList better in that case?
    Yes

    Are there any other compressed bitmap index implementations which offer bit
    map compression at a decent performance assuming filters are sparse?
    I'm too looking for alternative implementations of compressed bitsets,
    so I'm too really interested in everybody experience: my primary concern
    at the moment is serializing bitsets to recover searcher warmup time

    I've tried some and roughly tested them: my conclusion was that we
    (lucene users) already stand on the rolls royce of bitset implementations.

    federico

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Ryan Aylward at Jan 8, 2011 at 12:05 am
    I don't recall how we decided to use it, but we are using http://code.google.com/p/compressedbitset/ and it seems to be pretty efficient in terms of memory.

    -----Original Message-----
    From: Federico Fissore
    Sent: Friday, January 07, 2011 3:12 PM
    To: java-user@lucene.apache.org
    Subject: Re: is OpenBitSet / SortedVIntList compressed bit map index?

    First Last, il 07/01/2011 20:55, ha scritto:
    Hi,

    is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
    better if memory usage is the primary concern ?
    SortedVIntList is compressed, OpenBitSet is not

    Our filters are sparse. So is SortedVIntList better in that case?
    Yes

    Are there any other compressed bitmap index implementations which offer bit
    map compression at a decent performance assuming filters are sparse?
    I'm too looking for alternative implementations of compressed bitsets,
    so I'm too really interested in everybody experience: my primary concern
    at the moment is serializing bitsets to recover searcher warmup time

    I've tried some and roughly tested them: my conclusion was that we
    (lucene users) already stand on the rolls royce of bitset implementations.

    federico

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Raavan at Jan 8, 2011 at 6:18 pm
    Thanks Ryan. I will test this to see if it uses much less memory than
    SortedVIntList.

    -Raavan
    On Fri, Jan 7, 2011 at 4:05 PM, Ryan Aylward wrote:

    I don't recall how we decided to use it, but we are using
    http://code.google.com/p/compressedbitset/ and it seems to be pretty
    efficient in terms of memory.

    -----Original Message-----
    From: Federico Fissore
    Sent: Friday, January 07, 2011 3:12 PM
    To: java-user@lucene.apache.org
    Subject: Re: is OpenBitSet / SortedVIntList compressed bit map index?

    First Last, il 07/01/2011 20:55, ha scritto:
    Hi,

    is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
    better if memory usage is the primary concern ?
    SortedVIntList is compressed, OpenBitSet is not

    Our filters are sparse. So is SortedVIntList better in that case?
    Yes

    Are there any other compressed bitmap index implementations which offer bit
    map compression at a decent performance assuming filters are sparse?
    I'm too looking for alternative implementations of compressed bitsets,
    so I'm too really interested in everybody experience: my primary concern
    at the moment is serializing bitsets to recover searcher warmup time

    I've tried some and roughly tested them: my conclusion was that we
    (lucene users) already stand on the rolls royce of bitset implementations.

    federico

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Federico Fissore at Feb 11, 2011 at 2:56 pm

    Raavan, il 08/01/2011 19:17, ha scritto:
    Thanks Ryan. I will test this to see if it uses much less memory than
    SortedVIntList.

    -Raavan

    Hello Raavan,

    have you tested those implementations? How do they look?

    federico

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Raavan at Jan 8, 2011 at 6:14 pm
    Thanks Federico.
    my primary concern at the moment is serializing bitsets to recover
    searcher warmup time

    I am also considering doing the same to reduce warmup time during restarts.

    It seems one of the disadvantages of SortedVIntList is the performance
    skipTo() as per Paul Elschot since it does not support random access like
    OpenBitSet.
    https://issues.apache.org/jira/browse/LUCENE-1296

    Our primary concern is memory usage since we have hundreds of filters and
    large number of documents. So if the performance is decent, I am thinking of
    using SortedVIntList for all our sparse filters.

    -Raavan
    On Fri, Jan 7, 2011 at 3:11 PM, Federico Fissore wrote:

    First Last, il 07/01/2011 20:55, ha scritto:

    Hi,
    is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
    better if memory usage is the primary concern ?
    SortedVIntList is compressed, OpenBitSet is not



    Our filters are sparse. So is SortedVIntList better in that case?
    Yes



    Are there any other compressed bitmap index implementations which offer
    bit
    map compression at a decent performance assuming filters are sparse?
    I'm too looking for alternative implementations of compressed bitsets, so
    I'm too really interested in everybody experience: my primary concern at the
    moment is serializing bitsets to recover searcher warmup time

    I've tried some and roughly tested them: my conclusion was that we (lucene
    users) already stand on the rolls royce of bitset implementations.

    federico

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Raavan at Jan 8, 2011 at 6:25 pm
    Also, just for my understanding, is SortedVIntList able to perform some
    operations such as AND/OR without decompression ?

    Some of the algorithms mentioned below claim to do that. But I understand
    that there are patent issues surrounding these algorithms.
    http://en.wikipedia.org/wiki/Bitmap_index

    -Raavan
    On Sat, Jan 8, 2011 at 10:14 AM, Raavan wrote:

    Thanks Federico.
    my primary concern at the moment is serializing bitsets to recover
    searcher warmup time

    I am also considering doing the same to reduce warmup time during restarts.

    It seems one of the disadvantages of SortedVIntList is the performance
    skipTo() as per Paul Elschot since it does not support random access like
    OpenBitSet.
    https://issues.apache.org/jira/browse/LUCENE-1296

    Our primary concern is memory usage since we have hundreds of filters and
    large number of documents. So if the performance is decent, I am thinking of
    using SortedVIntList for all our sparse filters.

    -Raavan

    On Fri, Jan 7, 2011 at 3:11 PM, Federico Fissore wrote:

    First Last, il 07/01/2011 20:55, ha scritto:

    Hi,
    is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
    better if memory usage is the primary concern ?
    SortedVIntList is compressed, OpenBitSet is not



    Our filters are sparse. So is SortedVIntList better in that case?
    Yes



    Are there any other compressed bitmap index implementations which offer
    bit
    map compression at a decent performance assuming filters are sparse?
    I'm too looking for alternative implementations of compressed bitsets, so
    I'm too really interested in everybody experience: my primary concern at the
    moment is serializing bitsets to recover searcher warmup time

    I've tried some and roughly tested them: my conclusion was that we (lucene
    users) already stand on the rolls royce of bitset implementations.

    federico

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Raf at Jan 9, 2011 at 9:21 am

    On Sat, Jan 8, 2011 at 7:24 PM, Raavan wrote:

    Also, just for my understanding, is SortedVIntList able to perform some
    operations such as AND/OR without decompression ?
    No, not natively:
    http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/util/SortedVIntList.html

    But the *iterator* returns the *docIds* by order and there is a *constructor
    *that accepts a *DocIdSetIterator*.

    So it's quite easy to implement the AND/OR operations by creating a *
    DocIdSetIterator *that receives the two iterators from the original *
    SortedVIntLists* and scans them in parallel, implementing *nextDoc* and *
    advance* methods accordingly to AND/OR semantic.

    We use something like that and, for very sparse bitsets, it is more
    efficient than to convert them in *OpenBitSets* in order to perform AND/OR
    operations.

    Bye
    *Raf*
  • Raavan at Jan 10, 2011 at 10:00 am
    Thanks Raf.
    On Sun, Jan 9, 2011 at 1:20 AM, Raf wrote:
    On Sat, Jan 8, 2011 at 7:24 PM, Raavan wrote:

    Also, just for my understanding, is SortedVIntList able to perform some
    operations such as AND/OR without decompression ?
    No, not natively:

    http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/util/SortedVIntList.html

    But the *iterator* returns the *docIds* by order and there is a
    *constructor
    *that accepts a *DocIdSetIterator*.

    So it's quite easy to implement the AND/OR operations by creating a *
    DocIdSetIterator *that receives the two iterators from the original *
    SortedVIntLists* and scans them in parallel, implementing *nextDoc* and *
    advance* methods accordingly to AND/OR semantic.

    We use something like that and, for very sparse bitsets, it is more
    efficient than to convert them in *OpenBitSets* in order to perform AND/OR
    operations.

    Bye
    *Raf*
  • Ai114 at May 25, 2011 at 2:28 pm

    First Last wrote:

    Are there any other compressed bitmap index implementations which offer
    bit
    map compression at a decent performance assuming filters are sparse?
    Have a look at EWAH by Daniel Lemire
    google: http://code.google.com/p/javaewah/
    http://code.google.com/p/javaewah/
    research paper: http://arxiv.org/abs/0901.3751
    http://arxiv.org/abs/0901.3751
    code: https://github.com/lemire/javaewah/tree/
    https://github.com/lemire/javaewah/tree/

    Gabriel

    --
    View this message in context: http://lucene.472066.n3.nabble.com/is-OpenBitSet-SortedVIntList-compressed-bit-map-index-tp2213863p2983908.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJan 7, '11 at 7:55p
activeMay 25, '11 at 2:28p
posts10
users5
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase