Hi,

Would this compressed and fast(?) bitset be interesting for Solr/Lucene,
or is OpenBitSet already implemented this way?
Quoting from GitHub:

The goal of word-aligned compression is not to
achieve the best compression, but rather to
improve query processing time.

License is GPL version 3 and ASL2.0.

http://code.google.com/p/javaewah
https://github.com/lemire/javaewah

I just saw it on twitter ...

Regards,
Peter.

--
http://jetwick.com twitter search prototype


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Uwe Schindler at Nov 5, 2010 at 3:47 pm
    Looks interesting. I was only annoyed when I saw "new Vector<Integer>()",
    which is synchronized, in the iterator code, and the iterator is the part
    that matters most for DocIdSets. Looks like the stone ages.

    Otherwise I would simply give it a try: rewrite the class to also implement
    DocIdSet and return an optimized iterator (not the one in this class). You
    can then replace some OpenBitSets in any filters and perf test.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Peter Karich
    Sent: Friday, November 05, 2010 3:38 PM
    To: dev@lucene.apache.org
    Subject: fast bitset

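To illustrate the concern about "new Vector<Integer>()" in iterator code, here is a minimal sketch of a bit iterator that uses only primitives: no boxing and no synchronized collection. It is not JavaEWAH or Lucene code. The class name PrimitiveBitIterator and the helper collect are made up for this example, and the nextDoc/NO_MORE_DOCS naming only mirrors the convention of Lucene's DocIdSetIterator. It walks a plain, uncompressed long[] for simplicity.

```java
import java.util.Arrays;

// Hypothetical sketch: iterates the set bits of an uncompressed long[] bitset
// using primitives only (no Vector<Integer>, no boxing, no locking).
final class PrimitiveBitIterator {
    // Sentinel for "exhausted"; same value Lucene's DocIdSetIterator uses.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final long[] words;
    private int wordIndex = 0;
    private long current; // bits of words[wordIndex] that were not returned yet

    PrimitiveBitIterator(long[] words) {
        this.words = words;
        this.current = words.length > 0 ? words[0] : 0L;
    }

    // Returns the index of the next set bit, or NO_MORE_DOCS when none is left.
    int nextDoc() {
        while (current == 0L) {
            if (wordIndex + 1 >= words.length) return NO_MORE_DOCS;
            current = words[++wordIndex];
        }
        int bit = Long.numberOfTrailingZeros(current);
        current &= current - 1; // clear the lowest set bit
        return (wordIndex << 6) + bit;
    }

    // Convenience helper for the example: drains the iterator into an int[].
    static int[] collect(long[] words) {
        int[] out = new int[words.length * 64];
        int n = 0;
        PrimitiveBitIterator it = new PrimitiveBitIterator(words);
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            out[n++] = doc;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // Bits 0 and 2 in word 0, nothing in word 1, bit 63 in word 2.
        System.out.println(Arrays.toString(collect(new long[] {0b101L, 0L, 1L << 63})));
        // prints [0, 2, 191]
    }
}
```

A DocIdSet wrapper around a compressed bitmap would need the same shape of iterator, only driven by the compressed words instead of a flat array.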
  • Earwin Burrfoot at Nov 5, 2010 at 6:33 pm
    An important point about WAH and friends is that they can be
    and/or/not/xor'ed quickly without full decompression. But they're not
    random-access capable.

    --
    Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
    Phone: +7 (495) 683-567-4
    ICQ: 104465785

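The two properties mentioned here (logical operations without full decompression, and no random access) can be sketched with a deliberately simplified run-length bitmap. This is not the WAH or EWAH on-disk format and not JavaEWAH's API; RunLengthBitmap, Run, append, and, and get are names invented for this example. Each run stands for a number of identical 64-bit words, so a long stretch of all-zero or all-one words is handled in a single step of the AND loop, while reading one bit requires scanning the runs from the start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified run-length bitmap (NOT the real WAH/EWAH layout).
final class RunLengthBitmap {
    // `count` consecutive 64-bit words that all equal `word`.
    static final class Run {
        final long word;
        final int count;
        Run(long word, int count) { this.word = word; this.count = count; }
    }

    final List<Run> runs = new ArrayList<>();

    // Appends `count` copies of `word`, merging with the previous run if the word repeats.
    RunLengthBitmap append(long word, int count) {
        if (count <= 0) return this;
        int last = runs.size() - 1;
        if (last >= 0 && runs.get(last).word == word) {
            runs.set(last, new Run(word, runs.get(last).count + count));
        } else {
            runs.add(new Run(word, count));
        }
        return this;
    }

    // ANDs run against run; the result covers the shorter of the two inputs.
    // A run of a million zero words costs one loop step, not a million.
    static RunLengthBitmap and(RunLengthBitmap a, RunLengthBitmap b) {
        RunLengthBitmap out = new RunLengthBitmap();
        int i = 0, j = 0;
        int remA = a.runs.isEmpty() ? 0 : a.runs.get(0).count;
        int remB = b.runs.isEmpty() ? 0 : b.runs.get(0).count;
        while (i < a.runs.size() && j < b.runs.size()) {
            int n = Math.min(remA, remB); // words both current runs have in common
            out.append(a.runs.get(i).word & b.runs.get(j).word, n);
            remA -= n;
            remB -= n;
            if (remA == 0 && ++i < a.runs.size()) remA = a.runs.get(i).count;
            if (remB == 0 && ++j < b.runs.size()) remB = b.runs.get(j).count;
        }
        return out;
    }

    // Reading a single bit scans runs from the start: O(number of runs), not O(1).
    boolean get(long bitIndex) {
        long wordIndex = bitIndex >>> 6;
        for (Run r : runs) {
            if (wordIndex < r.count) {
                return ((r.word >>> (int) (bitIndex & 63)) & 1L) != 0;
            }
            wordIndex -= r.count;
        }
        return false; // past the end of the bitmap
    }

    public static void main(String[] args) {
        RunLengthBitmap a = new RunLengthBitmap().append(-1L, 1000).append(0b1010L, 1);
        RunLengthBitmap b = new RunLengthBitmap().append(0L, 999).append(0b110L, 1).append(-1L, 1);
        RunLengthBitmap c = and(a, b);
        System.out.println(c.runs.size() + " runs, bit 63937 set: " + c.get(999L * 64 + 1));
    }
}
```

The get method is the reason such a structure fits iterator-style consumers better than callers that probe arbitrary positions.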
  • Peter Karich at Nov 5, 2010 at 9:29 pm
    > And they're not random-access capable.

    Which means it isn't applicable?

  • Earwin Burrfoot at Nov 5, 2010 at 9:52 pm
    It's okay, trunk has iteration-based filters.

    Filters with low selectivity might be faster if used in the old-style
    random-access way, though. If one wants to exploit this, compressed
    bitmaps are a no-go.
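The difference between the two filter styles discussed above can be sketched as follows. This is an illustration under assumptions, not Lucene's filter code: FilterStyles, byIteration, and byRandomAccess are made-up names, sorted int[] arrays stand in for doc-id iterators, and java.util.BitSet stands in for OpenBitSet. The iteration style only needs "next id" from the filter, so a compressed bitmap can serve it. The random-access style needs a cheap get(doc), which is plausibly attractive when the filter matches most documents and the query matches few.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of two ways to apply a filter to query hits.
final class FilterStyles {
    // Iteration-based: step through two sorted doc-id lists in lockstep.
    // Cost grows with the size of BOTH lists, but only sequential access is needed.
    static int[] byIteration(int[] queryDocs, int[] filterDocs) {
        int[] out = new int[Math.min(queryDocs.length, filterDocs.length)];
        int n = 0, i = 0, j = 0;
        while (i < queryDocs.length && j < filterDocs.length) {
            if (queryDocs[i] < filterDocs[j]) {
                i++;
            } else if (queryDocs[i] > filterDocs[j]) {
                j++;
            } else {
                out[n++] = queryDocs[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Random-access: probe the filter once per query hit.
    // Cost grows with the number of query hits only, but get() must be O(1).
    static int[] byRandomAccess(int[] queryDocs, BitSet filterBits) {
        int[] out = new int[queryDocs.length];
        int n = 0;
        for (int doc : queryDocs) {
            if (filterBits.get(doc)) out[n++] = doc;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] queryDocs = {3, 10, 42};
        BitSet filter = new BitSet();
        filter.set(0, 50);  // filter matches docs 0..49 ...
        filter.clear(10);   // ... except doc 10
        int[] filterDocs = filter.stream().toArray();
        System.out.println(Arrays.toString(byIteration(queryDocs, filterDocs)));
        System.out.println(Arrays.toString(byRandomAccess(queryDocs, filter)));
    }
}
```

Both calls return the same hits; with this dense filter the iteration variant walks through most of the 49 filter entries, while the random-access variant does three lookups.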

Discussion Overview
group: dev @
categories: lucene
posted: Nov 5, '10 at 2:37p
active: Nov 5, '10 at 9:52p
posts: 5
users: 3
website: lucene.apache.org
