Hi:

I did some performance analysis for different ways of doing numeric
ranging with lucene. Thought I'd share:

http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html

-John


  • Yonik Seeley at Nov 16, 2009 at 6:55 am

    On Mon, Nov 16, 2009 at 1:02 AM, John Wang wrote:
    I did some performance analysis for different ways of doing numeric
    ranging with lucene. Thought I'd share:
    FYI, the second approach is already implemented in both Lucene and Solr.
    http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldCacheRangeFilter.html

    -Yonik
    http://www.lucidimagination.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Uwe Schindler at Nov 16, 2009 at 7:02 am
    I wanted to say the same as Yonik. One addition: the FieldCache only
    supports one value per document, and the second approach is slower when
    deleted docs are involved and 0 is inside the range (it then needs to
    consult TermDocs).

    By the way, the numbers are similar to mine from the FCRF issue, which also
    has the explanation for the 0-inside-range case:
    https://issues.apache.org/jira/browse/LUCENE-1461
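
A minimal sketch of the 0-inside-range point, written as a plain-Java model and not taken from Lucene itself. It assumes that the cached array holds exactly one int per document and that deleted documents keep the array default of 0. The class and method names are hypothetical, and a BitSet stands in for the TermDocs lookup.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Simplified model of a FieldCache-backed int range filter; not Lucene code. */
final class FieldCacheRangeSketch {

    /**
     * values[doc] is the single cached value of each document. Assumption of
     * this model: entries of deleted documents hold the array default 0.
     */
    static List<Integer> matchingDocs(int[] values, BitSet deleted, int lower, int upper) {
        // A default entry can only look like a hit when 0 lies inside the range,
        // so only then is the extra per-document deletion check required.
        boolean zeroInRange = lower <= 0 && 0 <= upper;
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < values.length; doc++) {
            int v = values[doc];
            if (v < lower || v > upper) {
                continue; // fast path: a single array lookup and two comparisons
            }
            if (zeroInRange && deleted.get(doc)) {
                continue; // slow path: the deletion state has to be consulted
            }
            hits.add(doc);
        }
        return hits;
    }
}
```

With a range such as [1, 20] the deletion check is never reached; with [-5, 5] it runs for every candidate document, which is where the slowdown in this model comes from.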

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
    -----Original Message-----
    From: yseeley@gmail.com On Behalf Of Yonik Seeley
    Sent: Monday, November 16, 2009 7:55 AM
    To: java-user@lucene.apache.org
    Subject: Re: share some numbers for range queries
  • Jake Mannix at Nov 16, 2009 at 7:49 am

    On Sun, Nov 15, 2009 at 11:02 PM, Uwe Schindler wrote:


    the second approach is slower, when deleted docs
    are involved and 0 is inside the range (need to consult TermDocs).
    This is a good point (and should be mentioned in your blog, John). Custom
    FieldCache-like implementations (e.g. bobo-browse, which additionally isn't
    restricted to single-valued fields) need not have this deficiency, because
    they can choose to map empty values to MAX_INT or something like that. The
    FCRF in its raw form, however, can really bite you, performance-wise, if
    you didn't notice that sometimes your queries ran across zero and there
    were lots of deletes.

    -jake
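
The sentinel idea can be sketched as follows. This is a plain-Java illustration under stated assumptions, not bobo-browse or Lucene code: the class name, the `MISSING` constant and the use of `null` to mark a document without a value are all hypothetical.

```java
/** Simplified model of a custom cache that maps empty values to a sentinel. */
final class SentinelCacheSketch {

    /** Hypothetical sentinel for "no value"; real data must never use it. */
    static final int MISSING = Integer.MAX_VALUE;

    /** Builds the per-document array; null marks a deleted or empty document. */
    static int[] buildCache(Integer[] docValues) {
        int[] cache = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            cache[doc] = docValues[doc] == null ? MISSING : docValues[doc];
        }
        return cache;
    }

    /** Counts documents in [lower, upper] without any deletion lookup. */
    static int countInRange(int[] cache, int lower, int upper) {
        int count = 0;
        for (int v : cache) {
            if (v >= lower && v <= upper) {
                count++;
            }
        }
        return count;
    }
}
```

The trade-off is that a range whose upper bound is Integer.MAX_VALUE would match the sentinel, so such ranges need special handling in this sketch.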
  • Uwe Schindler at Nov 16, 2009 at 9:08 am
    I think both approaches have their place: FCRF is very, very fast if you
    have no deletions, 0 is not included in your range, and you have exactly
    one value per document (for the Lucene defaults). If you have documents
    without a value, you have to index marker values instead of leaving the
    field empty (e.g. Float.NaN for floats, which never matches a range). As
    soon as you use other FieldCache implementations that hold multiple values
    per document (like bobo), I think it will not really get faster than
    NumericRangeQuery (additional work to iterate over the terms for each doc,
    more comparisons, ...); it may even get slower. Also do not forget the
    possibly large overhead of populating the FieldCache if you do no sorting.
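
A small illustration of why Float.NaN works as a marker value: every ordered comparison involving NaN is false in Java, so an inclusive range check written with `>=` and `<=` can never accept it. The class and method names are made up for this sketch.

```java
/** Shows that a NaN marker value never falls inside a float range. */
final class NaNMarkerSketch {

    /** Inclusive range check written with positive comparisons. */
    static boolean inRange(float value, float lower, float upper) {
        // For NaN both comparisons evaluate to false, so the marker is rejected.
        return value >= lower && value <= upper;
    }

    /** Counts cached values inside [lower, upper]; NaN entries are skipped implicitly. */
    static int countInRange(float[] cache, float lower, float upper) {
        int count = 0;
        for (float v : cache) {
            if (inRange(v, lower, upper)) {
                count++;
            }
        }
        return count;
    }
}
```

Note that the negated form `!(value < lower || value > upper)` would accept NaN, so the way the comparison is written matters.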

    For ease of use, NumericRangeQuery is preferable for all users, even if it
    is a little bit slower; FCRF is only faster in very optimized cases (no
    deletions, ...). Everybody should weigh the pros and cons and not look
    only at performance.

    You can speed up FieldCache population, too. If you index with
    NumericField instead (with precStep=MAX_VALUE), the parsing is faster,
    because Integer.parseInt() is more complex than
    NumericUtils.prefixCodedToInt().
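
To give a feel for why decoding a prefix-coded term can be cheaper than Integer.parseInt(), here is a simplified 7-bits-per-character encoding. It is only modeled loosely on the idea and is not the actual Lucene term format (it leaves out the shift prefix, for example); the class and method names are hypothetical.

```java
/** Simplified 7-bit-per-char int encoding; not the real NumericUtils format. */
final class PrefixCodedSketch {

    /** Encodes an int into 5 chars whose string order matches the int order. */
    static String encode(int value) {
        int sortable = value ^ 0x80000000; // flip the sign bit so negatives sort first
        char[] buf = new char[5];          // 32 bits split into 7-bit groups
        for (int i = 4; i >= 0; i--) {
            buf[i] = (char) (sortable & 0x7f);
            sortable >>>= 7;
        }
        return new String(buf);
    }

    /** Decodes with shifts and ORs only: no radix, sign or digit validation. */
    static int decode(String encoded) {
        int sortable = 0;
        for (int i = 0; i < encoded.length(); i++) {
            sortable = (sortable << 7) | encoded.charAt(i);
        }
        return sortable ^ 0x80000000;
    }
}
```

The decode loop does one shift and one OR per character, whereas Integer.parseInt() has to handle the sign, check each digit and guard against overflow.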

    For string term ranges, FieldCacheRangeFilter is much faster, because it
    uses the StringIndex cache.
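
A rough sketch of why an ordinal-based string cache helps with term ranges. It assumes a structure similar in spirit to StringIndex: a sorted array of distinct terms plus one ordinal per document. The names are hypothetical and missing values are ignored for simplicity.

```java
import java.util.Arrays;

/** Simplified model of a string range check over term ordinals. */
final class StringIndexRangeSketch {

    /**
     * lookup holds the sorted distinct terms, order[doc] is the index of the
     * document's term in lookup. Both bounds are inclusive.
     */
    static int countInRange(String[] lookup, int[] order, String lower, String upper) {
        // Resolve each bound to an ordinal once, using binary search.
        int lo = Arrays.binarySearch(lookup, lower);
        if (lo < 0) {
            lo = -lo - 1; // first term >= lower
        }
        int hi = Arrays.binarySearch(lookup, upper);
        if (hi < 0) {
            hi = -hi - 2; // last term <= upper
        }
        // Per document only two int comparisons remain, no string comparisons.
        int count = 0;
        for (int ord : order) {
            if (ord >= lo && ord <= hi) {
                count++;
            }
        }
        return count;
    }
}
```

The string comparisons happen only during the two binary searches; the per-document loop works on ints alone.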

    Uwe


Discussion Overview
group: java-user
categories: lucene
posted: Nov 16, '09 at 6:03a
active: Nov 16, '09 at 9:08a
posts: 5
users: 4
website: lucene.apache.org
