Hi,

We have been using Lucene for years and it serves us well.

Sometimes when we issue a query, we only want to know
how many hits it produces and do not want any docs back. Is it possible
to avoid score calculation completely and just get the total count back?

I understand that score calculation needs a loop over all matched
docs. Can we avoid that loop, for performance reasons? We
would like to get the total count in O(1), independent of the
number of docs.

Thanks very much for your help, Lisheng

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Yonik Seeley at May 23, 2007 at 4:54 pm

    Calculating scores adds a low, fixed amount of overhead to the matching logic.
    The savings would most likely not be that large.

    For simple queries, it might be quickest to use TermDocs (obtained from
    IndexReader.termDocs()) to iterate over the docs matching the terms yourself.

    Also, see Matcher in http://issues.apache.org/jira/browse/LUCENE-584

    -Yonik
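
    As a rough illustration of the do-it-yourself iteration suggested above,
    here is a toy model in plain Java. It is not Lucene's API: the class
    TermWalkCounter, the method countAny, and the map of postings are all
    hypothetical stand-ins. It counts the distinct documents that contain any
    of a few terms by walking each term's doc-id list, with no scoring involved.

    ```java
    import java.util.BitSet;
    import java.util.HashMap;
    import java.util.Map;

    // Toy model (NOT Lucene): count distinct docs matching any of several terms.
    final class TermWalkCounter {
        // postings maps each term to the ids of the documents that contain it.
        static int countAny(Map<String, int[]> postings, String... terms) {
            BitSet seen = new BitSet();
            for (String term : terms) {
                int[] docIds = postings.get(term);
                if (docIds == null) {
                    continue; // an unknown term matches nothing
                }
                for (int docId : docIds) {
                    seen.set(docId); // a doc hit by two terms is still counted once
                }
            }
            return seen.cardinality();
        }

        public static void main(String[] args) {
            Map<String, int[]> postings = new HashMap<>();
            postings.put("lucene", new int[] {0, 2, 5});
            postings.put("solr", new int[] {2, 3});
            System.out.println(countAny(postings, "lucene", "solr")); // prints 4
        }
    }
    ```

    The walk touches every posting of every requested term, so the cost grows
    with the number of matches even though no score is ever computed.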

  • Ramana Jelda at May 24, 2007 at 7:39 am
    But I also see the importance of skipping score calculation.

    Setting the performance gain aside, is there any way to skip
    score calculation completely?

    Jelda
  • Yonik Seeley at May 24, 2007 at 3:19 pm

    Yes. For unsorted results, use a hit collector; no sorting will be
    done by score (or anything else).

    You can also ignore the score by simply sorting on other fields.

    -Yonik

  • Michael McCandless at May 24, 2007 at 3:38 pm

    I *think* something close to this would allow you to count the number
    of docs matching a query without scoring:

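        // Ask the query for its scorer directly and step through the matches.
        Scorer s = query.weight(searcher).scorer(reader);
        int count = 0;
        while (s.next()) {   // next() advances to the next matching doc
            count++;         // count it without ever calling s.score()
        }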

    I'm not certain that avoids all scoring work but at least for some of
    the scorers it should save some CPU time; I'm not sure how much. Also
    note that the TopDocCollector (used by default if you don't provide
    your own collector) does not count docs that have score <= 0.0, so the
    above code fragment would overcount in such cases.

    Mike
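
    The loop above advances a scorer once per matching document. The sketch
    below mimics that idea outside Lucene for a two-term AND query: it walks
    two sorted doc-id arrays in step and counts the ids present in both, never
    computing a score. MatchCounter and countBoth are hypothetical names, not
    Lucene classes, and the sorted-array layout is an assumption made for the
    illustration.

    ```java
    // Toy model (NOT Lucene): count docs matching "a AND b" from sorted postings.
    final class MatchCounter {
        // Both arrays must be sorted ascending and contain no duplicates.
        static int countBoth(int[] a, int[] b) {
            int i = 0;
            int j = 0;
            int count = 0;
            while (i < a.length && j < b.length) {
                if (a[i] == b[j]) {
                    count++; // doc id appears in both lists: one match
                    i++;
                    j++;
                } else if (a[i] < b[j]) {
                    i++;     // advance whichever list is behind
                } else {
                    j++;
                }
            }
            return count;
        }

        public static void main(String[] args) {
            int[] termA = {1, 4, 7, 9};
            int[] termB = {2, 4, 9, 12};
            System.out.println(countBoth(termA, termB)); // prints 2
        }
    }
    ```

    The work is proportional to the combined length of the two lists, so
    dropping the score computation trims a per-document cost but not the loop.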

  • Zhang, Lisheng at May 24, 2007 at 4:26 pm
    Hi, thanks for the help!

    Yes, along the lines you mentioned we can reduce the amount
    of calculation, but we still need to loop through all matching
    docs to count them, so the time may still be O(n). I am wondering
    whether we can avoid the loop and get the count directly?

    Best regards, Lisheng

  • Michael McCandless at May 24, 2007 at 4:41 pm

    I don't think you can get less than O(n) unless the query
    is a single term. If it is a single term, you can just call
    IndexReader.docFreq(...) to get the number of docs that
    have that term, but this call will not take deleted
    documents into account.

    Mike
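
    To make the docFreq caveat concrete, here is a toy model in plain Java.
    It is not Lucene's implementation: ToyIndex and its methods are
    hypothetical stand-ins. A per-term counter is bumped at index time and
    read back in constant time, while a delete only flips a bit and leaves
    that counter stale, so the fast count can exceed the live count.

    ```java
    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model (NOT Lucene): stored per-term counts versus live counts.
    final class ToyIndex {
        private final Map<String, List<Integer>> postings = new HashMap<>();
        private final Map<String, Integer> storedDocFreq = new HashMap<>();
        private final BitSet deleted = new BitSet();

        // Record that docId contains term and bump the stored counter.
        void add(String term, int docId) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
            storedDocFreq.merge(term, 1, Integer::sum);
        }

        // Mark a document deleted; the stored counters are left untouched.
        void delete(int docId) {
            deleted.set(docId);
        }

        // Constant-time lookup of the stored counter; ignores deletions.
        int docFreq(String term) {
            return storedDocFreq.getOrDefault(term, 0);
        }

        // Linear walk over the postings that skips deleted documents.
        int liveCount(String term) {
            int count = 0;
            List<Integer> docIds =
                    postings.getOrDefault(term, Collections.<Integer>emptyList());
            for (int docId : docIds) {
                if (!deleted.get(docId)) {
                    count++;
                }
            }
            return count;
        }

        public static void main(String[] args) {
            ToyIndex index = new ToyIndex();
            index.add("lucene", 0);
            index.add("lucene", 1);
            index.add("lucene", 2);
            index.delete(1);
            System.out.println(index.docFreq("lucene"));   // prints 3
            System.out.println(index.liveCount("lucene")); // prints 2
        }
    }
    ```

    In this model the two numbers agree only while nothing has been deleted;
    after a delete, the exact count again requires the O(n) walk.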


Discussion Overview
group: java-user
category: lucene
posted: May 23, '07 at 4:46p
active: May 24, '07 at 4:41p
posts: 7
users: 4
website: lucene.apache.org
