FAQ
Hi,

When Lucene performs a Boolean query, say:

Field Name = Male
AND
Field Age = 30

assuming the resultant docs for each portion of the query were:

Matching docs for: Name = 1,2
Matching docs for: Age = 1,2,5,10

Will Lucene stop searching for documents matching the Age term once it
has found documents 1 and 2 ?
i.e. since 5 and 10 will not be used will it stop searching at document
number 2 ?

Thx,
Joel


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Michael McCandless at May 21, 2009 at 2:30 pm
    Yes.

    As soon as Lucene sees that the Name docID iteration has ended, the
    search will break.

    Mike
    On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
    Hi,

    When Lucene performs a Boolean query, say:

    Field Name = Male
    AND
    Field Age = 30

    assuming the resultant docs for each portion of the query were:

    Matching docs for:  Name = 1,2
    Matching docs for:  Age = 1,2,5,10

    Will Lucene stop searching for documents matching the Age term once it
    has found documents 1 and 2 ?
    i.e. since 5 and 10 will not be used will it stop searching at document
    number 2 ?

    Thx,
    Joel


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Joel Halbert at May 21, 2009 at 2:35 pm
    Thx. so, just to clarify, in the example I gave below...

    Lucene will search for documents matching on Name and find doc 1 and doc
    2.
    Then it will search age and find docs 1, 2 and then break. It will not
    go on to seek 5 and 10...?

    -----Original Message-----
    From: Michael McCandless <lucene@mikemccandless.com>
    Reply-To: java-user@lucene.apache.org
    To: java-user@lucene.apache.org
    Subject: Re: Does Lucene fail fast on boolean queries?
    Date: Thu, 21 May 2009 10:29:57 -0400

    Yes.

    As soon as Lucene sees that the Name docID iteration has ended, the
    search will break.

    Mike
    On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
    Hi,

    When Lucene performs a Boolean query, say:

    Field Name = Male
    AND
    Field Age = 30

    assuming the resultant docs for each portion of the query were:

    Matching docs for: Name = 1,2
    Matching docs for: Age = 1,2,5,10

    Will Lucene stop searching for documents matching the Age term once it
    has found documents 1 and 2 ?
    i.e. since 5 and 10 will not be used will it stop searching at document
    number 2 ?

    Thx,
    Joel


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at May 21, 2009 at 2:44 pm
    Well... scoring of AND queries currently is done doc-at-once.

    So Lucene will first step to doc 1 for Name, then ask age to skip to
    doc >= 1, will see that both have doc=1 and collect it. The same
    thing happens for doc=2. Then, Lucene will ask for the next doc of
    Name, which returns "false" (end of docs) and the loop breaks.

    OR, Lucene may drive the query in the opposite order (age and then
    Name), in which case it's the same through doc=2, but then Lucene asks
    age for the next doc, gets 5 back, then asks the Name iter to skip to
    doc >= 5, which returns false, and the loop breaks.

    So in fact "doc=5" can be asked for by Lucene.

    Also note that this is an internal implementation detail -- Lucene
    could easily change to do batch processing of AND'd queries in which
    case docs 5,10 could easily be iterated on. So I wouldn't "rely" on
    this in your app.

    Mike
    On Thu, May 21, 2009 at 10:34 AM, Joel Halbert wrote:
    Thx. so, just to clarify, in the example I gave below...

    Lucene will search for documents matching on Name and find doc 1 and doc
    2.
    Then it will search age and find docs 1, 2 and then break. It will not
    go on to seek 5 and 10...?

    -----Original Message-----
    From: Michael McCandless <lucene@mikemccandless.com>
    Reply-To: java-user@lucene.apache.org
    To: java-user@lucene.apache.org
    Subject: Re: Does Lucene fail fast on boolean queries?
    Date: Thu, 21 May 2009 10:29:57 -0400

    Yes.

    As soon as Lucene sees that the Name docID iteration has ended, the
    search will break.

    Mike
    On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
    Hi,

    When Lucene performs a Boolean query, say:

    Field Name = Male
    AND
    Field Age = 30

    assuming the resultant docs for each portion of the query were:

    Matching docs for:  Name = 1,2
    Matching docs for:  Age = 1,2,5,10

    Will Lucene stop searching for documents matching the Age term once it
    has found documents 1 and 2 ?
    i.e. since 5 and 10 will not be used will it stop searching at document
    number 2 ?

    Thx,
    Joel


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Joel Halbert at May 21, 2009 at 2:58 pm
    Thx. We're not relying on the internal implementation, but I was
    wondering with respect to how efficient it is with respect to doing a
    boolean AND query.

    i.e. does clause precedence effect the efficiency of the query - so is X
    && Y faster than Y && X if there are fewer hits for X. From how you
    describe it it is equally efficient, either way.

    In particular we are trying to work out whether a particular numerical
    RangeQuery that needs to be AND'd with a TermQuery is fastest as:

    BooleanQuery(RangeQuery && TermQuery)

    or as a TermQuery which then has it's results filtered by processing
    each in turn.



    -----Original Message-----
    From: Michael McCandless <lucene@mikemccandless.com>
    Reply-To: java-user@lucene.apache.org
    To: java-user@lucene.apache.org
    Subject: Re: Does Lucene fail fast on boolean queries?
    Date: Thu, 21 May 2009 10:43:46 -0400

    Well... scoring of AND queries currently is done doc-at-once.

    So Lucene will first step to doc 1 for Name, then ask age to skip to
    doc >= 1, will see that both have doc=1 and collect it. The same
    thing happens for doc=2. Then, Lucene will ask for the next doc of
    Name, which returns "false" (end of docs) and the loop breaks.

    OR, Lucene may drive the query in the opposite order (age and then
    Name), in which case it's the same through doc=2, but then Lucene asks
    age for the next doc, gets 5 back, then asks the Name iter to skip to
    doc >= 5, which returns false, and the loop breaks.

    So in fact "doc=5" can be asked for by Lucene.

    Also note that this is an internal implementation detail -- Lucene
    could easily change to do batch processing of AND'd queries in which
    case docs 5,10 could easily be iterated on. So I wouldn't "rely" on
    this in your app.

    Mike
    On Thu, May 21, 2009 at 10:34 AM, Joel Halbert wrote:
    Thx. so, just to clarify, in the example I gave below...

    Lucene will search for documents matching on Name and find doc 1 and doc
    2.
    Then it will search age and find docs 1, 2 and then break. It will not
    go on to seek 5 and 10...?

    -----Original Message-----
    From: Michael McCandless <lucene@mikemccandless.com>
    Reply-To: java-user@lucene.apache.org
    To: java-user@lucene.apache.org
    Subject: Re: Does Lucene fail fast on boolean queries?
    Date: Thu, 21 May 2009 10:29:57 -0400

    Yes.

    As soon as Lucene sees that the Name docID iteration has ended, the
    search will break.

    Mike
    On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
    Hi,

    When Lucene performs a Boolean query, say:

    Field Name = Male
    AND
    Field Age = 30

    assuming the resultant docs for each portion of the query were:

    Matching docs for: Name = 1,2
    Matching docs for: Age = 1,2,5,10

    Will Lucene stop searching for documents matching the Age term once it
    has found documents 1 and 2 ?
    i.e. since 5 and 10 will not be used will it stop searching at document
    number 2 ?

    Thx,
    Joel


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at May 21, 2009 at 3:39 pm

    On Thu, May 21, 2009 at 10:58 AM, Joel Halbert wrote:
    Thx.  We're not relying on the internal implementation, but I was
    wondering with respect to how efficient it is with respect to doing a
    boolean AND query.

    i.e. does clause precedence effect the efficiency of the query - so is X
    && Y faster than Y && X if there are fewer hits for X. From how you
    describe it it is equally efficient, either way.
    Lucene has a simple heuristic (in ConjunctionScorer): it first calls
    next() on each of X and Y, then it looks at that first docID for each
    and re-orders the clauses so that whichever of X or Y had the largest
    first docID is the one that "drives" the intersection. But that
    heuristic could easily be wrong if it just so happened that a rare
    term appears in an early document. A simple improvement would be to
    add an approxCount() to Query or Scorer/DocIdSet and then order based
    on that. But we haven't done so yet...
    In particular we are trying to work out whether a particular numerical
    RangeQuery that needs to be AND'd with a TermQuery is fastest as:

    BooleanQuery(RangeQuery && TermQuery)

    or as a TermQuery which then has it's results filtered by processing
    each in turn.
    If the RangeQuery will hit lots of terms and/or docs, you're likely
    better off building the filter; else, the query. Test both and let
    the machine tell you :)

    But: orthogonal to that decision, you should look @ TrieRangeQuery (in
    2.9-dev, contrib/queries) which does numeric filtering typically with
    far fewer terms needing to be visited.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Joel Halbert at May 21, 2009 at 3:49 pm
    TrieRangeQuery - thanks for the tip.


    -----Original Message-----
    From: Michael McCandless <lucene@mikemccandless.com>
    Reply-To: java-user@lucene.apache.org
    To: java-user@lucene.apache.org
    Subject: Re: Does Lucene fail fast on boolean queries?
    Date: Thu, 21 May 2009 11:39:23 -0400
    On Thu, May 21, 2009 at 10:58 AM, Joel Halbert wrote:
    Thx. We're not relying on the internal implementation, but I was
    wondering with respect to how efficient it is with respect to doing a
    boolean AND query.

    i.e. does clause precedence effect the efficiency of the query - so is X
    && Y faster than Y && X if there are fewer hits for X. From how you
    describe it it is equally efficient, either way.
    Lucene has a simple heuristic (in ConjunctionScorer): it first calls
    next() on each of X and Y, then it looks at that first docID for each
    and re-orders the clauses so that whichever of X or Y had the largest
    first docID is the one that "drives" the intersection. But that
    heuristic could easily be wrong if it just so happened that a rare
    term appears in an early document. A simple improvement would be to
    add an approxCount() to Query or Scorer/DocIdSet and then order based
    on that. But we haven't done so yet...
    In particular we are trying to work out whether a particular numerical
    RangeQuery that needs to be AND'd with a TermQuery is fastest as:

    BooleanQuery(RangeQuery && TermQuery)

    or as a TermQuery which then has it's results filtered by processing
    each in turn.
    If the RangeQuery will hit lots of terms and/or docs, you're likely
    better off building the filter; else, the query. Test both and let
    the machine tell you :)

    But: orthogonal to that decision, you should look @ TrieRangeQuery (in
    2.9-dev, contrib/queries) which does numeric filtering typically with
    far fewer terms needing to be visited.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMay 21, '09 at 12:44p
activeMay 21, '09 at 3:49p
posts7
users2
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase