FAQ
Hi,

When Lucene performs a Boolean query, say:

Field Name = Male
AND
Field Age = 30

assuming the resultant docs for each portion of the query were:

Matching docs for: Name = 1,2
Matching docs for: Age = 1,2,5,10

Will Lucene stop searching for documents matching the Age term once it
has found documents 1 and 2 ?
i.e. since 5 and 10 will not be used will it stop searching at document
number 2 ?

Thx,
Joel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

## Search Discussions

•  at May 21, 2009 at 2:30 pm ⇧
Yes.

As soon as Lucene sees that the Name docID iteration has ended, the
search will break.

Mike
On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
Hi,

When Lucene performs a Boolean query, say:

Field Name = Male
AND
Field Age = 30

assuming the resultant docs for each portion of the query were:

Matching docs for:  Name = 1,2
Matching docs for:  Age = 1,2,5,10

Will Lucene stop searching for documents matching the Age term once it
has found documents 1 and 2 ?
i.e. since 5 and 10 will not be used will it stop searching at document
number 2 ?

Thx,
Joel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
•  at May 21, 2009 at 2:35 pm ⇧
Thx. so, just to clarify, in the example I gave below...

Lucene will search for documents matching on Name and find doc 1 and doc
2.
Then it will search age and find docs 1, 2 and then break. It will not
go on to seek 5 and 10...?

-----Original Message-----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 10:29:57 -0400

Yes.

As soon as Lucene sees that the Name docID iteration has ended, the
search will break.

Mike
On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
Hi,

When Lucene performs a Boolean query, say:

Field Name = Male
AND
Field Age = 30

assuming the resultant docs for each portion of the query were:

Matching docs for: Name = 1,2
Matching docs for: Age = 1,2,5,10

Will Lucene stop searching for documents matching the Age term once it
has found documents 1 and 2 ?
i.e. since 5 and 10 will not be used will it stop searching at document
number 2 ?

Thx,
Joel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
•  at May 21, 2009 at 2:44 pm ⇧
Well... scoring of AND queries currently is done doc-at-once.

doc >= 1, will see that both have doc=1 and collect it. The same
thing happens for doc=2. Then, Lucene will ask for the next doc of
Name, which returns "false" (end of docs) and the loop breaks.

OR, Lucene may drive the query in the opposite order (age and then
Name), in which case it's the same through doc=2, but then Lucene asks
doc >= 5, which returns false, and the loop breaks.

So in fact "doc=5" can be asked for by Lucene.

Also note that this is an internal implementation detail -- Lucene
could easily change to do batch processing of AND'd queries in which
case docs 5,10 could easily be iterated on. So I wouldn't "rely" on

Mike
On Thu, May 21, 2009 at 10:34 AM, Joel Halbert wrote:
Thx. so, just to clarify, in the example I gave below...

Lucene will search for documents matching on Name and find doc 1 and doc
2.
Then it will search age and find docs 1, 2 and then break. It will not
go on to seek 5 and 10...?

-----Original Message-----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 10:29:57 -0400

Yes.

As soon as Lucene sees that the Name docID iteration has ended, the
search will break.

Mike
On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
Hi,

When Lucene performs a Boolean query, say:

Field Name = Male
AND
Field Age = 30

assuming the resultant docs for each portion of the query were:

Matching docs for:  Name = 1,2
Matching docs for:  Age = 1,2,5,10

Will Lucene stop searching for documents matching the Age term once it
has found documents 1 and 2 ?
i.e. since 5 and 10 will not be used will it stop searching at document
number 2 ?

Thx,
Joel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
•  at May 21, 2009 at 2:58 pm ⇧
Thx. We're not relying on the internal implementation, but I was
wondering with respect to how efficient it is with respect to doing a
boolean AND query.

i.e. does clause precedence effect the efficiency of the query - so is X
&& Y faster than Y && X if there are fewer hits for X. From how you
describe it it is equally efficient, either way.

In particular we are trying to work out whether a particular numerical
RangeQuery that needs to be AND'd with a TermQuery is fastest as:

BooleanQuery(RangeQuery && TermQuery)

or as a TermQuery which then has it's results filtered by processing
each in turn.

-----Original Message-----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 10:43:46 -0400

Well... scoring of AND queries currently is done doc-at-once.

doc >= 1, will see that both have doc=1 and collect it. The same
thing happens for doc=2. Then, Lucene will ask for the next doc of
Name, which returns "false" (end of docs) and the loop breaks.

OR, Lucene may drive the query in the opposite order (age and then
Name), in which case it's the same through doc=2, but then Lucene asks
doc >= 5, which returns false, and the loop breaks.

So in fact "doc=5" can be asked for by Lucene.

Also note that this is an internal implementation detail -- Lucene
could easily change to do batch processing of AND'd queries in which
case docs 5,10 could easily be iterated on. So I wouldn't "rely" on

Mike
On Thu, May 21, 2009 at 10:34 AM, Joel Halbert wrote:
Thx. so, just to clarify, in the example I gave below...

Lucene will search for documents matching on Name and find doc 1 and doc
2.
Then it will search age and find docs 1, 2 and then break. It will not
go on to seek 5 and 10...?

-----Original Message-----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 10:29:57 -0400

Yes.

As soon as Lucene sees that the Name docID iteration has ended, the
search will break.

Mike
On Thu, May 21, 2009 at 8:44 AM, Joel Halbert wrote:
Hi,

When Lucene performs a Boolean query, say:

Field Name = Male
AND
Field Age = 30

assuming the resultant docs for each portion of the query were:

Matching docs for: Name = 1,2
Matching docs for: Age = 1,2,5,10

Will Lucene stop searching for documents matching the Age term once it
has found documents 1 and 2 ?
i.e. since 5 and 10 will not be used will it stop searching at document
number 2 ?

Thx,
Joel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
•  at May 21, 2009 at 3:39 pm ⇧

On Thu, May 21, 2009 at 10:58 AM, Joel Halbert wrote:
Thx.  We're not relying on the internal implementation, but I was
wondering with respect to how efficient it is with respect to doing a
boolean AND query.

i.e. does clause precedence effect the efficiency of the query - so is X
&& Y faster than Y && X if there are fewer hits for X. From how you
describe it it is equally efficient, either way.
Lucene has a simple heuristic (in ConjunctionScorer): it first calls
next() on each of X and Y, then it looks at that first docID for each
and re-orders the clauses so that whichever of X or Y had the largest
first docID is the one that "drives" the intersection. But that
heuristic could easily be wrong if it just so happened that a rare
term appears in an early document. A simple improvement would be to
add an approxCount() to Query or Scorer/DocIdSet and then order based
on that. But we haven't done so yet...
In particular we are trying to work out whether a particular numerical
RangeQuery that needs to be AND'd with a TermQuery is fastest as:

BooleanQuery(RangeQuery && TermQuery)

or as a TermQuery which then has it's results filtered by processing
each in turn.
If the RangeQuery will hit lots of terms and/or docs, you're likely
better off building the filter; else, the query. Test both and let
the machine tell you :)

But: orthogonal to that decision, you should look @ TrieRangeQuery (in
2.9-dev, contrib/queries) which does numeric filtering typically with
far fewer terms needing to be visited.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
•  at May 21, 2009 at 3:49 pm ⇧
TrieRangeQuery - thanks for the tip.

-----Original Message-----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 11:39:23 -0400
On Thu, May 21, 2009 at 10:58 AM, Joel Halbert wrote:
Thx. We're not relying on the internal implementation, but I was
wondering with respect to how efficient it is with respect to doing a
boolean AND query.

i.e. does clause precedence effect the efficiency of the query - so is X
&& Y faster than Y && X if there are fewer hits for X. From how you
describe it it is equally efficient, either way.
Lucene has a simple heuristic (in ConjunctionScorer): it first calls
next() on each of X and Y, then it looks at that first docID for each
and re-orders the clauses so that whichever of X or Y had the largest
first docID is the one that "drives" the intersection. But that
heuristic could easily be wrong if it just so happened that a rare
term appears in an early document. A simple improvement would be to
add an approxCount() to Query or Scorer/DocIdSet and then order based
on that. But we haven't done so yet...
In particular we are trying to work out whether a particular numerical
RangeQuery that needs to be AND'd with a TermQuery is fastest as:

BooleanQuery(RangeQuery && TermQuery)

or as a TermQuery which then has it's results filtered by processing
each in turn.
If the RangeQuery will hit lots of terms and/or docs, you're likely
better off building the filter; else, the query. Test both and let
the machine tell you :)

But: orthogonal to that decision, you should look @ TrieRangeQuery (in
2.9-dev, contrib/queries) which does numeric filtering typically with
far fewer terms needing to be visited.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

## Related Discussions

Discussion Overview
 group java-user categories lucene posted May 21, '09 at 12:44p active May 21, '09 at 3:49p posts 7 users 2 website lucene.apache.org

### 2 users in discussion

Content

People

Support

Translate

site design / logo © 2022 Grokbase