Grokbase Groups Lucene dev April 2004
Hello,

I have been reviewing some of the code related to boolean queries and I
wanted to see if my understanding is approximately correct regarding how
they are handled and, more importantly, the limitations.


Here is what I have come to understand so far:

1) The QueryParser code generated from javacc will parse my boolean query
and determine for each clause whether or not it is 'required' (based on a few
conditions, but, in short, whether or not it was introduced or followed by
'AND') or 'prohibited' (based, in short, on it being preceded by 'NOT').

2) As my BooleanQuery is being constructed, it will throw a
BooleanQuery.TooManyClauses exception if I exceed
BooleanQuery.maxClauseCount (which defaults to 1024).

3) The maxClauseCount threshold appears not to care whether or not my
clauses are 'required' or 'prohibited'... only how many of them there are in
total.

4) My BooleanQuery will prepare its own Scorer instance (i.e.,
BooleanScorer). During this step, it tells the scorer which
clauses are 'required' or 'prohibited'. If more than 32 fall into this
category, an IndexOutOfBoundsException ("More than 32 required/prohibited
clauses in query.") is thrown.
That's as far as I got.
Now, I am a bit confused at this point. Does this mean I can make a boolean
query consisting of up to 1024 clauses as long as no more than 32 of them
are required or prohibited? This doesn't seem right. Am I missing
something in the way I understand this?
I am (as you may have guessed) generating large boolean queries. And, in
some rare cases, I am receiving the exception identified in #4 (above). So,
I am trying to figure out whether or not I need to change/filter my queries
in a special way in order to avoid this exception. And, in order to do
this, I want to understand how these queries are being handled.
Finally, is there something related to the query syntax that could be my
mistake? For example, what is the difference between:
"A B" AND "C D" AND "D E"
... and...
("A B") AND ("C D") AND ("D E")
... could that be the crux of it?

Thank you for your time,
Tate Avery


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


  • Stephane James Vaucher at Apr 29, 2004 at 5:10 pm

    On Thu, 29 Apr 2004, Tate Avery wrote:

    > I have been reviewing some of the code related to boolean queries and I
    > wanted to see if my understanding is approximately correct regarding how
    > they are handled and, more importantly, the limitations.

    You can always submit a request for enhancement in Bugzilla, so as to keep
    track of this issue.

    > 1) The QueryParser code generated from javacc will parse my boolean query
    > and determine for each clause whether or not it is 'required' [...]

    Your usage seems pretty particular; why are you using the javacc
    QueryParser?

    > 2) As my BooleanQuery is being constructed, it will throw a
    > BooleanQuery.TooManyClauses exception if I exceed
    > BooleanQuery.maxClauseCount (which defaults to 1024).

    It's configurable through system properties or by
    BooleanQuery.setMaxClauseCount(int maxClauseCount).

    > [points 3 and 4 and the follow-up questions snipped]
    > Finally, is there something related to the query syntax that could be my
    > mistake? [...]

    I can't help you here, and the doc seems rather thin (or nonexistent for
    this class). I don't know the relation between the query and how the
    scorer will process it.

    Sorry I can't be of assistance,
    sv
  • Tate Avery at Apr 29, 2004 at 5:36 pm
    Thank you for the response.

    I am not using the QueryParser directly... it was just part of my overall
    understanding of how this exception is coming about. Same thing,
    essentially, with the maxClauseCount.


    Here is some code to illustrate what is confusing me and what I am trying to
    ascertain:

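    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    int _numClauses = XXX;    // 3 examples of these var settings below
    boolean _required = XXX;  // true = every clause is required (AND)

    BooleanQuery _query = new BooleanQuery();

    for (int _i = 0; _i < _numClauses; _i++)
    {
        // BooleanClause(query, required, prohibited): no clause is prohibited here
        _query.add(
            new BooleanClause(
                new TermQuery(new Term("body", "term" + _i)),
                _required,
                false));
    }

    // INDEX_DIR: path to an existing index directory
    Hits _hits = new IndexSearcher(INDEX_DIR).search(_query);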


    1) With _numClauses=9999 and _required=false (for example), I have no
    problems.
    (This is confusing since 9999 is more than maxClauseCount... but I won't
    complain).

    2) With _numClauses=32 and _required=true, I also have no problems.

    3) With _numClauses=33 and _required=true, I get
    "java.lang.IndexOutOfBoundsException: More than 32 required/prohibited
    clauses in query." as a runtime exception.
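Case 3 fits the error message's hint that the scorer tracks required and prohibited clauses as bits of a single 32-bit int. The plain-Java sketch below models that idea only; it is not Lucene's source, and RequiredMaskModel, add, matches and buildSucceeds are hypothetical names. In this model each required or prohibited clause takes the next bit and optional clauses take none, so the 33rd required clause is the first one with no bit left.

```java
/**
 * Hypothetical model (not Lucene source): one bit of an int per
 * required/prohibited clause, so at most 32 such clauses fit.
 */
final class RequiredMaskModel {
    private int nextMask = 1;       // bit handed to the next required/prohibited clause
    private int requiredMask = 0;   // OR of the bits of all required clauses
    private int prohibitedMask = 0; // OR of the bits of all prohibited clauses

    /** Registers a clause and returns its bit; optional clauses get 0 (no bit). */
    int add(boolean required, boolean prohibited) {
        if (!required && !prohibited) {
            return 0;
        }
        if (nextMask == 0) {        // all 32 bits have already been handed out
            throw new IndexOutOfBoundsException(
                "More than 32 required/prohibited clauses in query.");
        }
        int mask = nextMask;
        nextMask <<= 1;             // shifting bit 31 (1 << 31) once more yields 0
        if (required) requiredMask |= mask;
        if (prohibited) prohibitedMask |= mask;
        return mask;
    }

    /** docBits = OR of the bits of the clauses a document matched. */
    boolean matches(int docBits) {
        return (docBits & requiredMask) == requiredMask
            && (docBits & prohibitedMask) == 0;
    }

    /** Mirrors the test loop above: n clauses, all with the same required flag. */
    static boolean buildSucceeds(int n, boolean required) {
        RequiredMaskModel scorer = new RequiredMaskModel();
        try {
            for (int i = 0; i < n; i++) {
                scorer.add(required, false);
            }
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSucceeds(32, true));    // true  (case 2)
        System.out.println(buildSucceeds(33, true));    // false (case 3)
        System.out.println(buildSucceeds(1024, false)); // true: optional clauses use no bits
    }
}
```

The optional-clause run uses 1024 clauses so that it stays within the separate maxClauseCount default and only the bit limit is exercised.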


    So, I guess I am trying to ask the following:

    Is a query like (T1 AND T2 AND ... AND T32 AND T33) just completely illegal
    for Lucene?
    OR is there some way to extend this limit?
    OR am I missing something that is clouding my understanding?
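On the question of extending the limit: if, as the error message suggests, the 32-slot limit applies to each BooleanScorer separately, one possible workaround is to nest. Split the required terms into groups of at most 32, make each group its own all-required query, and require every group in an outer query. Requiring every group is logically the same as requiring every term. The sketch below models only the grouping and that equivalence in plain Java; NestedConjunction and its methods are hypothetical, not Lucene API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not Lucene API): a long AND split into groups of at most
 * 32 terms, each group required as a whole.
 */
final class NestedConjunction {
    static final int MAX_REQUIRED_PER_QUERY = 32; // limit reported in this thread

    /** Splits terms into consecutive groups of at most maxPerGroup terms. */
    static List<List<String>> partition(List<String> terms, int maxPerGroup) {
        if (maxPerGroup <= 0) {
            throw new IllegalArgumentException("maxPerGroup must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < terms.size(); start += maxPerGroup) {
            int end = Math.min(start + maxPerGroup, terms.size());
            groups.add(new ArrayList<>(terms.subList(start, end)));
        }
        return groups;
    }

    /** Flat AND: the document must contain every term. */
    static boolean matchesFlat(Set<String> doc, List<String> terms) {
        return doc.containsAll(terms);
    }

    /** Nested AND: every group is required, and every term within a group is required. */
    static boolean matchesNested(Set<String> doc, List<List<String>> groups) {
        for (List<String> group : groups) {
            if (!doc.containsAll(group)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 33; i++) {
            terms.add("term" + i);                 // T1 AND ... AND T33 from the question
        }
        List<List<String>> groups = partition(terms, MAX_REQUIRED_PER_QUERY);
        System.out.println(groups.size());         // 2: groups of 32 and 1
        Set<String> doc = new HashSet<>(terms);
        doc.remove("term32");                      // document missing one term
        System.out.println(matchesFlat(doc, terms) == matchesNested(doc, groups)); // true
    }
}
```

In Lucene 1.3 terms, each group would presumably become its own BooleanQuery of required TermQuery clauses, added to the outer query with new BooleanClause(inner, true, false). One level of nesting allows up to 32 groups, i.e. 32 × 32 = 1,024 terms. Which documents match should not change, but relevance scores are not guaranteed to equal those of a flat query; test against your version before relying on this.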



    Thanks,
    Tate



    -----Original Message-----
    From: Stephane James Vaucher
    Sent: Thursday, April 29, 2004 1:10 PM
    To: Lucene Users List; tate.avery@nstein.com
    Cc: lucene-dev@jakarta.apache.org
    Subject: Re: Understanding Boolean Queries

  • Stephane James Vaucher at Apr 29, 2004 at 5:41 pm
    Hi Tate,

    Forgot to ask: what version of Lucene are you using? (IIRC, versions 1.2
    and earlier have no maxClauseCount.)

    sv
  • Tate Avery at Apr 29, 2004 at 5:48 pm
    Sorry, I retract this statement...
    1) With _numClauses=9999 and _required=false (for example), I have no
    problems.
    (This is confusing since 9999 is more than maxClauseCount... but I won't
    complain).

    My little test app was using lucene-1.3-rc1.jar. But, my REAL app is using
    lucene-1.3-final.jar.

    So, with _numClauses=1024 and _required=false, I have no problems.
    And, with _numClauses=1025 and _required=false, I get the TooManyClauses
    exception.

    All of that is fine and good. My main concern is the 32/33 threshold when
    _required=true (see the details in my previous message).
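The 1024/1025 boundary is what one would expect from a count check made each time a clause is added: the query counts all of its clauses, whatever their required/prohibited flags, and rejects the one that would exceed the limit. Below is a minimal plain-Java model of that behavior, not Lucene's source; ClauseCountModel and its members are hypothetical. The 1024 default and the system-property override follow what was said earlier in the thread, but the property name org.apache.lucene.maxClauseCount is an assumption to verify against your BooleanQuery source; BooleanQuery.setMaxClauseCount(int) is the explicit alternative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not Lucene source) of a clause-count limit checked in add(). */
final class ClauseCountModel {
    // Assumed property name; Integer.getInteger falls back to 1024
    // when the property is unset or not a valid integer.
    static int maxClauseCount = Integer.getInteger("org.apache.lucene.maxClauseCount", 1024);

    /** Stand-in for BooleanQuery.TooManyClauses. */
    static final class TooManyClauses extends RuntimeException {
        TooManyClauses() {
            super("more than " + maxClauseCount + " clauses");
        }
    }

    private final List<String> clauses = new ArrayList<>();

    /** Counts every clause; the required/prohibited flags play no part in the check. */
    void add(String term, boolean required, boolean prohibited) {
        if (clauses.size() >= maxClauseCount) {
            throw new TooManyClauses();
        }
        clauses.add(term);
    }

    /** Mirrors the test loop with n optional clauses; true if no TooManyClauses. */
    static boolean buildSucceeds(int n) {
        ClauseCountModel query = new ClauseCountModel();
        try {
            for (int i = 0; i < n; i++) {
                query.add("term" + i, false, false);
            }
            return true;
        } catch (TooManyClauses e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSucceeds(1024)); // true with the default limit
        System.out.println(buildSucceeds(1025)); // false with the default limit
    }
}
```

With no property set, main prints true and then false, matching the lucene-1.3-final results reported above.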


    Tate



    -----Original Message-----
    From: Tate Avery
    Sent: Thursday, April 29, 2004 1:30 PM
    To: 'Lucene Users List'
    Cc: lucene-dev@jakarta.apache.org
    Subject: RE: Understanding Boolean Queries



Discussion Overview
group: dev
categories: lucene
posted: Apr 29, '04 at 4:19p
active: Apr 29, '04 at 5:48p
posts: 5
users: 2
website: lucene.apache.org
