Query: caught a class org.apache.lucene.queryParser.ParseException
with message: Too many boolean clauses

I realize why this is happening (the 1024 clauses limit for BooleanQuery).
My question is more design related.

During customer registration, the customer defines a set of skus/products
that we should never display to them. These products are part of our catalog
offering but we are forbidden to make them available to this customer. This
list is called the block list and can potentially be large (6 to 7
thousand).

When a customer logs in, this block list is identified and currently I am
using QueryParser to parse these skus to block/exclude. That is why I am
hitting against the 1024 upper bound.

To circumvent it, here are a few options that I have thought of:
1. Chunk it up:
a. Create a filter based on a query that has a maximum of 1024.
b. Get its bits.
c. Get the next 1024 blocked skus and create a filter out of it and get
its bits.
d. AND the two BitSets.
e. Do this till all blocked skus and other filters are ANDed together for
the final BitSet.

2. Build the block list into the index somehow
a. My index is based on SKUs, not on customer.
b. I could add a field in each SKU document that contains the customer-ids
who want this SKU blocked.
c. But this field's value could be very large.

3. Some other obvious way that I am stupid enough not to be able to
visualize.
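
Option 1 above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes one document per SKU, and it uses a hypothetical in-memory map from SKU to document number in place of a real index. The method `allowedBitsForChunk` stands in for running one exclusion query of at most 1024 clauses and taking the bits of the resulting filter; neither it nor `ChunkedBlockList` is part of Lucene.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Illustrative only: class and method names are hypothetical, not Lucene APIs.
final class ChunkedBlockList {
    // Mirrors the default BooleanQuery clause limit mentioned above.
    static final int MAX_CLAUSES = 1024;

    // Stand-in for "create a filter from a query with at most 1024 skus and
    // get its bits": every doc is allowed except those whose sku is in the chunk.
    static BitSet allowedBitsForChunk(List<String> chunk,
                                      Map<String, Integer> skuToDoc,
                                      int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        allowed.set(0, maxDoc);
        for (String sku : chunk) {
            Integer doc = skuToDoc.get(sku);
            if (doc != null) {
                allowed.clear(doc); // this doc is blocked for the customer
            }
        }
        return allowed;
    }

    // Steps a-e: walk the block list in chunks and AND the per-chunk BitSets.
    static BitSet allowedDocs(List<String> blockedSkus,
                              Map<String, Integer> skuToDoc,
                              int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with every doc allowed
        for (int start = 0; start < blockedSkus.size(); start += MAX_CLAUSES) {
            int end = Math.min(start + MAX_CLAUSES, blockedSkus.size());
            List<String> chunk = blockedSkus.subList(start, end);
            result.and(allowedBitsForChunk(chunk, skuToDoc, maxDoc));
        }
        return result;
    }
}
```

The BitSets of any other filters can be ANDed into the result in the same way. The cost is one pass per chunk, so a block list of 6 to 7 thousand skus means about seven chunk evaluations per customer.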

Thanks in advance
Sid





---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

  • Aigner, Thomas at Oct 17, 2005 at 7:42 pm
    Another way around it is to increase the max clause count.

    // Set the max clause count (the default is 1024)
    BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE); // or some smaller int

    You can use maxint or some smaller number. When I set this high, I have had
    to set the java pool higher for memory as well.

    Tom

  • Jibu mathew at Oct 18, 2005 at 4:42 am
    Hi all,

    I need urgent help with the following issues.

    1. What is the query string to retrieve all the indexed documents
    (something similar to *.*)?
    2. In a program I have indexed 10 files. When I search using the query
    "contents:java", it returns 2 documents. But when I give
    "-contents:java", it returns an empty result set. Does anyone
    know the right query string for this, i.e., to retrieve all
    documents that do not contain the word 'java'?
    3. What is the query string to retrieve all the documents whose content is
    empty?


    Please help me as soon as possible.



    Thanks

    Jibu


  • Sharma, Siddharth at Oct 17, 2005 at 8:37 pm
    I thought of that but I had that listed as a last fallback option because I
    was not sure what it meant in terms of performance since I am a newbie to
    Lucene.
    So if I bump up my heap (I assume that's what you are referring to when you
    say java pool) it'll be ok?
    Are there metrics around this? For example:
    At x max_clauses, the JVM heap should be y MB
    At x + 1024, it should be z MB




  • Chris Hostetter at Oct 17, 2005 at 9:05 pm
    :
    : To circumvent it, here are a few options that I have thought of:
    : 1. Chunk it up:
    : a. Create a filter based on a query that has a maximum of 1024.
    : b. Get its bits.
    : c. Get the next 1024 blocked skus and create a filter out of it and get
    : its bits.
    : d. AND the two BitSets.
    : e. Do this till all blocked skus and other filters are ANDed together for
    : the final BitSet.

    Instead of building up your filter based on a query, why not build up your
    filter directly? ... Using a QueryFilter requires that scoring happen --
    but you don't care about the scoring, you just want to know if a doc
    matches a keyword or not. Take a look at the way RangeFilter is
    implemented. It should be able to serve as a good example of how you can
    write a "SetFilter" that takes in a field name and a set of keywords, and
    only "passes" documents where one of the keywords shows up as an indexed
    value for that field. Now you don't have to worry about the 1024 limit,
    you don't have to "chunk" anything, and your searches will be faster because
    you don't need to worry about the scoring aspects of the BooleanQueries.


    Hint: you can sort the input Set, and then iterate over it, pulling out
    the TermDocs for each keyword, and setting the bit for each doc in each
    TermDocs. Now your Filter indicates all the products that do match those
    skus, and you'll want an "InverseFilter" to wrap it and indicate all the
    products that *don't* match those skus.
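
The idea above might look something like the following sketch. It is not the Lucene API: `PostingsSource` and `DocFilter` are hypothetical interfaces standing in for the index reader and the filter base class, so that the example runs with the JDK alone. In Lucene itself, `docsFor` would roughly correspond to walking the TermDocs for a term, and the filter would return its BitSet from the filter's own bits method. Deleted documents are ignored here.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the part of an index reader the filter needs.
interface PostingsSource {
    int maxDoc();
    // Doc numbers containing `value` in `field`; empty if the term is not indexed.
    int[] docsFor(String field, String value);
}

// Hypothetical stand-in for a filter: one bit per document, set means "passes".
interface DocFilter {
    BitSet bits(PostingsSource reader);
}

// Passes every document that has one of the given keywords in the field.
// No query is built, so there is no clause limit and no scoring.
final class SetFilter implements DocFilter {
    private final String field;
    private final TreeSet<String> values; // kept sorted, as the hint suggests

    SetFilter(String field, Set<String> values) {
        this.field = field;
        this.values = new TreeSet<>(values);
    }

    @Override
    public BitSet bits(PostingsSource reader) {
        BitSet bits = new BitSet(reader.maxDoc());
        for (String value : values) {
            for (int doc : reader.docsFor(field, value)) {
                bits.set(doc); // mark the doc as matching; nothing is scored
            }
        }
        return bits;
    }
}

// Wraps another filter and passes exactly the documents it rejects.
final class InverseFilter implements DocFilter {
    private final DocFilter wrapped;

    InverseFilter(DocFilter wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public BitSet bits(PostingsSource reader) {
        // Copy first so a cached BitSet inside the wrapped filter is not mutated.
        BitSet bits = (BitSet) wrapped.bits(reader).clone();
        bits.flip(0, reader.maxDoc());
        return bits;
    }
}
```

For the block list, the usage would be along the lines of `new InverseFilter(new SetFilter("sku", blockedSkus))`, where the field name `sku` is an assumption about how the index was built.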



    -Hoss


  • Sharma, Siddharth at Oct 19, 2005 at 6:06 pm
    Thanks Chris

    I haven't tried it yet, but I think I understand your idea now (after 24
    hours, man I'm slow on the uptake;)
    I'll try it today.
    -Sid



Discussion Overview
group: java-user @
categories: lucene
posted: Oct 17, '05 at 7:32p
active: Oct 19, '05 at 6:06p
posts: 6
users: 4
website: lucene.apache.org
