FAQ
I'm constructing a search with some required terms and some optional
terms in the query. According to some earlier posts that looks like
"+(A B) C D E" in query syntax for required terms A and B and optional
terms C D and E. In other words, Lucene considers all documents that
have both A and B, and ranks them higher if they also have C D or E.

I'm wondering how this translates to a BooleanQuery. I know I should use
BooleanClause.Occur.MUST for A and B, and I guess I should use
BooleanClause.Occur.SHOULD for C, D and E. However the javadoc for
BooleanClause.Occur.SHOULD states:

"Use this operator for clauses that /should/ appear in the matching
documents. For a BooleanQuery with two |SHOULD| subqueries, at least one
of the clauses must appear in the matching documents."

Does this last sentence only mean that, for a query with _just_ two
SHOULD clauses (i.e. only SHOULD clauses), a matching document must
contain one of the clauses? Or will the BooleanQuery described above
actually constrain the search results to (A AND B) AND (C OR D OR E)?
If so, what should I use instead?

thank you,
Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Mark Miller at May 21, 2007 at 12:49 am
    I like to think of it like this:

    Each doc is going to get a score -- if the score is positive the doc
    will be a hit, if the score is 0 the doc will not be a hit.

    If a boolean clause is Occur.MUST and it is not found, the score will be
    dropped to 0 no matter what (if it is found, the score is increased).
    If a boolean clause is Occur.MUST_NOT and it is found, the score will be
    dropped to 0 no matter what.
    If a boolean clause is Occur.SHOULD and it is found, a positive number
    is added to the score; if it is not found, nothing is added to the score.

    Now you see why it says: "Use this operator for clauses that /should/
    appear in the matching documents. For a BooleanQuery with two |SHOULD|
    subqueries, at least one of the clauses must appear in the matching
    documents."

    So in a query made only of Occur.SHOULD clauses, at least one of them
    needs to be found to raise the score above 0 and make the document a hit.
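Mark's mental model can be written down as a toy scoring function. This is a hypothetical sketch, not Lucene's scorer: the class ToyBooleanScorer, its own Occur enum, and the flat +1.0 per matched clause are assumptions made purely for illustration (real Lucene scores involve tf/idf, norms and so on, and, as Hoss notes below, the "score must be positive" rule is not how Lucene's internals actually decide a match). It needs Java 16+ for records.

```java
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the "how I like to think of it" explanation.
// NOT Lucene code: every matched MUST/SHOULD clause simply adds 1.0.
class ToyBooleanScorer {
    enum Occur { MUST, SHOULD, MUST_NOT }

    // One clause = one term plus how it must occur.
    record Clause(String term, Occur occur) {}

    // Returns 0.0 when the document is rejected, otherwise a positive toy score.
    static double score(List<Clause> clauses, Set<String> docTerms) {
        double score = 0.0;
        for (Clause c : clauses) {
            boolean found = docTerms.contains(c.term());
            switch (c.occur()) {
                case MUST:
                    if (!found) return 0.0;   // missing MUST: dropped to 0
                    score += 1.0;             // found MUST: score goes up
                    break;
                case MUST_NOT:
                    if (found) return 0.0;    // present MUST_NOT: dropped to 0
                    break;
                case SHOULD:
                    if (found) score += 1.0;  // optional: only adds when found
                    break;
            }
        }
        return score;
    }

    // In this model a document is a hit exactly when its score is positive.
    static boolean isHit(List<Clause> clauses, Set<String> docTerms) {
        return score(clauses, docTerms) > 0.0;
    }

    public static void main(String[] args) {
        // "+A +B C D E": A and B required, C/D/E only raise the score.
        List<Clause> q = List.of(
            new Clause("A", Occur.MUST), new Clause("B", Occur.MUST),
            new Clause("C", Occur.SHOULD), new Clause("D", Occur.SHOULD),
            new Clause("E", Occur.SHOULD));
        System.out.println(score(q, Set.of("A", "B")));      // 2.0
        System.out.println(score(q, Set.of("A", "B", "C"))); // 3.0
        System.out.println(score(q, Set.of("A", "C", "D"))); // 0.0
    }
}
```

In this model the MUST clauses alone already push the score above 0, which is why the SHOULD clauses stay optional when MUST clauses are present, while a query of only SHOULD clauses needs at least one of them to match.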

    - Mark

  • Chris Hostetter at May 22, 2007 at 7:40 am
    : Each doc is going to get a score -- if the score is positive the doc
    : will be a hit, if the score is 0 the doc will not be a hit.

    that's actually a fairly misleading statement ... the guts of Lucene
    don't prevent documents from "matching" with a negative score
    (specifically: a HitCollector can be asked to collect a match with a
    negative score)

    (dropping matches with negative scores only happens in the Hits
    class/collector as I recall)




    -Hoss


  • Mark Miller at May 22, 2007 at 9:05 am
    Sorry, didn't mean to imply that that whole spiel was a technical
    explanation...just a "how I like to think of it" to get my head around
    the BooleanQuery system. If you're reading that, think high-level overview
    more than technical accuracy. I'll be more specific in the future --
    as always, the javadocs are the best place to get down to the nitty-gritty.

    HitCollector:
    /** Called once for every non-zero scoring document, with the document number
    * and its score.

    TopDocCollector (used by Hits and returned by a Searcher) does ensure
    scores are greater than 0. If you roll your own HitCollector, you
    shouldn't need my thoughts on how I think of BooleanQuery.

    - Mark

  • Paul Elschot at May 22, 2007 at 4:00 pm
    This is actually more for java-dev, but anyway.
    On Tuesday 22 May 2007 11:04, Mark Miller wrote:

    HitCollector:
    /** Called once for every non-zero scoring document, with the document number
    * and its score.

    Among others, this javadoc is corrected by the patch here:
    http://issues.apache.org/jira/browse/LUCENE-584
    It introduces Matcher as a superclass of Scorer.

    Regards,
    Paul Elschot

  • Soeren Pekrul at May 21, 2007 at 8:46 am
    Peter Bloem wrote:
    [...]
    "+(A B) C D E" [...]
    In other words, Lucene considers all documents that
    have both A and B, and ranks them higher if they also have C D or E.
    Hello Peter,

    As I understand it, "+(A B) C D E" means that at least one of the terms
    "A" or "B" must be contained and the terms "C", "D", and "E" are optional.
    The following documents d are hits:
    d(A, B)
    d(A)
    d(B)
    d(A, C)
    ...
    Documents containing neither "A" nor "B" are not hits.

    To require both terms "A" and "B" in a document, the query should be
    "+(+A +B) C D E" or "+A +B C D E".
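The difference between the two queries above can be checked with a small match-only model. This is a hypothetical sketch, not Lucene's API: the Node, Term, Bool and Clause types are illustrative, and the only rule encoded for a boolean group is "every MUST must match, no MUST_NOT may match, and with no MUST clauses at least one SHOULD must match". Scoring is ignored entirely. It needs Java 16+ (records, arrow-style switch).

```java
import java.util.List;
import java.util.Set;

// Hypothetical match-only model of nested boolean queries (no scoring).
// The Node/Term/Bool/Clause types are illustrative, not Lucene classes.
class NestedMatchDemo {
    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Node { boolean matches(Set<String> doc); }

    record Term(String text) implements Node {
        public boolean matches(Set<String> doc) { return doc.contains(text); }
    }

    record Clause(Node node, Occur occur) {}

    // Boolean group: every MUST must match, no MUST_NOT may match, and if
    // there are no MUST clauses at least one SHOULD must match.
    record Bool(List<Clause> clauses) implements Node {
        public boolean matches(Set<String> doc) {
            boolean hasMust = false;
            boolean anyShould = false;
            for (Clause c : clauses) {
                boolean m = c.node().matches(doc);
                switch (c.occur()) {
                    case MUST -> { hasMust = true; if (!m) return false; }
                    case MUST_NOT -> { if (m) return false; }
                    case SHOULD -> { if (m) anyShould = true; }
                }
            }
            return hasMust || anyShould;
        }
    }

    static Clause must(Node n) { return new Clause(n, Occur.MUST); }
    static Clause should(Node n) { return new Clause(n, Occur.SHOULD); }
    static Term t(String s) { return new Term(s); }

    // "+(A B) C D E": one required group that is satisfied by A or by B.
    static final Node EITHER = new Bool(List.of(
        must(new Bool(List.of(should(t("A")), should(t("B"))))),
        should(t("C")), should(t("D")), should(t("E"))));

    // "+A +B C D E": A and B are each required on their own.
    static final Node BOTH = new Bool(List.of(
        must(t("A")), must(t("B")),
        should(t("C")), should(t("D")), should(t("E"))));

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("A", "B"), Set.of("A"), Set.of("A", "C"), Set.of("C"));
        for (Set<String> doc : docs) {
            System.out.println(doc + "  +(A B) C D E: " + EITHER.matches(doc)
                + "  +A +B C D E: " + BOTH.matches(doc));
        }
    }
}
```

Under this model d(A) and d(A, C) match "+(A B) C D E" but not "+A +B C D E", d(A, B) matches both, and d(C) matches neither, which is the distinction described above.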

    Sören



  • Chris Hostetter at May 22, 2007 at 7:52 am
    : BooleanClause.Occur.SHOULD for C, D and E. However the javadocs for
    : BooleanClause.Occur.SHOULD states:
    :
    : "Use this operator for clauses that /should/ appear in the matching
    : documents. For a BooleanQuery with two |SHOULD| subqueries, at least one
    : of the clauses must appear in the matching documents."

    Yeah, that's misleading... I've committed an update that reads...

    /** Use this operator for clauses that <i>should</i> appear in the
    * matching documents. For a BooleanQuery with no <code>MUST</code>
    * clauses one or more <code>SHOULD</code> clauses must match a document
    * for the BooleanQuery to match.
    * @see BooleanQuery#setMinimumNumberShouldMatch
    */
    public static final Occur SHOULD = new Occur("SHOULD");

    ...does that make more sense?
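The rule in the revised javadoc, together with the minimum-should-match setting it points to, can be sketched as a plain function. This is a hypothetical, match-only model, not Lucene code: the class ShouldRuleDemo and its matches(...) method are illustrative, and the minimumShouldMatch parameter is only named after BooleanQuery#setMinimumNumberShouldMatch; clauses are plain terms and scoring is ignored.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, match-only model of the SHOULD rule in the revised javadoc.
// minimumShouldMatch is named after BooleanQuery#setMinimumNumberShouldMatch,
// but this is not Lucene code.
class ShouldRuleDemo {
    static boolean matches(Set<String> doc, List<String> must,
                           List<String> should, int minimumShouldMatch) {
        for (String term : must) {
            if (!doc.contains(term)) return false; // every MUST is required
        }
        int shouldHits = 0;
        for (String term : should) {
            if (doc.contains(term)) shouldHits++;
        }
        // With no MUST clauses, at least one SHOULD has to match;
        // with MUST clauses, SHOULDs are optional unless a minimum is set.
        int required = must.isEmpty()
            ? Math.max(1, minimumShouldMatch)
            : minimumShouldMatch;
        return shouldHits >= required;
    }

    public static void main(String[] args) {
        List<String> must = List.of("A", "B");
        List<String> should = List.of("C", "D", "E");
        Set<String> doc = Set.of("A", "B");
        System.out.println(matches(doc, must, should, 0));             // true
        System.out.println(matches(doc, must, should, 1));             // false
        System.out.println(matches(Set.of("C"), List.of(), should, 0)); // true
    }
}
```

The first call corresponds to Peter's original case: A and B are MUST, C, D and E are SHOULD, and a document with only A and B still matches. Raising the minimum to 1 is the kind of setting that would give the stricter (A AND B) AND (C OR D OR E) behaviour instead.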



    -Hoss



Discussion Overview
group: java-user @
categories: lucene
posted: May 21, '07 at 12:38a
active: May 22, '07 at 4:00p
posts: 7
users: 5
website: lucene.apache.org
