Grokbase Groups Lucene dev May 2009
Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood
-----------------------------------------------------------------------------------

Key: LUCENE-1644
URL: https://issues.apache.org/jira/browse/LUCENE-1644
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Michael McCandless
Priority: Minor
Fix For: 2.9


When MultiTermQuery is used (via one of its subclasses, e.g.
WildcardQuery, PrefixQuery, or FuzzyQuery), you can ask it to use
"constant score mode", which pre-builds a filter and then wraps that
filter as a ConstantScoreQuery.

If you don't set that, it instead builds a [potentially massive]
BooleanQuery with one SHOULD clause per term.

There are some limitations of this approach:

* The scores returned by the BooleanQuery are often quite
meaningless to the app, so, one should be able to use a
BooleanQuery yet get constant scores back. (Though I vaguely
remember at least one example someone raised where the scores were
useful...).

* The resulting BooleanQuery can easily have too many clauses,
throwing an extremely confusing exception to newish users.

* It'd be better to have the freedom to pick "build filter up front"
vs "build massive BooleanQuery", when constant scoring is enabled,
because they have different performance tradeoffs.

* In constant score mode, an OpenBitSet is always used, yet for
sparse bit sets this does not give good performance.

I think we could address these issues by giving BooleanQuery a
constant score mode, then empower MultiTermQuery (when in constant
score mode) to pick & choose whether to use BooleanQuery vs up-front
filter, and finally empower MultiTermQuery to pick the best (sparse vs
dense) bit set impl.
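
The last point, choosing between a sparse and a dense bit set, might look roughly like the sketch below. It is an illustration under stated assumptions, not Lucene code: `DocIdSetChooser`, its nested types, and the size-based cutoff are hypothetical, and `java.util.BitSet` stands in for OpenBitSet.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration; none of these types exist in Lucene.
final class DocIdSetChooser {

    interface DocIds {
        boolean contains(int docId);
    }

    // Sparse form: sorted doc ids, 32 bits per match, lookup by binary search.
    static final class SparseDocIds implements DocIds {
        private final int[] sortedDocs;

        SparseDocIds(int[] sortedDocs) {
            this.sortedDocs = sortedDocs;
        }

        public boolean contains(int docId) {
            return Arrays.binarySearch(sortedDocs, docId) >= 0;
        }
    }

    // Dense form: one bit per document in the index, constant-time lookup.
    static final class DenseDocIds implements DocIds {
        private final BitSet bits;

        DenseDocIds(BitSet bits) {
            this.bits = bits;
        }

        public boolean contains(int docId) {
            return docId >= 0 && bits.get(docId);
        }
    }

    static DocIds build(int[] matchingDocs, int maxDoc) {
        int[] sorted = matchingDocs.clone();
        Arrays.sort(sorted);
        // Assumed cutoff: the int[] is the smaller structure
        // while 32 * matches < maxDoc.
        if ((long) sorted.length * Integer.SIZE < maxDoc) {
            return new SparseDocIds(sorted);
        }
        BitSet bits = new BitSet(maxDoc);
        for (int doc : sorted) {
            bits.set(doc);
        }
        return new DenseDocIds(bits);
    }
}
```

The cutoff here only compares memory footprint; a real choice would probably also weigh iteration and lookup cost, which this sketch ignores.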


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Mark Miller (JIRA) at Jun 11, 2009 at 2:23 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718279#action_12718279 ]

    Mark Miller commented on LUCENE-1644:
    -------------------------------------

    I'm inclined to think we push to 3.0?
  • Michael McCandless (JIRA) at Jun 11, 2009 at 12:31 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Fix Version/s: (was: 2.9)
    3.1

    I agree... moving out.
  • Michael McCandless (JIRA) at Jul 20, 2009 at 6:27 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Attachment: LUCENE-1644.patch

    I'd really like to get this one in for 2.9. The API is still
    malleable because the constant-score rewrite mode of MultiTermQuery
    hasn't been released yet.

    I've attached a patch that adds a MultiTermQuery.RewriteMethod
    parameter, with current values FILTER, SCORING_BOOLEAN_QUERY and
    CONSTANT_BOOLEAN_QUERY. This replaces the
    setConstantScoreRewrite(boolean) method.

    I also added javadocs noting that the two remaining multi-term queries
    that have not yet switched over to FILTER rewrite method
    (WildcardQuery and PrefixQuery) will switch over in 3.0. (LUCENE-1557
    is already open to make that switch.)
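
To make the relationship between the old boolean flag and the three rewrite methods concrete, here is a minimal stand-in. It is not the code from LUCENE-1644.patch: the enum below only borrows the three value names from the comment above, and `isConstantScore` and `fromLegacyFlag` are hypothetical helpers added for illustration.

```java
// Stand-in for illustration; this is not the code from LUCENE-1644.patch.
enum RewriteMethod {
    // Pre-build a filter and wrap it so every hit gets the same score.
    FILTER(true),
    // Build a BooleanQuery with one SHOULD clause per term, with real scores.
    SCORING_BOOLEAN_QUERY(false),
    // Build the same BooleanQuery, but return constant scores.
    CONSTANT_BOOLEAN_QUERY(true);

    private final boolean constantScore;

    RewriteMethod(boolean constantScore) {
        this.constantScore = constantScore;
    }

    boolean isConstantScore() {
        return constantScore;
    }

    // Hypothetical helper: the method the old boolean flag corresponded to.
    static RewriteMethod fromLegacyFlag(boolean constantScoreRewrite) {
        return constantScoreRewrite ? FILTER : SCORING_BOOLEAN_QUERY;
    }
}
```

With this shape, `RewriteMethod.fromLegacyFlag(true)` yields FILTER, while CONSTANT_BOOLEAN_QUERY is the one combination the boolean could not express.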

  • Michael McCandless (JIRA) at Jul 20, 2009 at 6:27 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Fix Version/s: (was: 3.1)
    2.9
  • Michael McCandless (JIRA) at Jul 20, 2009 at 6:29 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless reassigned LUCENE-1644:
    ------------------------------------------

    Assignee: Michael McCandless
  • Robert Muir (JIRA) at Jul 21, 2009 at 2:44 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733445#action_12733445 ]

    Robert Muir commented on LUCENE-1644:
    -------------------------------------

    Mike, one question: couldn't you consider FILTER versus CONSTANT_BOOLEAN_QUERY an implementation detail? Could Lucene pick / switch over to the best one?

    I guess in my opinion, there was nothing wrong with setConstantScoreRewrite() from an API perspective, but behind the scenes maybe Lucene could be a bit smarter about how it actually accomplishes that?

  • Mark Miller (JIRA) at Jul 21, 2009 at 3:30 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733458#action_12733458 ]

    Mark Miller commented on LUCENE-1644:
    -------------------------------------

    That's originally what I thought this issue was: the correct method would be chosen under the covers using a heuristic. Is this a progress-not-perfection deal?
  • Michael McCandless (JIRA) at Jul 21, 2009 at 10:09 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733563#action_12733563 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    > couldn't you consider FILTER versus CONSTANT_BOOLEAN_QUERY an implementation detail? could lucene pick / switch over to the best one?

    Yeah I struggled with this. I completely agree it's an impl detail -- the user should just have to say "I want constant scoring" and Lucene finds the most performant way to achieve it.

    But then I realized it's not obvious when one impl should be chosen over another. Often FILTER is faster than CONSTANT_BOOLEAN_QUERY, but at some point once the index becomes large enough the underlying O(maxDoc) cost (w/ small constant in front) of FILTER will dominate, or if the number of terms/docs that match is small then CONSTANT_BOOLEAN_QUERY will win, etc. If number of terms exceeds BooleanQuery's maxClauseCount, you must use FILTER.

    And my intuitions weren't right (I had thought NumericRangeQuery, since in general it doesn't produce too many terms, would perform well with BOOLEAN_QUERY, but from Uwe's numbers that's not the case; though we should re-test now that, with this patch, no CPU is spent on scoring).

    So, I was uncomfortable trying to make Lucene too smart under the hood, at least for this first go at it.

    Maybe we could add an AUTO option that would try to decide what's best? This way, if we mess up its smarts, the user can still fall back and force one method over another.

    (Though, since the maxClauseCount is so clearly a dead end, maybe even in CONSTANT_BOOLEAN_QUERY mode we should forcefully fall back to FILTER on hitting too many terms.)
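
The forced fallback in that last parenthetical could be sketched as follows. This is an assumption about how such a rule might be written, not what the patch does: `ClauseLimitFallback`, its nested `Method` enum, and `resolve` are hypothetical names, and the clause limit is passed in rather than read from BooleanQuery.

```java
// Hypothetical illustration of the "fall back to FILTER" idea.
final class ClauseLimitFallback {

    enum Method { FILTER, CONSTANT_BOOLEAN_QUERY }

    // Returns the method to actually use for a query that expands to
    // termCount terms, given the configured clause limit.
    static Method resolve(Method requested, int termCount, int maxClauseCount) {
        // A BooleanQuery with more clauses than the limit cannot be built,
        // so the only constant-score option left is the up-front filter.
        if (requested == Method.CONSTANT_BOOLEAN_QUERY && termCount > maxClauseCount) {
            return Method.FILTER;
        }
        return requested;
    }
}
```

Because both methods return constant scores, switching silently would not change the results, only how they are computed.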
  • Robert Muir (JIRA) at Jul 21, 2009 at 10:31 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733578#action_12733578 ]

    Robert Muir commented on LUCENE-1644:
    -------------------------------------

    Mike, I see your point, I believe two things are involved, but I could be wrong on this!

    In the tests I have run, for "uncached" queries that match only a few docs, query is faster.
    FILTER is faster if things are cached or if the query matches many documents, even for a huge index.

    I don't think this disagrees with Uwe's numbers really, I just think the OS cache etc. plays a big part, at least that's what I thought I was seeing on my end...

  • Michael McCandless (JIRA) at Jul 21, 2009 at 10:50 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733582#action_12733582 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    Your results seem to agree w/ Uwe's (once things are "hot" then FILTER is even faster, even on a rather large index). Uwe's results were on a 5M doc index. How large is your index?

    OK so how about I add a CONSTANT_AUTO mode that tries to pick? I can add [expert] setters that you can use to set the cutoffs. Then in the javadocs, strongly encourage CONSTANT_AUTO, and state that in 3.0 it'll be the default?
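
One possible shape for such a mode, with expert setters for the cutoffs, is sketched below. Everything here is an assumption for illustration: the class and method names are hypothetical, the thread proposes no specific cutoff values (the defaults shown are arbitrary placeholders), and the rule simply encodes the observations above that the BooleanQuery route tends to win when few terms and few documents match.

```java
// Hypothetical sketch of a CONSTANT_AUTO-style chooser; not Lucene code.
final class ConstantAutoRewrite {

    enum Method { FILTER, CONSTANT_BOOLEAN_QUERY }

    // Arbitrary placeholder defaults; the thread names no concrete cutoffs.
    private int termCountCutoff = 100;
    private double docCountPercent = 1.0;

    // [expert] Above this many terms, always use FILTER.
    void setTermCountCutoff(int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be >= 0");
        }
        this.termCountCutoff = cutoff;
    }

    // [expert] Above this percentage of maxDoc matching, always use FILTER.
    void setDocCountPercent(double percent) {
        if (percent < 0.0 || percent > 100.0) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        this.docCountPercent = percent;
    }

    Method choose(int termCount, int matchingDocs, int maxDoc) {
        double docCutoff = maxDoc * (docCountPercent / 100.0);
        if (termCount > termCountCutoff || matchingDocs > docCutoff) {
            return Method.FILTER;
        }
        return Method.CONSTANT_BOOLEAN_QUERY;
    }
}
```

Keeping the cutoffs settable means that if the heuristic guesses badly for a given index, a user can tune it or push it all the way toward one method.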
  • Robert Muir (JIRA) at Jul 21, 2009 at 11:20 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733593#action_12733593 ]

    Robert Muir commented on LUCENE-1644:
    -------------------------------------

    Mike: > 100M docs. It does not fit in the OS cache :)

    I really should rerun the tests since it's been quite some time.

    What do you think about CONSTANT_AUTO doing your maxClauseCount trick?
  • Michael McCandless (JIRA) at Jul 21, 2009 at 11:34 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733598#action_12733598 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. Mike: > 100M docs. it does not fit in os cache

    Heh :) OK.

    bq. What do you think about CONSTANT_AUTO doing your maxClauseCount trick?

    I'm thinking there'd be a cutoff on the number of terms, e.g. maybe it defaults to 25 (?). So, if more than 25 terms are encountered, we'd use FILTER; else we'd use BOOLEAN_QUERY. And I'd always cut over to FILTER once the number of terms exceeds maxClauseCount, whatever the cutoff is set to.
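    The cutoff described above could be sketched roughly as follows. This is plain Java, not Lucene code: the class, enum, and method names are hypothetical, 25 is only the tentative default floated in the comment, and 1024 is used as a stand-in for BooleanQuery's maxClauseCount.

```java
/** Hypothetical sketch of the proposed term-count cutoff; not part of Lucene's API. */
final class RewriteCutoffSketch {
    enum Mode { BOOLEAN_QUERY, FILTER }

    /**
     * Picks a rewrite mode from the number of matching terms.
     * The cutoff is capped at maxClauseCount, so the BooleanQuery path
     * can never be asked to hold more clauses than it allows.
     */
    static Mode choose(int termCount, int termCutoff, int maxClauseCount) {
        int effectiveCutoff = Math.min(termCutoff, maxClauseCount);
        return termCount > effectiveCutoff ? Mode.FILTER : Mode.BOOLEAN_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(choose(10, 25, 1024)); // BOOLEAN_QUERY
        System.out.println(choose(40, 25, 1024)); // FILTER
        System.out.println(choose(20, 25, 16));   // FILTER (cutoff capped at 16)
    }
}
```

    With this shape, a user who never touches the cutoff would not see a too-many-clauses exception from the constant-score path, since the filter takes over before the clause limit is reached.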
  • Michael McCandless (JIRA) at Jul 21, 2009 at 2:22 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733637#action_12733637 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    Maybe we simply go back to the boolean (constant scoring or not), and then add this term-count cutoff to control when a filter is used vs. a BooleanQuery?
  • Robert Muir (JIRA) at Jul 21, 2009 at 4:07 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733676#action_12733676 ]

    Robert Muir commented on LUCENE-1644:
    -------------------------------------

    Mike, I am afraid that might hurt some people's performance.
    I'm a bit concerned that my index and queries are abnormal, and I don't want to break the general case.

    I'm not too familiar with trie [what it would do with a really general range query], but a simpler example would be an index with no stopwords and a wildcard query of th*.
    Maybe it only matches one term, but that term is very common (a dense bitset) and probably "hot".

    In this case the filter would be better, even though it's one term.
  • Robert Muir (JIRA) at Jul 21, 2009 at 4:13 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733676#action_12733676 ]

    Robert Muir edited comment on LUCENE-1644 at 7/21/09 9:14 AM:
    --------------------------------------------------------------

    Mike, I am afraid that might hurt some people's performance.
    I'm a bit concerned that my index and queries are abnormal, and I don't want to break the general case.

    I'm not too familiar with trie [what it would do with a really general range query], but a simpler example would be an index with no stopwords and a wildcard query of "th?" (matching "the").
    Maybe it only matches one term, but that term is very common (a dense bitset) and probably "hot".

    In this case the filter would be better, even though it's one term.

  • Mark Miller at Jul 21, 2009 at 4:50 pm
    It would be great to get some repeatable tests for this type of thing into
    the benchmark contrib. I had started work on that some time back, but I don't
    think I have it around anymore.
    --
    - Mark

    http://www.lucidimagination.com
  • Robert Muir at Jul 21, 2009 at 5:01 pm
    Mark, I agree. Plus, MultiTermQuery is pretty complex to benchmark with
    respect to this issue, because the behavior of someone using it for Trie
    is really different from someone like me using it for natural language.
    It will probably fit a different distribution (maybe not Zipf), and
    creating a heuristic to meet everyone's needs seems pretty tricky to
    me...

    Maybe not impossible. I just wanted to raise the question to see ideas
    that might prevent possible back-and-forth between
    .setConstantScoreRewrite and .setRewriteMethod...


    --
    Robert Muir
    rcmuir@gmail.com

  • Michael McCandless (JIRA) at Jul 21, 2009 at 6:19 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733769#action_12733769 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. In this case the filter would be better, even though its 1 term.

    You're right. So... maybe we could also take the net doc count into account? Loading the TermInfo shouldn't be costly, since it's loaded anyway when the query/filter runs (and it's cached). We'd then sum up the doc counts as we check each term, and if the sum crosses the threshold (default maybe 10% of maxDoc()?) we'd cut over to FILTER?

    Though, if we go this route, we should probably do it under a new CONSTANT_AUTO method (instead of falling back to the clean single boolean constantScoreRewrite), i.e., it's getting a little too smart to always do with no way to fall back and force it to rewrite one way or the other.

    Or we can punt on any smarts altogether for now (this being the "eve" of 2.9) and simply start with the three explicit rewrite methods.
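    For readers following along, the "net doc count" idea from this comment might look something like the sketch below. It is plain Java under stated assumptions, not the patch: the class and method names are hypothetical, the per-term doc frequencies are passed in as an array rather than read from TermInfo, and the 25-term and 10% values are only the tentative defaults mentioned in the thread.

```java
/** Hypothetical sketch of a CONSTANT_AUTO style decision; not part of Lucene's API. */
final class DocCountCutoffSketch {
    enum Mode { BOOLEAN_QUERY, FILTER }

    /**
     * Walks the matching terms in order, keeping a running term count and a
     * running sum of doc frequencies. Cuts over to FILTER as soon as either
     * the term count exceeds termCutoff or the summed doc count exceeds
     * docFraction * maxDoc; otherwise stays with BOOLEAN_QUERY.
     */
    static Mode choose(int[] docFreqs, int maxDoc, int termCutoff, double docFraction) {
        long docLimit = (long) (maxDoc * docFraction);
        long docSum = 0;
        int terms = 0;
        for (int docFreq : docFreqs) {
            terms++;
            docSum += docFreq;
            if (terms > termCutoff || docSum > docLimit) {
                return Mode.FILTER; // stop early, no need to visit remaining terms
            }
        }
        return Mode.BOOLEAN_QUERY;
    }

    public static void main(String[] args) {
        // A few rare terms: cheap to run as a small BooleanQuery.
        System.out.println(choose(new int[] {5, 7, 3}, 1000, 25, 0.10));
        // One very common term (the "th?" matching "the" case): dense, so use the filter.
        System.out.println(choose(new int[] {600}, 1000, 25, 0.10));
    }
}
```

    The single-term case is the one raised earlier in the thread: a term-count cutoff alone would pick BOOLEAN_QUERY for it, while the summed doc count pushes it to FILTER.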
  • Robert Muir (JIRA) at Jul 21, 2009 at 6:43 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733775#action_12733775 ]

    Robert Muir commented on LUCENE-1644:
    -------------------------------------

    Mike, that heuristic sounds really promising to me.

    At the same time, I'm starting to think your .setMultiTermRewriteMethod really is the right way to go (regardless of whether it immediately has AUTO)...

    The thing I never really liked about .setConstantScoreRewrite is that it's kind of misleading, since it really does more than just toggle scoring...

  • Mark Miller (JIRA) at Jul 21, 2009 at 6:55 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733781#action_12733781 ]

    Mark Miller commented on LUCENE-1644:
    -------------------------------------

    bq. Or we can punt on any smarts altogether for now (this being the "eve" of 2.9) and simply start with the three explicit rewrite methods.

    +1 - unless you just want to pump it out, we want to make the current change anyway I think, so let's leave it at that and add an auto later.

    We should get 2.9 out sooner rather than later I think.
  • Robert Muir (JIRA) at Jul 21, 2009 at 7:09 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733786#action_12733786 ]

    Robert Muir commented on LUCENE-1644:
    -------------------------------------

    Mike, sorry to flip back and forth on this since last night.

    The patch you have here does make things less confusing, and after talking things through I think the "big boolean option" with heuristics might actually make things more confusing, even if it seems convenient at first.

    In the future, AUTO would be a good feature I think: best of both worlds.

  • Michael McCandless (JIRA) at Jul 21, 2009 at 9:22 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Attachment: LUCENE-1644.patch

    Attached rough patch -- javadocs are missing/not
    updated, need to add new tests, need to fix QueryParser.jj, etc., but
    all tests pass.

    Here's what I did:

    - Changed the MTQ.RewriteMethod class from a simple Parameter to its
    own abstract base class w/ a single method, rewrite, which
    MultiTermQuery.rewrite delegates to.

    - Switched over CONSTANT_SCORE_FILTER_REWRITE,
    SCORING_BOOLEAN_QUERY_REWRITE and
    CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE. These classes are private
    (they have no configuration), and I created final static singleton
    instances for them.

    - Created ConstantScoreAutoRewrite (and the default
    CONSTANT_SCORE_AUTO_REWRITE instance) that you can configure based
    on term count & doc count, as to when it cuts over to
    CONSTANT_SCORE_FILTER_REWRITE.

    This approach also has the benefit of allowing customization entirely,
    if needed, of the "rewrite strategy", if none of the 4 choices work
    for you.
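
    The design described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the class names, the string results and the cutoff value are all made up, and only a term-count cutoff is modelled (the patch also considers doc count).

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins; none of these types are Lucene's real classes.
abstract class RewriteMethodSketch {
    // The single method that the query's own rewrite() would delegate to.
    abstract String rewrite(List<String> terms);

    // Stateless strategies exposed as shared final instances.
    static final RewriteMethodSketch FILTER = new RewriteMethodSketch() {
        @Override String rewrite(List<String> terms) {
            return "ConstantScore(filter over " + terms.size() + " terms)";
        }
    };

    static final RewriteMethodSketch BOOLEAN = new RewriteMethodSketch() {
        @Override String rewrite(List<String> terms) {
            return "ConstantScore(" + String.join(" OR ", terms) + ")";
        }
    };
}

// Configurable strategy: uses the boolean form up to a term-count cutoff,
// then cuts over to the filter form.
class AutoRewriteSketch extends RewriteMethodSketch {
    private final int termCountCutoff; // hypothetical threshold

    AutoRewriteSketch(int termCountCutoff) {
        this.termCountCutoff = termCountCutoff;
    }

    @Override String rewrite(List<String> terms) {
        return terms.size() > termCountCutoff
            ? FILTER.rewrite(terms)
            : BOOLEAN.rewrite(terms);
    }
}

class RewriteMethodDemo {
    public static void main(String[] args) {
        RewriteMethodSketch auto = new AutoRewriteSketch(2);
        System.out.println(auto.rewrite(Arrays.asList("foo", "fob")));
        System.out.println(auto.rewrite(Arrays.asList("foo", "fob", "fox")));
    }
}
```

    Because the strategy is an ordinary abstract class, a fifth, application-specific strategy is just another subclass.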


  • Uwe Schindler (JIRA) at Jul 22, 2009 at 10:36 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734067#action_12734067 ]

    Uwe Schindler commented on LUCENE-1644:
    ---------------------------------------

    Sorry that I came back too late to this issue, I am in holidays at the moment.

    In my opinion, using a Parameter instead of a boolean is a good idea. The latest patch is also a good idea; I only have some small problems with it:
    - Why did you make so many internal things public? The additional ctor of MultiTermQueryWrapperFilter should be package-private or protected (the class is not abstract, but should be used like an abstract one, so it must have only protected ctors). Only the public concrete filters such as TermRangeFilter should have public ctors.
    - getFilter()/getEnum should stay protected.
    - I do not like the weird caching of terms. A cleaner API would be a new class CachingFilteredTermEnum that can turn on caching for, e.g., the first 20 terms and then reset. In this case the API would stay clear and the filter code would not need to change at all (it just harvests the TermEnum, whether it is cached or not). I would propose something like: new CachingFilteredTermEnum(originalEnum), use it normally, then termEnum.reset() to consume again, and termEnum.purgeCache() once caching is no longer needed and should be switched off (after the first 25 terms or so). The problem with MultiTermQueryWrapperFilter is that the filter is normally stateless (no reader or TermEnum). So normally the method getDocIdSet() would have to get the TermEnum or wrapper in addition to the IndexReader. This is not very good (it took me some time to understand what you are doing).
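
    A minimal sketch of the proposed caching enumerator, assuming plain Java iterators in place of Lucene's TermEnum. CachingFilteredTermEnum does not exist; the class below and its reset()/purgeCache() behaviour are one hypothetical reading of the proposal.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: wraps a one-pass term source and remembers the terms it
// has handed out, so they can be consumed a second time after reset().
class CachingTermEnumSketch implements Iterator<String> {
    private final Iterator<String> source;          // underlying one-pass enumeration
    private final List<String> cache = new ArrayList<>();
    private boolean caching = true;                 // false once purgeCache() was called
    private int replayPos = 0;                      // next cached term to replay

    CachingTermEnumSketch(Iterator<String> source) {
        this.source = source;
    }

    @Override public boolean hasNext() {
        return replayPos < cache.size() || source.hasNext();
    }

    @Override public String next() {
        if (replayPos < cache.size()) {
            return cache.get(replayPos++);          // replay a cached term
        }
        String term = source.next();
        if (caching) {                              // still caching: remember it
            cache.add(term);
            replayPos++;
        }
        return term;
    }

    // Rewind so that the cached terms are returned again before new ones.
    void reset() {
        replayPos = 0;
    }

    // Switch caching off and free the cache; only legal once every cached
    // term has been replayed, otherwise terms would be lost.
    void purgeCache() {
        if (replayPos < cache.size()) {
            throw new IllegalStateException("cached terms not yet replayed");
        }
        caching = false;
        cache.clear();
        replayPos = 0;
    }
}
```

    A caller would read the first few terms, call reset() if it decides to start over with a different strategy, and call purgeCache() once the cached prefix has been replayed.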
  • Uwe Schindler (JIRA) at Jul 22, 2009 at 10:48 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734070#action_12734070 ]

    Uwe Schindler commented on LUCENE-1644:
    ---------------------------------------

    The biggest problem is that this caching gets completely weird with multi-segment indexes:
    The rewriting is done on the top-level reader. In this case the boolean query would be built and the terms cached. If there are too many terms, it creates a filter instance with the cached terms.
    The rewritten query is then executed against all sub-readers using the cached terms and a fixed term enum. Normally this would create a DocIdSet for the current index reader, but the rewrite did it for the top-level index reader, so the wrong doc IDs are returned, and so on. So you cannot reuse the terms collected during the rewrite operation in the getDocIdSet calls.

    So please turn off this caching completely! As noted before, the important thing is that the filter returned by rewrite is stateless and should not know anything about index readers. The index reader is passed to getDocIdSet and is different for non-optimized indexes.

    You have seen no tests fail because all RangeQuery tests use optimized indexes.
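
    To illustrate why doc IDs computed against one reader cannot be reused with another, here is a small sketch of how segment-local and top-level doc IDs relate. The segment sizes are invented, and the class is not part of Lucene.

```java
class SegmentDocIdSketch {
    // Hypothetical layout: two segments holding 5 and 3 documents.
    static final int[] SEGMENT_SIZES = {5, 3};

    // Top-level doc ID = segment-local doc ID + sizes of all earlier segments.
    static int toTopLevel(int segment, int localDocId) {
        int docBase = 0;
        for (int i = 0; i < segment; i++) {
            docBase += SEGMENT_SIZES[i];
        }
        return docBase + localDocId;
    }

    public static void main(String[] args) {
        // Local doc 1 of the second segment is top-level doc 6, not doc 1.
        System.out.println(toTopLevel(1, 1));
    }
}
```

    With a single (optimized) segment the two numberings coincide, which is consistent with the tests not catching the problem.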
  • Michael McCandless (JIRA) at Jul 22, 2009 at 12:42 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734096#action_12734096 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. The biggest problem is that this caching gets completely weird with multi-segment indexes

    Right, I caught this as well (there is one test that fails when I forcefully swap in constant-boolean-query as the constant score method), and I'm now turning off the caching.

    I've fixed it locally -- will post a new rev soon.
  • Michael McCandless (JIRA) at Jul 22, 2009 at 4:48 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Attachment: LUCENE-1644.patch

    Attached patch: fixed some bugs in the last rev, updated test cases,
    javadocs, CHANGES. I also optimized MultiTermQueryWrapperFilter to
    use the bulk-read API from termDocs.

    I confirmed all tests pass if I temporarily switch
    CONSTANT_SCORE_FILTER_REWRITE to CONSTANT_SCORE_AUTO_REWRITE_DEFAULT.

    I changed QueryParser to use CONSTANT_SCORE_AUTO for rewrite (it was
    previously CONSTANT_FILTER).

    I still need to run some perf tests to get a rough sense of decent
    defaults for CONSTANT_SCORE_AUTO cutover thresholds.

    bq. getFilter()/getEnum should stay protected.

    OK I made getEnum protected again.

    I had tentatively made it public so that one could create their own
    [external] rewrite methods. But I think (if we leave it protected),
    one could still make an inner/nested class that can access getEnum().

    Do we even need getFilter()? I removed it in the patch.

  • Uwe Schindler (JIRA) at Jul 22, 2009 at 8:57 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734301#action_12734301 ]

    Uwe Schindler commented on LUCENE-1644:
    ---------------------------------------

    Hi Mike,

    patch looks good. I was a little bit confused about the high term number cut off, but it is using Math.max to limit it to the current BooleanQuery max clause count.

    Some small things:

    bq. OK I made getEnum protected again.

    ...but only in MultiTermQuery itself. Everywhere else (even in the backwards compatibility override test [JustCompile]) it is public.

    Also the current singletons are not really singletons, because queries that are deserialized will contain instances that are not the "singleton" instances :) - and will therefore fail hashCode/equals tests. The underlying problem: the singletons are serializable but do not return themselves from readResolve() (it is not implemented). All serializable singletons must implement readResolve and return the singleton instance (see the Parameter base class or the parser singletons in FieldCache).
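
    The readResolve point can be shown with a small stand-alone example. The class name is hypothetical and unrelated to the patch; the java.io calls are the standard serialization API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RewriteSingletonSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    static final RewriteSingletonSketch INSTANCE = new RewriteSingletonSketch();

    private RewriteSingletonSketch() {}

    // Invoked by Java serialization after an object of this class has been
    // read; returning INSTANCE replaces the freshly created copy.
    protected Object readResolve() {
        return INSTANCE;
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identity is preserved because readResolve() returns INSTANCE.
        System.out.println(roundTrip(INSTANCE) == INSTANCE);
    }
}
```

    Without readResolve(), the round trip would produce a second instance, and any equals() based on identity would report the two queries as different.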
  • Uwe Schindler (JIRA) at Jul 22, 2009 at 8:57 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734301#action_12734301 ]

    Uwe Schindler edited comment on LUCENE-1644 at 7/22/09 1:38 PM:
    ----------------------------------------------------------------

    Hi Mike,

    patch looks good. I was a little bit confused about the high term number cut off, but it is using Math.max to limit it to the current BooleanQuery max clause count.

    Some small things:

    bq. OK I made getEnum protected again.

    ...but only in MultiTermQuery itself. Everywhere else (even in the backwards compatibility override test [JustCompile]) it is public.

    Also the current singletons are not really singletons, because queries that are deserialized will contain instances that are not the "singleton" instances :) - and will therefore fail hashCode/equals tests. The underlying problem: the singletons are serializable but do not return themselves from readResolve() (it is not implemented). All serializable singletons must implement readResolve and return the singleton instance (see the Parameter base class or the parser singletons in FieldCache).

    The instance in the default Auto RewriteMethod is still modifiable. Is this intended? One could modify the defaults by setting properties on this instance.

  • Uwe Schindler (JIRA) at Jul 22, 2009 at 8:58 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734301#action_12734301 ]

    Uwe Schindler edited comment on LUCENE-1644 at 7/22/09 1:50 PM:
    ----------------------------------------------------------------

    Hi Mike,

    patch looks good. I was a little bit confused about the high term number cut off, but it is using Math.max to limit it to the current BooleanQuery max clause count.

    Some small things:

    bq. OK I made getEnum protected again.

    ...but only in MultiTermQuery itself. Everywhere else (even in the backwards compatibility override test [JustCompile]) it is public. The same should apply to incNumberOfTerms (also protected). I think the rewrite method is internal to MultiTermQuery and always implemented in a subclass of MTQ as an inner class.

    Also the current singletons are not really singletons, because queries that are deserialized will contain instances that are not the "singleton" instances :) - and will therefore fail hashCode/equals tests. The underlying problem: the singletons are serializable but do not return themselves from readResolve() (it is not implemented). All serializable singletons must implement readResolve and return the singleton instance (see the Parameter base class or the parser singletons in FieldCache).

    The instance in the default Auto RewriteMethod is still modifiable. Is this intended? One could modify the defaults by setting properties on this instance.
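
    The concern about a modifiable default can be illustrated as follows. The class, the field and the value 100 are hypothetical and are not taken from the patch.

```java
class AutoCutoffSketch {
    // Hypothetical default value, chosen only for illustration.
    private int termCountCutoff = 100;

    // One shared instance that every query would use unless told otherwise.
    static final AutoCutoffSketch SHARED_DEFAULT = new AutoCutoffSketch();

    int getTermCountCutoff() {
        return termCountCutoff;
    }

    void setTermCountCutoff(int termCountCutoff) {
        this.termCountCutoff = termCountCutoff;
    }
}

class AutoCutoffDemo {
    public static void main(String[] args) {
        // One caller tunes "its" rewrite method...
        AutoCutoffSketch.SHARED_DEFAULT.setTermCountCutoff(5);
        // ...and every other user of the shared default now sees the new value.
        System.out.println(AutoCutoffSketch.SHARED_DEFAULT.getTermCountCutoff());
    }
}
```

    Making the shared instance immutable, or rejecting setter calls on it, would avoid this kind of action at a distance.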

  • Michael McCandless (JIRA) at Jul 23, 2009 at 12:37 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734411#action_12734411 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. I was a little bit confused about the high term number cut off,

    Sorry I still need to do some perf testing to pick an appropriate
    default here.

    bq. Everywhere else (even in the backwards compatibility override test [JustCompile] it is public). And the same should be for the incNumberOfTerms (also protected).

    Woops -- I'll fix. Thanks for catching even though you're on
    "vacation" ;)

    bq. Also the current singletons are not really singletons, because queries that are unserialized will contain instances that are not the "singleton" instances

    Sigh. I'll do what FieldCache's parser singletons do.

    bq. The instance in the default Auto RewriteMethod is still modifiable. Is this correct?

    I was thinking this was OK, ie, you could set the default cutoffs for
    anything that used the AUTO_DEFAULT. But it is static (global), so
    that's not great. I guess I'll make it an anonymous subclass of
    ConstantScoreAutoRewrite that disallows changes.
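
    A minimal sketch of that idea, assuming a much simplified class: AutoRewriteSketch and its single termCountCutoff property are hypothetical stand-ins, and the value 350 is only a placeholder, not the patch's actual default.

```java
// Hypothetical, simplified stand-in for ConstantScoreAutoRewrite.
class AutoRewriteSketch {
    private int termCountCutoff = 350; // placeholder value for illustration

    int getTermCountCutoff() {
        return termCountCutoff;
    }

    void setTermCountCutoff(int cutoff) {
        this.termCountCutoff = cutoff;
    }

    // Shared (static, global) default: an anonymous subclass whose setter
    // rejects every change, so callers cannot alter the defaults for everyone.
    static final AutoRewriteSketch DEFAULT = new AutoRewriteSketch() {
        @Override
        void setTermCountCutoff(int cutoff) {
            throw new UnsupportedOperationException(
                "The shared default instance cannot be modified");
        }
    };
}
```

    Callers who want different cutoffs would create their own instance and configure that, while the shared default stays fixed.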

  • Michael McCandless (JIRA) at Jul 23, 2009 at 12:46 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Attachment: LUCENE-1644.patch

    New patch attached w/ above fixes plus some javadoc fixes. It has
    some nocommits which I'll clean up before committing.

  • Michael McCandless (JIRA) at Jul 23, 2009 at 11:40 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Attachment: LUCENE-1644.patch

    New patch attached. I think it's ready to commit!

    I ran a series of simple tests, using a 20.0 million doc Wikipedia
    index. I tested w/ PrefixQuery, using different prefixes to tickle the
    different number of matching terms vs matching docs.

    I first pursued the "matches many terms but few docs" case, and found
    at around 350 terms the filter method becomes faster.

    Then, I fixed the number of terms at 350 (modified PrefixQuery to
    pretend the enum stopped there) and tested different number of "doc
    visit counts" (sum of docFreq of each term) and found ~ 0.1% (1/1000)
    of the maxDoc() was the cutover.
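
    The two measured cutovers could be combined roughly as follows. This is only a sketch: the class and method names are hypothetical, and the exact comparison (>= versus >) and the way the patch combines the two limits are assumptions, not taken from the patch.

```java
// Hypothetical helper illustrating the two cutoffs measured above.
final class AutoRewriteCutoffs {
    // About 350 matching terms was the measured cutover.
    static final int TERM_COUNT_CUTOFF = 350;

    // Returns true when the up-front filter is expected to be faster than
    // a BooleanQuery, based on the number of matching terms and on the
    // "doc visit count" (sum of docFreq over the matching terms).
    static boolean useFilter(int termCount, long docVisitCount, int maxDoc) {
        long docCountCutoff = maxDoc / 1000; // ~0.1% of maxDoc
        return termCount >= TERM_COUNT_CUTOFF || docVisitCount >= docCountCutoff;
    }
}
```

    For the 20 million doc index used in the tests, the doc visit cutoff in this sketch works out to 20,000.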

    I also switched NumericRange* to use CONSTANT_SCORE_AUTO by default.

  • Uwe Schindler (JIRA) at Jul 23, 2009 at 12:02 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734570#action_12734570 ]

    Uwe Schindler commented on LUCENE-1644:
    ---------------------------------------

    Looks good, Mike.

    I think NumericRangeQuery should also be switched to auto mode, you are right. My perf test was a little bit unfair, because it used a 5 million doc index with random integers. The queries were also random, and the sum of docs/index size was about 1/3 (because of the random query), so most queries hit about one third of all docs. In this case, the filter is always faster. For very small ranges with few terms, it may be really good to use

    A good thing would also be to set the mode to filter automatically if precisionStep > 6 for longs (valSize=64) and precisionStep > 8 for ints (valSize=32), because here the number of terms is often too big.
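
    The suggested rule could be sketched like this. NumericRewriteChoice and forceFilter are hypothetical names used only for illustration; how NumericRangeQuery would actually select its rewrite method is not shown in this thread.

```java
// Hypothetical helper for the precisionStep rule suggested above.
final class NumericRewriteChoice {
    // valSize is the bit width of the value type: 64 for longs, 32 for ints.
    // Returns true when the filter rewrite should be forced because a large
    // precisionStep tends to produce too many terms.
    static boolean forceFilter(int valSize, int precisionStep) {
        switch (valSize) {
            case 64:
                return precisionStep > 6;
            case 32:
                return precisionStep > 8;
            default:
                throw new IllegalArgumentException("valSize must be 32 or 64");
        }
    }
}
```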

    One bug in ConstantScoreRangeQuery: You set the default to AUTO, but the method that prevents changing it is wrong:
    {code:java}
       /** Changes of mode are not supported by this class (fixed to constant score rewrite mode) */
    -  public void setConstantScoreRewrite(boolean constantScoreRewrite) {
    -    if (!constantScoreRewrite)
    -      throw new UnsupportedOperationException("Use TermRangeQuery instead to enable boolean query rewrite.");
    +  public void setRewriteMethod(RewriteMethod method) {
    +    if (method != CONSTANT_SCORE_FILTER_REWRITE) {
    +      throw new UnsupportedOperationException("Use TermRangeQuery instead to change the rewrite method.");
    +    }
       }
    {code}
    I would change this to simply always throw UOE on any change in ConstantScoreRangeQuery.
  • Uwe Schindler (JIRA) at Jul 23, 2009 at 12:04 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734571#action_12734571 ]

    Uwe Schindler commented on LUCENE-1644:
    ---------------------------------------

    Another question: Maybe it would be good to change the FieldCache to also use the buffered TermDocs variant in the Uninverter code?
    Has anybody done perf tests here? It would be easy to change this.
  • Michael McCandless (JIRA) at Jul 23, 2009 at 12:48 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734583#action_12734583 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. I would change this to simply always throw UOE on any change in ConstantScoreRangeQuery.

    OK will do.

    bq. A good thing would also be to set the mode to filter automatically, if precisionStep >6 for longs (valSize=64) and precStep > 8 for ints (valSize=32), because here the number of terms is often too big.

    Will do.

    Hmm -- my last patch hit this test failure, after I had switched to CONSTANT_SCORE_AUTO for NumericRangeQuery:
    {code}
    [junit] NOTE: random seed of testcase 'testRandomTrieAndClassicRangeQuery_NoTrie' was: -2919237198484373178
    [junit] ------------- ---------------- ---------------
    [junit] Testcase: testRandomTrieAndClassicRangeQuery_NoTrie(org.apache.lucene.search.TestNumericRangeQuery32): FAILED
    [junit] Total number of terms should be equal for unlimited precStep expected:<668552> but was:<668584>
    [junit] junit.framework.AssertionFailedError: Total number of terms should be equal for unlimited precStep expected:<668552> but was:<668584>
    [junit] at org.apache.lucene.search.TestNumericRangeQuery32.testRandomTrieAndClassicRangeQuery(TestNumericRangeQuery32.java:266)
    [junit] at org.apache.lucene.search.TestNumericRangeQuery32.testRandomTrieAndClassicRangeQuery_NoTrie(TestNumericRangeQuery32.java:287)
    [junit] at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
    [junit]
    {code}
    I haven't dug into it yet...
  • Michael McCandless (JIRA) at Jul 23, 2009 at 1:16 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734590#action_12734590 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. Hmm - my last patch hit this test failure, after I had switched to CONSTANT_SCORE_AUTO for NumericRangeQuery:

    Heh -- so I tried to repeat the failure, and couldn't. Then I remembered that nice random seed that was printed out, so I went to the test and wired in that seed and BOOM the failure happened. Thank you Hoss ;)
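
    The reason a printed seed makes a random test repeatable can be shown with plain java.util.Random. This is only an illustration of the principle: SeedReplay is a hypothetical class, and how the Lucene test itself wires in the seed is not shown in this thread.

```java
import java.util.Random;

// Hypothetical demo: the same seed always yields the same pseudo-random sequence.
final class SeedReplay {
    // Seed printed by the failing run quoted earlier in the thread.
    static final long REPORTED_SEED = -2919237198484373178L;

    // Draws n pseudo-random ints from a generator created with the given seed.
    static int[] draw(long seed, int n) {
        Random random = new Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextInt();
        }
        return values;
    }
}
```

    Two calls to draw with the reported seed return identical arrays, which is why re-running a test with the printed seed replays the same random inputs.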

    I found the failure -- I was just failing to incr the term count in the "auto uses BooleanQuery" case. New patch soon...
  • Michael McCandless (JIRA) at Jul 23, 2009 at 1:18 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1644:
    ---------------------------------------

    Attachment: LUCENE-1644.patch

    New patch. I'll commit in a day or two, though I'll wait first for LUCENE-1693 to go in (it'll conflict on QueryParser.jj/java).
  • Uwe Schindler (JIRA) at Jul 23, 2009 at 1:24 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734592#action_12734592 ]

    Uwe Schindler commented on LUCENE-1644:
    ---------------------------------------

    You were faster. The problem was the missing update of the term count. I first thought that you might be counting the terms twice (once at the beginning and then again in the filter, if the auto mode switches). But you are now only updating the term count at the end, when you build the boolean query.

    By the way: How about initializing the ArrayList not with the default size, but maybe termCount cutoff/2 or something like that?
  • Michael McCandless (JIRA) at Jul 23, 2009 at 2:32 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734603#action_12734603 ]

    Michael McCandless commented on LUCENE-1644:
    --------------------------------------------

    bq. How about initializing the ArrayList not with the default size, but maybe termCount cutoff/2 or something like that?

    That makes me a bit nervous, e.g. if someone sets these limits to something immense: the list would then be preallocated at that size even when only a few terms match.
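
    One way to reconcile the two positions, sketched here purely as an illustration and not taken from any of the attached patches, would be to pre-size the list from the cutoff but clamp it to a fixed ceiling. All names below (`PendingTermsSizing`, `MAX_INITIAL_CAPACITY`, `termCountCutoff`, `initialCapacity`, `newPendingTerms`) are hypothetical, and the ceiling of 1024 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or of the LUCENE-1644 patches.
class PendingTermsSizing {
    // Assumed ceiling on the pre-allocated capacity; the value is arbitrary.
    static final int MAX_INITIAL_CAPACITY = 1024;

    // Half of the cutoff, clamped to the range [0, MAX_INITIAL_CAPACITY],
    // so an immense cutoff cannot trigger an immense up-front allocation.
    static int initialCapacity(int termCountCutoff) {
        int half = termCountCutoff / 2;
        return Math.max(0, Math.min(half, MAX_INITIAL_CAPACITY));
    }

    // Creates the (empty) list that would collect terms before the
    // rewrite strategy is chosen; it still grows on demand past the hint.
    static List<String> newPendingTerms(int termCountCutoff) {
        return new ArrayList<String>(initialCapacity(termCountCutoff));
    }
}
```

    With this sketch, a cutoff of 350 gives an initial capacity of 175, while a cutoff of Integer.MAX_VALUE still allocates only 1024 slots up front.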
  • Michael McCandless (JIRA) at Jul 25, 2009 at 12:04 am
    [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless resolved LUCENE-1644.
    ----------------------------------------

    Resolution: Fixed

    Thanks Uwe!

Discussion Overview
group: dev @
categories: lucene
posted: May 19, '09 at 1:05p
active: Jul 25, '09 at 12:04a
posts: 41
users: 3
website: lucene.apache.org