Current implementation of fuzzy and wildcard queries inappropriately implemented as Boolean query rewrites
----------------------------------------------------------------------------------------------------------

Key: LUCENE-524
URL: http://issues.apache.org/jira/browse/LUCENE-524
Project: Lucene - Java
Type: Improvement
Components: Search
Versions: 1.9
Reporter: Randy Puttick


The implementation of MultiTermQuery in terms of BooleanQuery introduces several problems:

1) Collisions with the maximum clause limit on boolean queries, which throw an exception when the limit is exceeded. This is most problematic because it is difficult to ascertain in advance how many terms a fuzzy or wildcard query might expand to.

2) Boolean disjunctive scoring is not appropriate for either fuzzy or wildcard queries. In effect, the score is divided by the number of terms in the query, which has nothing to do with the relevance of the results.

3) Performance of disjunctive boolean queries over large term sets is quite suboptimal.
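Problems 1) and 2) can be reproduced in a small, self-contained simulation. This is a hypothetical sketch, not Lucene code: the term dictionary, `rewritePrefix`, and `TooManyClausesException` are invented for illustration. The 1024 limit is assumed to mirror the default clause limit of classic Lucene's BooleanQuery, and the scoring function is a simplified coord-style disjunction (sum of matching term scores times matching clauses / total clauses) that ignores every other scoring factor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of rewriting a multi-term query into a boolean query.
// None of these names are Lucene APIs; they only illustrate problems 1) and 2).
public class BooleanRewriteSketch {

    // Assumed clause limit, mirroring classic Lucene's default of 1024.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Stand-in for the exception thrown when a rewrite exceeds the limit.
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException() {
            super("rewrite needs more than " + MAX_CLAUSE_COUNT + " clauses");
        }
    }

    // Expands a prefix query (e.g. "app*") into one clause per matching term.
    // The resulting clause count depends on the index, not on the query text.
    static List<String> rewritePrefix(String prefix, List<String> termDictionary) {
        List<String> clauses = new ArrayList<>();
        for (String term : termDictionary) {
            if (term.startsWith(prefix)) {
                clauses.add(term);
                if (clauses.size() > MAX_CLAUSE_COUNT) {
                    throw new TooManyClausesException();
                }
            }
        }
        return clauses;
    }

    // Simplified coord-style disjunction score: the sum of the matching clauses'
    // scores times (matching clauses / total clauses); other factors are ignored.
    static float coordDisjunctionScore(float[] matchingTermScores, int totalClauses) {
        float sum = 0f;
        for (float s : matchingTermScores) {
            sum += s;
        }
        return sum * matchingTermScores.length / (float) totalClauses;
    }

    // The alternative argued for in the issue: score by the best-matching term.
    // Assumes non-negative term scores.
    static float maxScore(float[] matchingTermScores) {
        float max = 0f;
        for (float s : matchingTermScores) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> dictionary = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            dictionary.add("term" + i);
        }
        System.out.println(rewritePrefix("term19", dictionary).size());
        try {
            rewritePrefix("term", dictionary);
        } catch (TooManyClausesException e) {
            System.out.println(e.getMessage());
        }
        // A document matching one term out of 100 expanded clauses.
        float[] oneMatch = {2.0f};
        System.out.println(coordDisjunctionScore(oneMatch, 100));
        System.out.println(maxScore(oneMatch));
    }
}
```

With a 2,000-term dictionary, the prefix `term19` expands to 111 clauses while `term` exceeds the limit, and a document matching one of 100 clauses with a term score of 2.0 drops to 0.02 under the coord-style score but keeps 2.0 under the maximum.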

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

  • Randy Puttick (JIRA) at Mar 18, 2006 at 12:51 am
    [ http://issues.apache.org/jira/browse/LUCENE-524?page=all ]

    Randy Puttick updated LUCENE-524:
    ---------------------------------

    Attachment: MultiTermQuery.java
    MultiTermScorer.java

    Implements the union operation on a priority queue and scores a multi-term query by the maximum over its matching terms rather than, in effect, their average.
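The approach described in the attachment note above can be illustrated with a small, hypothetical sketch. It is not the attached MultiTermScorer.java: the `Posting` and `Cursor` types and the precomputed per-term scores are assumptions. Each term's postings (sorted by document id) are merged on a `java.util.PriorityQueue` keyed by the current document id, and each document is scored by the maximum over the terms that match it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: a k-way union of per-term postings on a priority queue,
// scoring each document by its best-matching term instead of summing and scaling.
public class MaxUnionSketch {

    // One posting: a document id plus an assumed precomputed per-term score.
    static final class Posting {
        final int doc;
        final float score;

        Posting(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Cursor over one term's postings; the list must be sorted by ascending doc id.
    static final class Cursor {
        private final List<Posting> postings;
        private int pos = 0;

        Cursor(List<Posting> postings) {
            this.postings = postings;
        }

        Posting current() {
            return postings.get(pos);
        }

        // Moves to the next posting; returns false once the list is exhausted.
        boolean advance() {
            return ++pos < postings.size();
        }
    }

    // Returns one (doc, max score) entry per document matching any term, in doc order.
    static List<Posting> unionMax(List<List<Posting>> termPostings) {
        // Heap ordered by each cursor's current doc id (smallest on top).
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                (a, b) -> Integer.compare(a.current().doc, b.current().doc));
        for (List<Posting> postings : termPostings) {
            if (!postings.isEmpty()) {
                queue.add(new Cursor(postings));
            }
        }
        List<Posting> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            int doc = queue.peek().current().doc;
            float best = Float.NEGATIVE_INFINITY;
            // Pop every cursor positioned on this doc and keep the highest score.
            while (!queue.isEmpty() && queue.peek().current().doc == doc) {
                Cursor c = queue.poll();
                best = Math.max(best, c.current().score);
                // Re-insert only after advancing, so no heap key changes in place.
                if (c.advance()) {
                    queue.add(c);
                }
            }
            result.add(new Posting(doc, best));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Posting>> terms = new ArrayList<>();
        terms.add(List.of(new Posting(1, 0.5f), new Posting(3, 0.9f)));
        terms.add(List.of(new Posting(1, 0.8f), new Posting(2, 0.4f)));
        terms.add(List.of(new Posting(3, 0.2f), new Posting(5, 1.0f)));
        for (Posting p : unionMax(terms)) {
            System.out.println(p.doc + " " + p.score);
        }
    }
}
```

With the three postings lists in `main`, the union yields documents 1, 2, 3 and 5 with scores 0.8, 0.4, 0.9 and 1.0. Because no boolean clauses are built, the number of expanded terms is bounded only by memory for the queue rather than by a clause limit.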
  • Mark Miller (JIRA) at Nov 12, 2008 at 11:58 pm
    [ https://issues.apache.org/jira/browse/LUCENE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647127#action_12647127 ]

    Mark Miller commented on LUCENE-524:
    ------------------------------------

    Anyone think this still has merit? It looks interesting to me but I'm not so sure it's the right approach...
    Current implementation of fuzzy and wildcard queries inappropriately implemented as Boolean query rewrites
    ----------------------------------------------------------------------------------------------------------

    Key: LUCENE-524
    URL: https://issues.apache.org/jira/browse/LUCENE-524
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 1.9
    Reporter: Randy Puttick
    Priority: Minor
    Attachments: MultiTermQuery.java, MultiTermScorer.java


  • Otis Gospodnetic (JIRA) at Nov 13, 2008 at 4:30 am
    [ https://issues.apache.org/jira/browse/LUCENE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647181#action_12647181 ]

    Otis Gospodnetic commented on LUCENE-524:
    -----------------------------------------

    Based on the description, yes. Doesn't this also sound a lot like that old Mark H's issue that you commented on earlier?

  • Otis Gospodnetic (JIRA) at Nov 13, 2008 at 3:08 pm
    [ https://issues.apache.org/jira/browse/LUCENE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647181#action_12647181 ]

    otis edited comment on LUCENE-524 at 11/13/08 7:06 AM:
    -------------------------------------------------------------------

    Based on the description, yes. Doesn't this also sound a lot like that old Mark H's LUCENE-329 issue?


    was (Author: otis):
    Based on the description, yes. Doesn't this also sound a lot like that old Mark H's issue that you commented on earlier?

  • Mark Miller (JIRA) at Jan 4, 2009 at 2:52 pm
    [ https://issues.apache.org/jira/browse/LUCENE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660578#action_12660578 ]

    Mark Miller commented on LUCENE-524:
    ------------------------------------

    This patch is getting old and the code base has changed in this area. Would you like to submit another patch, Randy? If not, I'd like to close this issue.

    (be sure to remove the Java 1.5 code if you update the patch)
  • Mark Miller (JIRA) at Dec 6, 2009 at 8:21 pm
    [ https://issues.apache.org/jira/browse/LUCENE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mark Miller closed LUCENE-524.
    ------------------------------

    Resolution: Fixed

    I'm going to say it's time to close this with all of the changes/improvements that have gone on here. Most of the issues brought up are addressed in a different manner. Further changes should be brought up in a new issue.

Discussion Overview
group: dev
categories: lucene
posted: Mar 18, '06 at 12:48a
active: Dec 6, '09 at 8:21p
posts: 7
users: 1
website: lucene.apache.org

1 user in discussion

Mark Miller (JIRA): 7 posts
