Grokbase Groups Lucene dev May 2011
The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
------------------------------------------------------------------------------------------

Key: LUCENE-3068
URL: https://issues.apache.org/jira/browse/LUCENE-3068
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 3.1, 3.0.3, 4.0
Reporter: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0


In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
matching docs that it shouldn't; but I think those changes caused it
to fail to match docs that it should, specifically when the doc itself
has tokens at the same position.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Michael McCandless (JIRA) at May 4, 2011 at 10:05 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-3068:
    ---------------------------------------

    Attachment: LUCENE-3068.patch

    Patch w/ test case showing the problem.

    If you set slop to 0 for the PhraseQuery, the test passes. The MultiPhraseQuery passes with slop or no slop because it handles the same-position case itself (Union*Enum).

    That got me thinking... maybe any time a *PhraseQuery has overlapping positions, we should rewrite to a MultiPhraseQuery and let it handle the same positions...? Is there any downside to that?
  • Doron Cohen (JIRA) at May 4, 2011 at 10:19 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doron Cohen reassigned LUCENE-3068:
    -----------------------------------

    Assignee: Doron Cohen
  • Doron Cohen (JIRA) at May 4, 2011 at 7:00 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028895#comment-13028895 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    bq. specifically when the doc itself has tokens at the same position.

    I am not convinced yet that there is a bug here - I think the code does allow this?

    There is another assumption in the code, that any two different PPs (PhrasePositions) are in different TPs (TermPositions) - which underlies the assumption that originally each PP differs in position. This seems a valid assumption, because the QP will create an MPQ if there are two terms in the (phrase) query with the same position.

    bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite to a MultiPhraseQuery and let it handle the same positions...? Is there any downside to that?

    I think this is the correct behavior - in particular this will be the query that a QP will create. The only way to create a PQ (not MPQ) for PPs in the same position is to create it manually. But why would anyone do that? And if they did, wouldn't such a rewrite be a surprise to them?

    A patch to follow with a revised version of this test - one that uses the QP. In this patch the QP indeed creates an MPQ, and I am as yet unable to make it fail. Still trying.
  • Doron Cohen (JIRA) at May 4, 2011 at 7:06 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doron Cohen updated LUCENE-3068:
    --------------------------------

    Attachment: LUCENE-3068.patch

    Attached modified version of the test - one that invokes the query parser to create an MPQ. The test passes.
  • Doron Cohen (JIRA) at May 5, 2011 at 6:07 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029150#comment-13029150 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    This is more complex than I originally thought.

    # QueryParser (QP) creates a MultiPhraseQuery (MPQ) when one of the (phrase) query positions is a multi-term.
    # MPQ has an implicit OR behavior - it is used e.g. for wildcarding a phrase query.
    # The PhraseQuery (PQ) sloppy scorer assumes each query position has a single term.
    # A PQ with several terms in the same position cannot be created by parsing it with a QP, only manually.
    Manually created, it would have AND semantics: only docs with ALL the terms in pos N should match.
    In other words, assume doc D terms and positions are:
    a:0 b:1 c:1 d:2
    MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics).
    PQ for (a,b):0 d:1 should not match D, because D does not contain 'a' and 'b' in the same position (AND semantics).


    Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the AND logic implied by creating the PQ this way with the OR logic assumed by MPQ.
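
    The AND/OR distinction above can be illustrated with a small stand-alone sketch. This is a toy model only: it does not use Lucene, it handles exact (slop 0) matching only, and the class and method names (SamePositionSemantics, matchesAny, matchesAll) are hypothetical. It represents doc D as a map from position to the set of terms at that position, and the query (a,b):0 d:1 as a list of term sets, one per query offset.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of same-position semantics; NOT Lucene code, exact (slop 0) matching only.
public class SamePositionSemantics {

    /** OR semantics (MPQ-like): at each query offset, ANY one of the listed terms may match. */
    static boolean matchesAny(Map<Integer, Set<String>> doc, List<Set<String>> query) {
        return matches(doc, query, false);
    }

    /** AND semantics (manually built PQ): at each query offset, ALL listed terms must be present. */
    static boolean matchesAll(Map<Integer, Set<String>> doc, List<Set<String>> query) {
        return matches(doc, query, true);
    }

    private static boolean matches(Map<Integer, Set<String>> doc,
                                   List<Set<String>> query,
                                   boolean requireAll) {
        // Try every document position as the start of the phrase.
        for (int start : doc.keySet()) {
            boolean ok = true;
            for (int offset = 0; offset < query.size() && ok; offset++) {
                Set<String> docTerms = doc.getOrDefault(start + offset, Set.of());
                Set<String> wanted = query.get(offset);
                ok = requireAll
                        ? docTerms.containsAll(wanted)
                        : wanted.stream().anyMatch(docTerms::contains);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Doc D from the comment: a:0 b:1 c:1 d:2 */
    static Map<Integer, Set<String>> exampleDoc() {
        return Map.of(0, Set.of("a"), 1, Set.of("b", "c"), 2, Set.of("d"));
    }

    /** Query from the comment: (a,b):0 d:1 */
    static List<Set<String>> exampleQuery() {
        return List.of(Set.of("a", "b"), Set.of("d"));
    }

    public static void main(String[] args) {
        System.out.println("OR  (MPQ-like): " + matchesAny(exampleDoc(), exampleQuery()));
        System.out.println("AND (PQ-like):  " + matchesAll(exampleDoc(), exampleQuery()));
    }
}
```

    With doc D, the OR variant matches (b at position 1 followed by d at position 2), while the AND variant does not, because no position holds both 'a' and 'b'.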

    {code:title=TestPositionIncrement.testSetPosition has a test for exactly this case}
    // The phrase query should fail when one searched term does not exist,
    // even if another searched term exists at the same searched position.
    q = new PhraseQuery();
    q.add(new Term("field", "3"), 0);
    q.add(new Term("field", "9"), 0);
    hits = searcher.search(q, null, 1000).scoreDocs;
    assertEquals(0, hits.length);
    {code}

    Although QP by default will not create this PQ, I think we need to support it, for applications needing to be strict with the search results, with slop.

    So the fix would need to take place inside SloppyPhraseScorer; digging further...
  • Doron Cohen (JIRA) at May 5, 2011 at 7:58 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doron Cohen updated LUCENE-3068:
    --------------------------------

    Attachment: LUCENE-3068.patch

    Attached patch fixes this bug by excluding from the repeats check those PPs that originated from the same offset in the query.

    This allows more strict phrase queries: strict on terms in same position (AND logic) but still sloppy.
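
    As a rough illustration of the idea (not the code from the patch), the sketch below models a check that flags two phrase positions sitting on the same document position, but skips pairs that came from the same query offset. The class RepeatsCheckSketch, the Pp type, and the hasConflict method are all hypothetical names invented for this example; the real logic lives in SloppyPhraseScorer and is more involved.

```java
// Toy sketch of "exclude same-query-offset PPs from the repeats check"; NOT Lucene code.
public class RepeatsCheckSketch {

    /** Hypothetical stand-in for a phrase position; not Lucene's PhrasePositions class. */
    static final class Pp {
        final int queryOffset;  // position of the term inside the query
        final int docPosition;  // position of the currently matched token in the document

        Pp(int queryOffset, int docPosition) {
            this.queryOffset = queryOffset;
            this.docPosition = docPosition;
        }
    }

    static Pp pp(int queryOffset, int docPosition) {
        return new Pp(queryOffset, docPosition);
    }

    /**
     * Returns true when two PPs from DIFFERENT query offsets share a document position.
     * PPs from the same query offset are expected to share a position, so they are skipped.
     */
    static boolean hasConflict(Pp... pps) {
        for (int i = 0; i < pps.length; i++) {
            for (int j = i + 1; j < pps.length; j++) {
                if (pps[i].queryOffset == pps[j].queryOffset) {
                    continue; // same query slot: sharing a doc position is legitimate
                }
                if (pps[i].docPosition == pps[j].docPosition) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two terms added at query offset 0, both matched at doc position 5: no conflict.
        System.out.println(hasConflict(pp(0, 5), pp(0, 5)));
        // Terms from query offsets 0 and 1 both matched at doc position 5: conflict.
        System.out.println(hasConflict(pp(0, 5), pp(1, 5)));
    }
}
```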

    All tests pass, this is ready to go in (unless there are reservations).
  • Shai Erera (JIRA) at May 5, 2011 at 8:24 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029229#comment-13029229 ]

    Shai Erera commented on LUCENE-3068:
    ------------------------------------

    Patch looks good to me.

    One comment about the test - perhaps use the LTC methods that do random tests, like newDirectory(), newIndexWriterConfig() etc.? If you don't think it's appropriate for this test, that's ok with me.
  • Doron Cohen (JIRA) at May 5, 2011 at 10:44 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029274#comment-13029274 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    Thanks for reviewing Shai!
    I'll update the patch with random newDirectory and newICFG - not the focus here, but it may improve coverage anyhow.
    I added tests for the combined case - some AND some OR - that is, using MPQ, some add() with a single term (AND), some with an array longer than 1 (OR). Also refactored the tests a bit so that now there's a small test method for each test case.
  • Doron Cohen (JIRA) at May 5, 2011 at 10:54 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doron Cohen updated LUCENE-3068:
    --------------------------------

    Attachment: LUCENE-3068.patch

    Patch with more test cases - AND/OR logic for MPQ is combined, and test code made simpler.
  • Doron Cohen (JIRA) at May 18, 2011 at 3:01 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035422#comment-13035422 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    fixed in trunk in r1124293.
  • Doron Cohen (JIRA) at May 18, 2011 at 3:39 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doron Cohen resolved LUCENE-3068.
    ---------------------------------

    Resolution: Fixed

    fix merged to 3x in r1124302.
  • Doron Cohen (JIRA) at May 18, 2011 at 8:23 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035643#comment-13035643 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    I wonder if this should also be fixed in the 3.1 branch?
    Probably only if we make a 3.1.1, but not needed if it's going to be a 3.2.
    What's the best practice then? Reopen until a decision is made?
    Or rely on rescanning all 3.2 changes in case it's going to be 3.1.1?
  • Doron Cohen (JIRA) at May 19, 2011 at 10:39 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036107#comment-13036107 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    Looking at http://people.apache.org/~mikemccand/lucenebench/SloppyPhrase.html (Mike this is a great tool!) I see no particular slowdown at the last runs.

    A thought about these benchmarks, it would be helpful if the checked revision would be shown - perhaps as part of the hover text when hovering the mouse on a graph point...
  • Michael McCandless (JIRA) at May 19, 2011 at 10:53 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036109#comment-13036109 ]

    Michael McCandless commented on LUCENE-3068:
    --------------------------------------------

    bq. A thought about these benchmarks, it would be helpful if the checked revision would be shown - perhaps as part of the hover text when hovering the mouse on a graph point..

    Good idea! I'll try to do this...

    Note that if you go back to the root page, and click on a given day, it tells you the svn rev and also hg ref (of luceneutil), so that's a [cumbersome] way to get the svn rev.
  • Doron Cohen (JIRA) at May 19, 2011 at 11:11 am
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036111#comment-13036111 ]

    Doron Cohen commented on LUCENE-3068:
    -------------------------------------

    bq. Note that if you go back to the root page, and click on a given day, it tells you the svn rev and also hg ref (of luceneutil)

    Great, thanks!

    So, this commit to trunk in r1124293 falls between these two:

    - Tue 17/05/2011 Lucene/Solr trunk rev 1104671
    - Wed 18/05/2011 Lucene/Solr trunk rev 1124524

    ... No measurable degradation, good!
  • Simon Willnauer (JIRA) at May 19, 2011 at 12:13 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036140#comment-13036140 ]

    Simon Willnauer commented on LUCENE-3068:
    -----------------------------------------

    bq. Looking at http://people.apache.org/~mikemccand/lucenebench/SloppyPhrase.html (Mike this is a great tool!) I see no particular slowdown at the last runs.
    I love it! good that all the work on LuceneUtil pays off!!!!!
