Grokbase Groups Lucene dev July 2008
Allow Filter as clause to BooleanQuery
--------------------------------------

Key: LUCENE-1345
URL: https://issues.apache.org/jira/browse/LUCENE-1345
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Paul Elschot
Priority: Minor




--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Paul Elschot (JIRA) at Jul 23, 2008 at 9:47 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Elschot updated LUCENE-1345:
    ---------------------------------

    Attachment: LUCENE-1345.patch

    A first, half-finished attempt; it still leaves a few compile errors.

    This would allow faster Filter evaluation because the tight loop
    would be in ConjunctionScorer only.

    It would also make it possible to remove Filter from most of the search API,
    as any Filter can just be added to a BooleanQuery.

    Would anyone have a DisjunctionDISI (Disjunction over DocIdSetIterators) somewhere?

  • Eks Dev (JIRA) at Jul 26, 2008 at 8:24 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Eks Dev updated LUCENE-1345:
    ----------------------------

    Attachment: DisjunctionDISI.patch

    bq. Would anyone have a DisjunctionDISI (Disjunction over DocIdSetIterators) somewhere?

    I have played with a DisjunctionSumScorer rip-off; maybe you will find it useful for this issue.

    What would be nice here (and in DisjunctionSumScorer), if possible:

    - Remove initDISIQueue() from next() and skipTo() (the same applies to DisjunctionSumScorer). The call is there because of the ugly -1 position before the first call; I just do not know how to get rid of it :)

    - Switch to conjunction "mode" when minNrShouldMatch kicks in. There are already TODOs for this around.


    If you think you can use it, just go ahead and include it in your patch. I am not using this for anything; I just wrapped it up when you asked.
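
    The idea of a disjunction over doc ID iterators can be sketched roughly as follows. This is not the code from DisjunctionDISI.patch: the DocIterator interface, the array-backed iterator and all class names are hypothetical stand-ins, and java.util.PriorityQueue is used in place of a dedicated queue class. The sketch advances every sub-iterator once in the constructor, which is one possible way to avoid a lazy initialization check inside next().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class DisjunctionSketch {

    /** Hypothetical stand-in for an iterator over increasing doc IDs. */
    interface DocIterator {
        boolean next(); // advance; returns false when exhausted
        int doc();      // current doc ID, valid only after next() returned true
    }

    /** Array-backed iterator, used only for the demonstration. */
    static final class ArrayDocIterator implements DocIterator {
        private final int[] docs;
        private int pos = -1; // positioned before the first document
        ArrayDocIterator(int[] docs) { this.docs = docs; }
        public boolean next() { return ++pos < docs.length; }
        public int doc() { return docs[pos]; }
    }

    /** Union of several iterators; each doc ID is reported once. */
    static final class Disjunction implements DocIterator {
        private final PriorityQueue<DocIterator> queue =
            new PriorityQueue<>(Comparator.comparingInt(DocIterator::doc));
        private int current = -1;

        Disjunction(List<DocIterator> subs) {
            // Advance every sub-iterator once here, so next() needs no init check.
            for (DocIterator sub : subs) {
                if (sub.next()) queue.add(sub);
            }
        }

        public boolean next() {
            if (queue.isEmpty()) return false;
            current = queue.peek().doc();
            // Move every sub-iterator positioned on 'current' past it.
            while (!queue.isEmpty() && queue.peek().doc() == current) {
                DocIterator top = queue.poll();
                if (top.next()) queue.add(top);
            }
            return true;
        }

        public int doc() { return current; }
    }

    /** Convenience wrapper: union of several sorted doc ID arrays. */
    static List<Integer> union(int[]... lists) {
        List<DocIterator> subs = new ArrayList<>();
        for (int[] docs : lists) subs.add(new ArrayDocIterator(docs));
        Disjunction it = new Disjunction(subs);
        List<Integer> out = new ArrayList<>();
        while (it.next()) out.add(it.doc());
        return out;
    }

    public static void main(String[] args) {
        // prints [1, 2, 5, 7, 9]
        System.out.println(union(new int[]{1, 5, 9}, new int[]{2, 5, 7}, new int[]{9}));
    }
}
```

    The trade-off of initializing in the constructor is that the sub-iterators do some work up front, even if the disjunction is never iterated.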
  • Paul Elschot (JIRA) at Jul 26, 2008 at 9:50 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617220#action_12617220 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    Thanks for the DisjunctionDISI.patch. I had just continued, but I hadn't come that far yet.
    I'll be away at irregular intervals over the next month; I'll attach a patch here when there's real progress.
  • Eks dev at Jul 26, 2008 at 10:29 pm
    Hi Paul,
    That sounds so familiar. I too like playing with Lucene, it is fun, but I have not found a formula for a 25-hour day (waking up one hour earlier does not work for me for some strange reason).

    The only other person this interested in these Filter-like issues is Yonik, but I guess he also has some big fish to fry in the Solr world... Nobody is in a hurry with this one; when it gets done, it will be finished ;)

    Anyway, I will have a look at what you did so far when I find a few hours to spare...

    cheers,
    eks


  • Eks Dev (JIRA) at Jul 27, 2008 at 2:07 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Eks Dev updated LUCENE-1345:
    ----------------------------

    Attachment: DisjunctionDISI.patch

    I just realised that TestDisjunctionDISI had a bug (the iterators have to be reinitialized)...

    Apart from that, there is only a small change in DISIQueue to use constants instead of variables (the compiler should have done this as well, but you never know):

    private final void downHeap() {
    + int i = 1;
    + int j = 2; //i << 1; // find smaller child
    + int k = 3; //j + 1;
    +
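
    The excerpt above is cut off after its first lines. For context, here is a minimal sketch of a downHeap() in a generic 1-based binary min-heap over ints, with the initial indices written as the constants 1, 2 and 3. It is not the DISIQueue from the patch; the class names and the int payload are assumptions made for illustration.

```java
class HeapSketch {

    /** Minimal 1-based binary min-heap of ints; slot 0 is unused. */
    static final class IntMinHeap {
        private final int[] heap;
        private int size;

        IntMinHeap(int capacity) { heap = new int[capacity + 1]; }

        void add(int value) {
            int i = ++size;
            // Sift up: move parents down until the right slot is found.
            while (i > 1 && value < heap[i >> 1]) {
                heap[i] = heap[i >> 1];
                i >>= 1;
            }
            heap[i] = value;
        }

        /** Removes and returns the smallest value; the heap must not be empty. */
        int pop() {
            int top = heap[1];
            heap[1] = heap[size--]; // move the last element to the root
            downHeap();
            return top;
        }

        /** Restores heap order after the root was replaced. */
        private void downHeap() {
            int i = 1;          // always starts at the root
            int node = heap[i];
            int j = 2;          // left child of the root (i << 1)
            int k = 3;          // right child of the root (j + 1)
            if (k <= size && heap[k] < heap[j]) j = k; // pick the smaller child
            while (j <= size && heap[j] < node) {
                heap[i] = heap[j]; // move the smaller child up
                i = j;
                j = i << 1;
                k = j + 1;
                if (k <= size && heap[k] < heap[j]) j = k;
            }
            heap[i] = node;
        }
    }

    /** Pushes all values and pops them again, yielding ascending order. */
    static int[] drainSorted(int... values) {
        IntMinHeap h = new IntMinHeap(values.length);
        for (int v : values) h.add(v);
        int[] out = new int[values.length];
        for (int n = 0; n < out.length; n++) out[n] = h.pop();
        return out;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4, 5]
        System.out.println(java.util.Arrays.toString(drainSorted(5, 1, 4, 2, 3)));
    }
}
```

    Because downHeap() always starts at the root, the first values of i, j and k are fixed, which is why they can be written as literals.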
  • Eks Dev (JIRA) at Jul 28, 2008 at 9:57 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Eks Dev updated LUCENE-1345:
    ----------------------------

    Attachment: TestIteratorPerf.java

    Hi Paul,
    I gave micro-benchmarking a try, and it looks like we could gain a lot by switching to a sentinel approach for iterators. Apart from being faster, they are also a bit more robust against off-by-one bugs.

    This test is just a simulation written assuming docId is a long (I have tried it with int and the result is about the same).

    I am just attaching it here, as I did not want to create a new issue before we identify whether there are any design or performance knock-out criteria.

    Test on my setup:
    32-bit java version "1.6.0_10-rc"
    Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b28)
    Windows XP Professional 32-bit
    notebook, 3 GB RAM,
    CPU x86 Family 6 Model 15 Stepping 11 GenuineIntel ~2194 MHz

    java -server -Xbatch


    result (with docID long):
    old milliseconds=6938
    old milliseconds=6953
    old milliseconds=6890
    old milliseconds=6938
    old milliseconds=6906
    old milliseconds=6922
    old milliseconds=6906
    old milliseconds=6938
    old milliseconds=6906
    old milliseconds=6906
    old total milliseconds=69203

    new milliseconds=5797
    new milliseconds=5703
    new milliseconds=5266
    new milliseconds=5250
    new milliseconds=5234
    new milliseconds=5250
    new milliseconds=5235
    new milliseconds=5250
    new milliseconds=5250
    new milliseconds=5250
    new total milliseconds=53485
    New/Old Time 53485/69203 (77.28711%)

    All in all, more than 22% faster!

    Of course, this type of benchmark does not mean that all iterator operations in real life will be 20% faster; other things probably dominate. But if it turns out that this test is not flawed (which is easily possible), this is well worth pursuing.

    cheers, eks
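
    To make the comparison concrete, here is a minimal sketch of the two iteration styles as they are described in this thread. It is not taken from TestIteratorPerf.java: the class names, the NO_MORE_DOCS constant and the choice of -1 as the sentinel are assumptions for illustration, and doc IDs are assumed to be non-negative.

```java
class SentinelIterationSketch {

    /** Old style: next() reports whether a document exists, doc() returns it. */
    static final class OldStyleIterator {
        private final int[] docs;
        private int pos = -1; // positioned before the first document
        OldStyleIterator(int[] docs) { this.docs = docs; }
        boolean next() { return ++pos < docs.length; }
        int doc() { return docs[pos]; }
    }

    /** Sentinel style: next() returns the doc ID, or NO_MORE_DOCS when exhausted. */
    static final class SentinelIterator {
        static final int NO_MORE_DOCS = -1; // assumed sentinel; real doc IDs are >= 0
        private final int[] docs;
        private int pos = -1;
        SentinelIterator(int[] docs) { this.docs = docs; }
        int next() { return ++pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    }

    /** Two calls per document: next() and doc(). */
    static long sumOld(int[] docs) {
        OldStyleIterator it = new OldStyleIterator(docs);
        long sum = 0;
        while (it.next()) sum += it.doc();
        return sum;
    }

    /** One call per document; the loop ends when the sentinel is returned. */
    static long sumSentinel(int[] docs) {
        SentinelIterator it = new SentinelIterator(docs);
        long sum = 0;
        int doc;
        while ((doc = it.next()) != SentinelIterator.NO_MORE_DOCS) sum += doc;
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 9, 16};
        // prints "30 30"
        System.out.println(sumOld(docs) + " " + sumSentinel(docs));
    }
}
```

    Both loops visit the same documents; the sentinel version simply folds the "is there another document" check into the returned value.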



  • Paul Elschot (JIRA) at Jul 28, 2008 at 10:26 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Elschot updated LUCENE-1345:
    ---------------------------------

    Attachment: LUCENE-1345.patch

    Patch of 20080729: all tests pass, but there are no test cases for filter clauses yet.

    Added BooleanFilterClause class, usable as argument to BooleanQuery.add().

    API change: made ReqExclScorer package private, added an arg to the constructor.

    Removed the queueSize variable in DisjunctionSumScorer and in the added DisjunctionDISI. Left the doc caching in ScorerDocQueue and in the added DisiDocQueue.

    It might be possible to subclass DisjunctionSumScorer from DisjunctionDISI,
    and to subclass ScorerDocQueue from DisiDocQueue, I have not checked that.

    Since ConjunctionScorer can handle DocIdSetIterators with this patch, it should improve the speed for Filters when they are added to a BooleanQuery instead of being used through the current search API.

    Eks, thanks for DisjunctionDISI, I took it a bit further.
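
    The speed argument rests on having a single loop that intersects all required clauses, whether they come from queries or from filters. A rough sketch of such a loop over plain doc ID iterators is shown below. It is not Lucene's ConjunctionScorer: the DocIterator interface, its advanceTo() method (which may stay on the current document if that already satisfies the target) and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctionSketch {

    /** Hypothetical iterator over increasing, non-negative doc IDs. */
    interface DocIterator {
        /** Positions on the first doc >= target; returns false when exhausted. */
        boolean advanceTo(int target);
        /** Current doc ID, valid only after advanceTo() returned true. */
        int doc();
    }

    /** Array-backed iterator, used only for the demonstration. */
    static final class ArrayDocIterator implements DocIterator {
        private final int[] docs;
        private int pos = -1;
        ArrayDocIterator(int[] docs) { this.docs = docs; }
        public boolean advanceTo(int target) {
            if (pos < 0) pos = 0; // first use: start at the first document
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
        public int doc() { return docs[pos]; }
    }

    /** Collects the doc IDs on which all iterators agree. */
    static List<Integer> intersect(DocIterator[] its) {
        List<Integer> out = new ArrayList<>();
        if (its.length == 0) return out;
        int target = 0;
        while (true) {
            int agreed = 0; // iterators known to be positioned on 'target'
            int i = 0;
            while (agreed < its.length) {
                if (!its[i].advanceTo(target)) return out; // one clause is exhausted
                if (its[i].doc() == target) {
                    agreed++;
                } else {
                    target = its[i].doc(); // larger candidate; start counting again
                    agreed = 1;
                }
                i = (i + 1) % its.length;
            }
            out.add(target);
            target++; // continue after the match
        }
    }

    /** Convenience wrapper: intersection of several sorted doc ID arrays. */
    static List<Integer> intersect(int[]... lists) {
        DocIterator[] its = new DocIterator[lists.length];
        for (int n = 0; n < lists.length; n++) its[n] = new ArrayDocIterator(lists[n]);
        return intersect(its);
    }

    public static void main(String[] args) {
        // prints [3, 8]
        System.out.println(intersect(
            new int[]{1, 3, 5, 8}, new int[]{3, 4, 8, 10}, new int[]{0, 3, 8}));
    }
}
```

    In this sketch a filter would simply be one more entry in the array of iterators, so it takes part in the same skipping loop as the query clauses.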

  • Eks Dev (JIRA) at Jul 28, 2008 at 10:40 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617603#action_12617603 ]

    Eks Dev commented on LUCENE-1345:
    ---------------------------------

    Great! I will look into it in more detail at the weekend.

    I have moved this part to the constructor on my local copy, and it passes all tests:

    + if (disiDocQueue == null) {
    + initDisiDocQueue();
    + }


    It is currently in next() and skipTo().

    This is practically the same as reported in https://issues.apache.org/jira/browse/LUCENE-1145; with this, 1145 can be closed.


  • Paul Elschot (JIRA) at Jul 28, 2008 at 10:45 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617604#action_12617604 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    20080729 is the date here, the attachment is dated 20080728; never mind.

    As to the sentinel for doc()/next() in the TestIteratorPerf patch: this will need some real Scorers/DocIdSetIterators to see the actual JIT compiler inlining in both cases. In the patch, the Old and New classes are local private classes, which are much easier to inline than separate (non-final) public classes.

  • Paul Elschot (JIRA) at Jul 28, 2008 at 10:48 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617606#action_12617606 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    Indeed, it makes sense to add the changes from LUCENE-1145 here.
    I remembered some discussion about this, but not that there was an issue open...
  • Eks dev at Jul 28, 2008 at 10:50 pm
    From what I can tell, this just makes it harder for the new approach, but you never know before you try it in "production"...

    I just wanted to see if it could lead anywhere before spending real time on it.


  • Yonik Seeley (JIRA) at Jul 29, 2008 at 12:49 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617631#action_12617631 ]

    Yonik Seeley commented on LUCENE-1345:
    --------------------------------------

    Eks: just for grins, you can sometimes save a single cycle by changing "id==-1" to "id<0" (many operations on x86 automatically set status flags, hence comparison to zero can often be free). Not sure if the java optimizer will catch it though, and if it does if it would actually rise above the noise level.
  • Chris Hostetter at Jul 29, 2008 at 1:13 am
    : Eks: just for grins, you can sometimes save a single cycle by changing
    : "id==-1" to "id<0" (many operations on x86 automatically set status

    can you save anymore if you use "0>id" ? :)


    -Hoss


  • Eks dev at Jul 29, 2008 at 7:05 am
    As a matter of fact, you can: keeping literals on the left-hand side prevents some ugly accidental assignments, so at the end of the day you have more time to speed things up instead of chasing bugs :)

    Cheers Hoss, good to see you are following this.


  • Eks Dev (JIRA) at Jul 29, 2008 at 7:19 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617726#action_12617726 ]

    Eks Dev commented on LUCENE-1345:
    ---------------------------------

    Yonik, this would probably work fine for int values (on my CPU). I have tried it on long values, and it was significantly slower in this test... It boils down again to "what is the CPU we are optimizing for" :)
  • Yonik Seeley (JIRA) at Jul 29, 2008 at 3:35 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617822#action_12617822 ]

    Yonik Seeley commented on LUCENE-1345:
    --------------------------------------

    bq. I have tried it on long values and this was significantly slower on this test.

    Huh... I bet it's the test. It's probably so simple that everything is inlined and the comparison with -1 is being optimized away entirely (since a compare instruction is the same speed... doesn't matter if one is checking for equality or for less).
  • Eks Dev (JIRA) at Jul 29, 2008 at 4:21 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617836#action_12617836 ]

    Eks Dev commented on LUCENE-1345:
    ---------------------------------

    bq. comparison with -1 is being optimized away entirely

    I do not think so. How could the compiler "optimize away" the only condition that stops the loop? The loop would never finish, or am I misreading something here?

    Anyhow, the test is so simple that the compiler can take a completely different direction from the real case. I guess a much better test (without too much effort!) would be to take something like OpenBitSetIterator, make one iterator implementation with the sentinel approach, and then compare. This test is really just a dumb loop, but on the other hand it isolates the difference between the two approaches...


  • Paul Elschot (JIRA) at Jul 29, 2008 at 9:19 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Elschot updated LUCENE-1345:
    ---------------------------------

    Attachment: DisjunctionDISI.java

    DisjunctionDISI.java of 20080729 has a first try at switching to conjunction mode; see advanceAfterCurrent and requiredDisis in there.

    I have not even compiled it yet; it just shows the basic idea. No more time at the moment.

  • Eks Dev (JIRA) at Jul 29, 2008 at 9:48 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Eks Dev updated LUCENE-1345:
    ----------------------------

    Attachment: OpenBitSetIteratorExperiment.java
    TestIteratorPerf.java

    I just enhanced TestIteratorPerf to work with OpenBitSetIterator(Experiment)... On dense bit sets the sentinel-based iterators are faster (ca. 9%); on low-density sets they are about the same.

    Yonik's tip -1 < doc instead of -1 != doc still performs worse, and knowing Yonik's hunch on these things, I am still not convinced it is really faster ...

    Paul's work here is more interesting: a clear API and a performance win on many fronts...

    In practice, there is no need to pollute this issue further with iterator semantics. If I (or someone else) figure out something really interesting there, I will create a new issue.
  • Yonik Seeley at Jul 30, 2008 at 6:57 pm
    disclaimer: this is just for fun.... differences should be in the
    noise in any complex system, and I'm not suggesting any code changes.

    Actually, with 32 bit registers, x<0 should be faster than x==-1 by
    one cycle. If it doesn't test faster, then it's because of some
    optimizations that could be pulled off via inlining and data flow
    analysis (set of -1 followed by a check for -1, etc).

    Here's the optimized code that gcc produces for a 64 bit long comparison:

    # if (foo() == -1)
    andl %edx, %eax
    incl %eax
    jne L2

    # if (foo() < 0)
    testl %edx, %edx
    jns L9

    -Yonik
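
    One way to check what the two listings compute is to redo the arithmetic in Java. The sketch below assumes the usual 32-bit x86 convention that a 64-bit result comes back with its high half in %edx and its low half in %eax; the class and method names are made up for illustration. It says nothing about which form is faster, only that each short sequence is equivalent to the comparison it replaces.

```java
class LongCompareSketch {

    /** Mirrors "andl; incl; jne": true iff both 32-bit halves are all ones. */
    static boolean equalsMinusOneByHalves(long x) {
        int lo = (int) x;          // low 32 bits  (assumed to be %eax)
        int hi = (int) (x >>> 32); // high 32 bits (assumed to be %edx)
        return (hi & lo) + 1 == 0; // all ones plus one wraps around to zero
    }

    /** Mirrors "testl %edx, %edx; jns": the sign bit lives in the high half only. */
    static boolean isNegativeByHighHalf(long x) {
        int hi = (int) (x >>> 32);
        return hi < 0;
    }

    /** Checks both identities against the plain long comparisons. */
    static boolean agreesWithPlainComparison(long x) {
        return equalsMinusOneByHalves(x) == (x == -1L)
            && isNegativeByHighHalf(x) == (x < 0L);
    }

    public static void main(String[] args) {
        long[] samples = {
            -1L, 0L, 1L, -2L, Long.MIN_VALUE, Long.MAX_VALUE,
            0xFFFFFFFF00000000L, 0x00000000FFFFFFFFL
        };
        for (long x : samples) {
            System.out.println(x + " -> " + agreesWithPlainComparison(x));
        }
    }
}
```

    With non-negative doc IDs and -1 as the only negative value ever returned, the two tests accept exactly the same values, which is why one can be swapped for the other.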

  • Michael McCandless at Jul 30, 2008 at 7:07 pm
    Neat!

    Do you know how to get the corresponding asm that the HotSpot compiler
    produces? This way we can see if this difference "survives" through
    java...

    Mike

  • Yonik Seeley at Jul 30, 2008 at 7:13 pm

    On Wed, Jul 30, 2008 at 3:06 PM, Michael McCandless wrote:

    Neat!

    Do you know how to get the corresponding asm that the HotSpot compiler
    produces? This way we can see if this difference "survives" through java...

    Unfortunately, no. I've looked in the past and couldn't find anything.
    If anyone knows of a way, it would be very cool to have, though.

    -Yonik

  • Stephen Green at Jul 30, 2008 at 7:17 pm

    On Jul 30, 2008, at 3:12 PM, Yonik Seeley wrote:

    Unfortunately, no. I've looked in the past and couldn't find anything.
    If anyone knows of anything, it would be very cool to have though.

    Might the description here:

    http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html

    help?

    Steve
    --
    Stephen Green // Stephen.Green@sun.com
    Principal Investigator \\ http://blogs.sun.com/searchguy
    Aura Project // Voice: +1 781-442-0926
    Sun Microsystems Labs \\ Fax: +1 781-442-1692




  • Yonik Seeley at Jul 30, 2008 at 7:34 pm

    On Wed, Jul 30, 2008 at 3:17 PM, Stephen Green wrote:

    Might the description here:

    http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html

    help?

    Sweet! Thanks!

    -Yonik

  • Stephen Green at Jul 31, 2008 at 12:49 pm

    On Jul 30, 2008, at 3:33 PM, Yonik Seeley wrote:

    Sweet! Thanks!

    Glad to help :-)

    Steve
    --
    Stephen Green // Stephen.Green@sun.com
    Principal Investigator \\ http://blogs.sun.com/searchguy
    Aura Project // Voice: +1 781-442-0926
    Sun Microsystems Labs \\ Fax: +1 781-442-1692




  • Yonik Seeley at Jul 30, 2008 at 8:06 pm

    On Wed, Jul 30, 2008 at 3:06 PM, Michael McCandless wrote:

    Neat!

    Do you know how to get the corresponding asm that the HotSpot compiler
    produces? This way we can see if this difference "survives" through java...

    Thanks to the tool that Stephen pointed out, I can now see that the
    difference does survive.

    -Yonik
  • Yonik Seeley (JIRA) at Jul 30, 2008 at 9:25 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618513#action_12618513 ]

    Yonik Seeley commented on LUCENE-1345:
    --------------------------------------

    Eks, I just tried your first TestIteratorPerf.java myself, and the comparison with zero was faster (as expected) by about 8%.
    I commented out everything except testNew for simplicity.

    Original testNew:
    {code}
    $ java -server -cp . TestIteratorPerf
    new milliseconds=2883
    new milliseconds=3289
    new milliseconds=3148
    new milliseconds=3195
    new milliseconds=3149
    new milliseconds=3179
    new milliseconds=3180
    new milliseconds=3164
    new milliseconds=3179
    new milliseconds=3164
    new total milliseconds=31530
    {code}

    Modified testNew:
    // while(-1!=(doc=it.next())){
    while((doc=it.next()) >= 0)

    {code}
    $ java -server -cp . TestIteratorPerf
    new milliseconds=2806
    new milliseconds=2899
    new milliseconds=2915
    new milliseconds=2899
    new milliseconds=2914
    new milliseconds=2899
    new milliseconds=2899
    new milliseconds=3040
    new milliseconds=2899
    new milliseconds=2930
    new total milliseconds=29100
    {code}

    System: WinXP, Pentium4, java version "1.5.0_11"
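
    For readers without the attachment, the shape of the two loops being timed can be sketched as below. This is not the attached TestIteratorPerf.java; the iterator class, the array size and the timing code are assumptions made for illustration, and a single nanoTime measurement like this is far too crude to reproduce the 8% figure reliably.

```java
// Hypothetical stand-in for the loops in testNew; not the attached test file.
final class SentinelLoopSketch {
    // Minimal iterator over an int array that returns -1 once exhausted.
    static final class ArrayDocIterator {
        private final int[] docs;
        private int pos = 0;

        ArrayDocIterator(int[] docs) {
            this.docs = docs;
        }

        int next() {
            return pos < docs.length ? docs[pos++] : -1;
        }
    }

    // Original loop form: equality test against the sentinel.
    static long sumWithEquality(int[] docs) {
        ArrayDocIterator it = new ArrayDocIterator(docs);
        long sum = 0;
        int doc;
        while (-1 != (doc = it.next())) {
            sum += doc;
        }
        return sum;
    }

    // Modified loop form: sign test.
    static long sumWithSign(int[] docs) {
        ArrayDocIterator it = new ArrayDocIterator(docs);
        long sum = 0;
        int doc;
        while ((doc = it.next()) >= 0) {
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = new int[1000000];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        long t0 = System.nanoTime();
        long a = sumWithEquality(docs);
        long t1 = System.nanoTime();
        long b = sumWithSign(docs);
        long t2 = System.nanoTime();
        System.out.println("equality: sum=" + a + " ns=" + (t1 - t0));
        System.out.println("sign:     sum=" + b + " ns=" + (t2 - t1));
    }
}
```

    Both loops visit the same documents as long as the iterator never returns a negative value other than -1; only the exit test differs.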
  • Eks dev at Jul 30, 2008 at 9:46 pm
    Then we conclude that comparison with 0 is faster :)

    Maybe something on my XP machine was running in the background that I did not notice, stealing cycles; on Windows this cannot be easily controlled.
    Or, when I tested it the other day, I used a comparison with -1:
    while((doc=it.next()) >-1)

    Could that make any difference? Looks like it!

    I just read the mails here. Wow, I can dump asm now, easily! This is fun... I will have to dig out my old x86 references; I must admit I am very, very rusty on CPU development after the past years (10+)... I used to be cool a long, long time ago :)

    Only good news today: I learned something from you, I can dump asm from HotSpot, we have Fieldable "solved"... Great, I can go to sleep now :)


  • John Wang (JIRA) at Jan 10, 2009 at 6:26 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662627#action_12662627 ]

    John Wang commented on LUCENE-1345:
    -----------------------------------

    Added perf comparisons of the boolean set iterators against the current scorers.

    See the patch.

    System: Ubuntu,
    java version "1.6.0_11"
    Intel Core2 Duo 2.44 GHz

    new milliseconds=470
    new milliseconds=534
    new milliseconds=450
    new milliseconds=443
    new milliseconds=444
    new milliseconds=445
    new milliseconds=449
    new milliseconds=441
    new milliseconds=444
    new milliseconds=445
    new total milliseconds=4565
    old milliseconds=529
    old milliseconds=491
    old milliseconds=428
    old milliseconds=549
    old milliseconds=427
    old milliseconds=424
    old milliseconds=420
    old milliseconds=424
    old milliseconds=423
    old milliseconds=422
    old total milliseconds=4537

    New/Old Time 4565/4537 (100.61715%)
    OrDocIdSetIterator milliseconds=1138
    OrDocIdSetIterator milliseconds=1106
    OrDocIdSetIterator milliseconds=1065
    OrDocIdSetIterator milliseconds=1066
    OrDocIdSetIterator milliseconds=1065
    OrDocIdSetIterator milliseconds=1067
    OrDocIdSetIterator milliseconds=1072
    OrDocIdSetIterator milliseconds=1118
    OrDocIdSetIterator milliseconds=1065
    OrDocIdSetIterator milliseconds=1069
    OrDocIdSetIterator total milliseconds=10831
    DisjunctionMaxScorer milliseconds=1914
    DisjunctionMaxScorer milliseconds=1981
    DisjunctionMaxScorer milliseconds=1861
    DisjunctionMaxScorer milliseconds=1893
    DisjunctionMaxScorer milliseconds=1886
    DisjunctionMaxScorer milliseconds=1885
    DisjunctionMaxScorer milliseconds=1887
    DisjunctionMaxScorer milliseconds=1889
    DisjunctionMaxScorer milliseconds=1891
    DisjunctionMaxScorer milliseconds=1888
    DisjunctionMaxScorer total milliseconds=18975
    Or/DisjunctionMax Time 10831/18975 (57.080368%)
    OrDocIdSetIterator milliseconds=1079
    OrDocIdSetIterator milliseconds=1075
    OrDocIdSetIterator milliseconds=1076
    OrDocIdSetIterator milliseconds=1093
    OrDocIdSetIterator milliseconds=1077
    OrDocIdSetIterator milliseconds=1074
    OrDocIdSetIterator milliseconds=1078
    OrDocIdSetIterator milliseconds=1075
    OrDocIdSetIterator milliseconds=1074
    OrDocIdSetIterator milliseconds=1074
    OrDocIdSetIterator total milliseconds=10775
    DisjunctionSumScorer milliseconds=1398
    DisjunctionSumScorer milliseconds=1322
    DisjunctionSumScorer milliseconds=1320
    DisjunctionSumScorer milliseconds=1305
    DisjunctionSumScorer milliseconds=1304
    DisjunctionSumScorer milliseconds=1301
    DisjunctionSumScorer milliseconds=1304
    DisjunctionSumScorer milliseconds=1300
    DisjunctionSumScorer milliseconds=1301
    DisjunctionSumScorer milliseconds=1317
    DisjunctionSumScorer total milliseconds=13172
    Or/DisjunctionSum Time 10775/13172 (81.80231%)
    AndDocIdSetIterator milliseconds=330
    AndDocIdSetIterator milliseconds=336
    AndDocIdSetIterator milliseconds=298
    AndDocIdSetIterator milliseconds=299
    AndDocIdSetIterator milliseconds=310
    AndDocIdSetIterator milliseconds=298
    AndDocIdSetIterator milliseconds=298
    AndDocIdSetIterator milliseconds=334
    AndDocIdSetIterator milliseconds=298
    AndDocIdSetIterator milliseconds=299
    AndDocIdSetIterator total milliseconds=3100
    ConjunctionScorer milliseconds=332
    ConjunctionScorer milliseconds=307
    ConjunctionScorer milliseconds=302
    ConjunctionScorer milliseconds=350
    ConjunctionScorer milliseconds=300
    ConjunctionScorer milliseconds=304
    ConjunctionScorer milliseconds=305
    ConjunctionScorer milliseconds=303
    ConjunctionScorer milliseconds=303
    ConjunctionScorer milliseconds=299
    ConjunctionScorer total milliseconds=3105
    And/Conjunction Time 3100/3105 (99.83897%)
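
    To make concrete what an And or Or iterator over doc id sets does, here is a minimal sketch. It is not the Kamikaze code from the patch; the iterator class, its next/advance methods and the -1 sentinel are assumptions chosen for illustration. Iterators of this kind only merge sorted doc ids and never compute scores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the And/Or iterators from the attached patch.
final class BooleanIteratorSketch {
    static final int NO_MORE_DOCS = -1; // assumed end-of-iteration sentinel

    // Iterator over a sorted array of doc ids.
    static final class ArrayIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayIterator(int... docs) {
            this.docs = docs;
        }

        // Moves to the next doc id, or returns NO_MORE_DOCS when exhausted.
        int next() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        // Moves forward to the first doc id >= target.
        int advance(int target) {
            int doc;
            do {
                doc = next();
            } while (doc != NO_MORE_DOCS && doc < target);
            return doc;
        }
    }

    // Conjunction: the lagging iterator leaps to the doc id of the other one.
    static List<Integer> and(ArrayIterator a, ArrayIterator b) {
        List<Integer> out = new ArrayList<>();
        int da = a.next();
        int db = b.next();
        while (da >= 0 && db >= 0) {
            if (da == db) {
                out.add(da);
                da = a.next();
                db = b.next();
            } else if (da < db) {
                da = a.advance(db);
            } else {
                db = b.advance(da);
            }
        }
        return out;
    }

    // Disjunction: a two-way merge that emits each doc id once.
    static List<Integer> or(ArrayIterator a, ArrayIterator b) {
        List<Integer> out = new ArrayList<>();
        int da = a.next();
        int db = b.next();
        while (da >= 0 || db >= 0) {
            if (db < 0 || (da >= 0 && da < db)) {
                out.add(da);
                da = a.next();
            } else if (da < 0 || db < da) {
                out.add(db);
                db = b.next();
            } else {
                out.add(da);
                da = a.next();
                db = b.next();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(and(new ArrayIterator(1, 3, 5, 7), new ArrayIterator(3, 4, 7, 9)));
        System.out.println(or(new ArrayIterator(1, 3, 5, 7), new ArrayIterator(3, 4, 7, 9)));
    }
}
```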

  • John Wang (JIRA) at Jan 10, 2009 at 6:28 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    John Wang updated LUCENE-1345:
    ------------------------------

    Attachment: booleansetperf.txt

    Added And/Or/Not DocIdSets/Iterators.

    Code ported over from Kamikaze:
    http://code.google.com/p/lucene-ext/

    Perf test updated.

    Main contributors to the patch: Anmol Bhasin & Yasuhiro Matsuda

  • John Wang (JIRA) at Jan 10, 2009 at 6:32 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662632#action_12662632 ]

    John Wang commented on LUCENE-1345:
    -----------------------------------

    Given the perf number improvements we see, can we consider raising the priority?
  • Michael McCandless (JIRA) at Jan 10, 2009 at 10:48 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1345:
    ---------------------------------------

    Fix Version/s: 2.9
  • Michael McCandless (JIRA) at Jan 10, 2009 at 10:50 am
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662650#action_12662650 ]

    Michael McCandless commented on LUCENE-1345:
    --------------------------------------------

    {quote}
    Given the perf number improvements we see, can we consider up the priority?
    {quote}
    I agree, the results look compelling; I marked this and LUCENE-1145 as fix version 2.9.
  • Marvin Humphrey (JIRA) at Jan 10, 2009 at 12:40 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662656#action_12662656 ]

    Marvin Humphrey commented on LUCENE-1345:
    -----------------------------------------

    Paul Elschot, a while back:
    {quote}
    It would also allow to get rid of Filter in most of the search api,
    as any Filter can just be added to a BooleanQuery.
    {quote}
    In KS svn trunk (and potentially in Lucy), there is no "Filter"; all classes
    that perform filtering are just subclasses of Query which you're expected to
    apply using an ANDQuery. Can you think of any downside to that model? (Would
    it be possible to retrofit Lucene to use it in 3.0?) The motivation was the
    same as the one you articulate: to simplify the search API.

    (Hmm...Thinking out loud: DeletionsFilter as a subclass of Query...)
  • Michael McCandless (JIRA) at Jan 10, 2009 at 1:24 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662657#action_12662657 ]

    Michael McCandless commented on LUCENE-1345:
    --------------------------------------------

    {quote}
    (Hmm...Thinking out loud: DeletionsFilter as a subclass of Query...)
    {quote}
    +1
  • Robert engels at Jan 10, 2009 at 5:40 pm
    There are many expensive-to-evaluate queries. If you move the
    deletions check to a clause, then these will be evaluated on deleted
    documents.

    For example, we have a query that inspects stored fields, augmented
    first by an indexed term (basically specialized phrase matching), so
    a user can search for "engels r", and we don't need to evaluate all
    terms that begin with r.

    Evaluating these on deleted documents is a waste of resources.
  • John Wang (JIRA) at Jan 10, 2009 at 5:08 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662673#action_12662673 ]

    John Wang commented on LUCENE-1345:
    -----------------------------------

    Filters by definition (afaik) do not participate in scoring. Since "score gathering" is done at the BooleanQuery level, does that mean BooleanQuery would need to do an instanceof check to see whether a clause is a Filter?

    Or do we always hardcode a filter with score 0? This is also dangerous if people augment scores at the HitCollector level, or if the score gathering logic changes to something not as straightforward as summing.

    My two cents.
  • Paul Elschot (JIRA) at Jan 10, 2009 at 8:04 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662685#action_12662685 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    Marvin,

    bq. In KS svn trunk (and potentially in Lucy), there is no "Filter"; all classes that perform filtering are just subclasses of Query which you're expected to apply using an ANDQuery. Can you think of any downside to that model?

    In Lucene the class model is that Scorer extends DocIdSetIterator by some
    methods involved with document score values. To prepare searching in
    Lucene the following 'transformations' are done:
    Query -> Weight -> Scorer
    and
    Filter -> DocIdSetIterator

    I've never seen the KS classes, but on the face of it, the downside of using
    ANDQuery (KS) for filtering is that it has to provide a score value, which
    somehow must be ignored during search.
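
    The class relationship described above, and the score problem it creates for a filter used as a clause, can be sketched as follows. These are simplified, hypothetical stand-ins, not Lucene's actual classes or signatures: the single next() method, the -1 sentinel and the NonScoringClause wrapper are all assumptions made for illustration.

```java
// Hypothetical, simplified model; not the real Lucene API.
final class FilterClauseSketch {
    // What a Filter yields: doc ids only, no scores.
    static abstract class DocIdSetIterator {
        // Returns the next doc id, or -1 when exhausted (assumed sentinel).
        abstract int next();
    }

    // What a Query yields via its Weight: doc ids plus a score per doc.
    static abstract class Scorer extends DocIdSetIterator {
        abstract float score();
    }

    // One possible way to let a filter sit among scoring clauses:
    // wrap its iterator and report a constant score of 0.
    static final class NonScoringClause extends Scorer {
        private final DocIdSetIterator in;

        NonScoringClause(DocIdSetIterator in) {
            this.in = in;
        }

        @Override
        int next() {
            return in.next();
        }

        @Override
        float score() {
            return 0.0f;
        }
    }

    // Helper that builds an iterator over a fixed list of doc ids.
    static DocIdSetIterator over(final int... docs) {
        return new DocIdSetIterator() {
            private int pos = 0;

            @Override
            int next() {
                return pos < docs.length ? docs[pos++] : -1;
            }
        };
    }

    public static void main(String[] args) {
        Scorer clause = new NonScoringClause(over(2, 5));
        int doc;
        while ((doc = clause.next()) >= 0) {
            System.out.println("doc=" + doc + " score=" + clause.score());
        }
    }
}
```

    A constant 0 only stays harmless while clause scores are combined by summing, which is the concern raised earlier in the thread about hardcoding a filter score.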
  • Paul Elschot (JIRA) at Jan 10, 2009 at 8:14 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662687#action_12662687 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    John, Michael,

    bq. Given the perf number improvements we see, can we consider up the priority?

    I think most of the performance improvements that John posted can be moved into
    trunk without the addition of Filter as a clause to BooleanQuery, so I'd rather let
    these go first.

  • Paul Elschot (JIRA) at Jan 10, 2009 at 8:16 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662687#action_12662687 ]

    paul.elschot@xs4all.nl edited comment on LUCENE-1345 at 1/10/09 12:14 PM:
    ----------------------------------------------------------------

    John, Michael,

    bq. Given the perf number improvements we see, can we consider up the priority?

    I think most of the performance improvements that John posted can be moved into
    trunk without the addition of Filter as a clause to BooleanQuery. Therefore I'd rather let
    these go in before adding Filter as clause to BooleanQuery.


    was (Author: paul.elschot@xs4all.nl):
    John, Michael,

    bq. Given the perf number improvements we see, can we consider up the priority?

    I think most of the performance improvements that John posted can be moved into
    trunk without the addition of Filter as a clause to BooleanQuery, so I'd rather let
    these go first.

  • Uwe Schindler (JIRA) at Jan 11, 2009 at 12:12 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662766#action_12662766 ]

    Uwe Schindler commented on LUCENE-1345:
    ---------------------------------------

    Here is an idea for how to merge Filters and Queries:

    Why not combine ConstantScoreQuery and the current abstract Filter API into a new Filter class? This would make it possible to use every filter as a query. The new abstract Filter class would contain all methods of ConstantScoreQuery and would even be backwards compatible. Anyone who implements the filter's getDocIdSet()/bits() methods would have nothing more to do and could use the filter as a normal query.

    For performance improvements when combining more than one filter in a BooleanQuery (e.g. ANDing/ORing the iterators, filtering, ...), the BooleanQuery code could use instanceof.
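
    As a rough illustration of the proposal above, the following is a minimal sketch and not the attached patch. QueryLike, FilterLike, EvenDocsFilter and MergeDemo are hypothetical names invented for this example; none of them are Lucene classes. The sketch assumes a filter author only supplies the set of matching documents and inherits a constant score of 1.0.

```java
import java.util.BitSet;

// Hypothetical stand-in for a query: NOT Lucene's Query class.
abstract class QueryLike {
    /** Returns the score of doc, or Float.NaN when doc does not match. */
    abstract float score(int doc);
}

// Hypothetical merged filter: a query whose matches all get the same score.
abstract class FilterLike extends QueryLike {
    private BitSet cache;

    /** The one method a filter author implements. */
    abstract BitSet matchingDocs();

    @Override
    final float score(int doc) {
        if (cache == null) {
            cache = matchingDocs(); // computed once, then reused
        }
        return cache.get(doc) ? 1.0f : Float.NaN;
    }
}

/** Example filter that matches every even document id below maxDoc. */
class EvenDocsFilter extends FilterLike {
    private final int maxDoc;

    EvenDocsFilter(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    @Override
    BitSet matchingDocs() {
        BitSet bits = new BitSet(maxDoc);
        for (int d = 0; d < maxDoc; d += 2) {
            bits.set(d);
        }
        return bits;
    }
}

class MergeDemo {
    /** Combining code can branch on the runtime type, as suggested above. */
    static String describe(QueryLike q) {
        return (q instanceof FilterLike)
                ? "filter: scoring can be skipped"
                : "query: needs scoring";
    }
}
```

    With these assumed types, new EvenDocsFilter(10) can be passed anywhere a QueryLike is expected, and MergeDemo.describe() shows the kind of instanceof check that combining code could use.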
  • Uwe Schindler (JIRA) at Jan 11, 2009 at 1:12 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1345:
    ----------------------------------

    Attachment: LUCENE-1345-Filter+Query-merge.patch

    Attached is a patch that merges the ConstantScoreQuery and Filter APIs. ConstantScoreQuery is deprecated by this. Further cleanups may then remove the use of ConstantScoreQuery from other places in Lucene where the class is currently used to wrap filters as Queries.

    All important tests pass! Only one test does not pass: a problem occurs in TestSimpleExplanations. This may be because of the changed toString()/toString(field) and query explanations (ConstantScoreQuery no longer returns an explanation). Either this should be changed or the test should be rewritten. Nevertheless, I do not understand the whole test case :-)

    Uwe
  • Paul Elschot (JIRA) at Jan 11, 2009 at 2:48 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662782#action_12662782 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    Uwe,

    The point here is to let BooleanQuery also take care of the filtering logic without doing any extra score computations.

    For example that involves changing ConjunctionScorer to not only accept Scorers, but also DocIdSetIterators,
    and use these DocIdSetIterators together with the Scorers to skip to the next matching document, but only use the Scorers to compute the score value.

    What is the point of adding a score value to Filters, when that score value has to be ignored during query search?
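
    The conjunction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the attached patch: DocIterator, ScoringIterator, ArrayDocIterator, ArrayScoringIterator and FilteredConjunction are hypothetical stand-ins for DocIdSetIterator, Scorer and ConjunctionScorer, and the per-clause scores are simply summed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a doc-id iterator: NOT Lucene's DocIdSetIterator.
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS. */
    int advance(int target);
}

// Hypothetical stand-in for a scorer: an iterator that can also score.
interface ScoringIterator extends DocIterator {
    /** Score of the document most recently returned by advance(). */
    float score();
}

/** Iterator over a sorted array of doc ids; used here for filter clauses. */
class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = 0;

    ArrayDocIterator(int... docs) {
        this.docs = docs;
    }

    public int advance(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

/** Same iteration, plus a fixed score per clause (an assumption of this sketch). */
class ArrayScoringIterator extends ArrayDocIterator implements ScoringIterator {
    private final float score;

    ArrayScoringIterator(float score, int... docs) {
        super(docs);
        this.score = score;
    }

    public float score() {
        return score;
    }
}

class FilteredConjunction {
    /** Returns "doc:score" for every doc matched by all scorers and all filters. */
    static List<String> search(List<ScoringIterator> scorers, List<DocIterator> filters) {
        List<DocIterator> all = new ArrayList<>(scorers);
        all.addAll(filters);
        List<String> hits = new ArrayList<>();
        if (all.isEmpty()) {
            return hits;
        }
        int candidate = 0;
        while (candidate != DocIterator.NO_MORE_DOCS) {
            boolean agreed = true;
            // Scorers and filters take part in skipping on equal terms.
            for (DocIterator it : all) {
                int doc = it.advance(candidate);
                if (doc != candidate) {
                    candidate = doc; // restart with the larger doc id
                    agreed = false;
                    break;
                }
            }
            if (!agreed) {
                continue;
            }
            // Only the scorers contribute to the score; filters are never asked.
            float score = 0f;
            for (ScoringIterator s : scorers) {
                score += s.score();
            }
            hits.add(candidate + ":" + score);
            candidate++;
        }
        return hits;
    }
}
```

    For example, two scoring clauses over docs {1, 3, 5, 7} and {3, 4, 5, 7} with scores 1.0 and 0.5, restricted by a filter over {5, 7, 9}, match docs 5 and 7, each with score 1.5. The filter narrows the match set but adds nothing to the score.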
  • Uwe Schindler (JIRA) at Jan 11, 2009 at 2:54 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662783#action_12662783 ]

    Uwe Schindler commented on LUCENE-1345:
    ---------------------------------------

    The idea behind the patch was to merge the code of filters and queries. Further optimizations can now remove the score calculation from the filter code.

    Using my patch you are now able to add filters to BooleanQueries or execute them directly using Searcher.search, because they are subclasses of Query. Further optimizations may now remove the score computation completely if the given query extends Filter (if (query instanceof Filter), do something else).
  • Paul Elschot (JIRA) at Jan 11, 2009 at 3:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662786#action_12662786 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    bq. Further optimizations now may remove the score computation in complete, if the given query extends Filter (if (query instanceof Filter) do something other)

    Such further optimization is precisely the idea of the original patch here, but without making Filter a subclass of Query.
  • Uwe Schindler (JIRA) at Jan 11, 2009 at 3:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662790#action_12662790 ]

    Uwe Schindler commented on LUCENE-1345:
    ---------------------------------------

    I know this. My idea was just to remove the burden of thinking about Filters and Queries for the developer of Lucene applications.

    In my opinion, the terms "Query" and "Filter" should be merged. The logic behind BooleanQuery or Searcher should simply work out the *best* way to optimize what the user wants to do.

    Maybe I should create a new JIRA issue out of my suggestion to merge Filters and Queries? In my opinion, this is something nice to have in 3.0.
  • Paul Elschot (JIRA) at Jan 11, 2009 at 3:58 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662795#action_12662795 ]

    Paul Elschot commented on LUCENE-1345:
    --------------------------------------

    bq. In my opinion, the terms "Query" and "Filter" should be merged.

    There is a clear distinction between the two terms.
    QueryWrapperFilter changes a Query into a Filter and ConstantScoreQuery changes a Filter into a Query.
    The first one removes the scoring by upcasting a Scorer to a DocIdSetIterator, and the second one adds a constant score to a DocIdSetIterator to create a Scorer.
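
    The two conversions can be sketched as below. This is a minimal illustration with hypothetical types (IdIterator, ScoredIterator, FixedScorer, ConstantScoreWrapper, Conversions), not the actual QueryWrapperFilter or ConstantScoreQuery code. It only assumes, as stated above, that a scorer is a doc-id iterator with an extra score() method.

```java
// Hypothetical stand-in for a doc-id iterator: NOT Lucene's DocIdSetIterator.
abstract class IdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the next matching doc id, or NO_MORE_DOCS when exhausted. */
    abstract int nextDoc();
}

// Hypothetical stand-in for a scorer: an iterator that also scores.
abstract class ScoredIterator extends IdIterator {
    /** Score of the doc most recently returned by nextDoc(). */
    abstract float score();
}

/** A scorer over fixed doc ids with one score per doc. */
class FixedScorer extends ScoredIterator {
    private final int[] docs;
    private final float[] scores;
    private int pos = -1;

    FixedScorer(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    float score() {
        return scores[pos];
    }
}

/** Filter to Query: wraps any iterator and reports one constant score. */
class ConstantScoreWrapper extends ScoredIterator {
    private final IdIterator in;
    private final float constant;

    ConstantScoreWrapper(IdIterator in, float constant) {
        this.in = in;
        this.constant = constant;
    }

    int nextDoc() {
        return in.nextDoc();
    }

    float score() {
        return constant;
    }
}

class Conversions {
    /** Query to Filter: the upcast alone hides score() from the caller. */
    static IdIterator asFilter(ScoredIterator scorer) {
        return scorer;
    }

    /** Filter to Query: add a constant score to a plain iterator. */
    static ScoredIterator asQuery(IdIterator it, float constant) {
        return new ConstantScoreWrapper(it, constant);
    }
}
```

    The first direction costs nothing at run time because it is only a change of static type; the second needs a small wrapper object because a score has to come from somewhere.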

  • Uwe Schindler (JIRA) at Jan 11, 2009 at 4:28 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662796#action_12662796 ]

    Uwe Schindler commented on LUCENE-1345:
    ---------------------------------------

    {quote}
    bq. In my opinion, the terms "Query" and "Filter" should be merged.

    There is clear distinction between the two terms.
    QueryWrapperFilter changes a Query into a Filter and ConstantScoreQuery changes a Filter into a Query.
    The first one removes the scoring by upcasting a Scorer to a DocIdSetIterator, and the second one adds a constant score to a DocIdSetIterator to create a Scorer.
    {quote}
    You are right, but for a Lucene user the distinction between the two terms is always a problem. Combining them would take the burden of thinking about both off the user; it would make life easier and hide some work. The problem is the fine differences between the two, but for the general user, whose indexes are not so large that the difference counts, it would make things easier.

    How about merging Filters and Queries, and then thinking about optimizations in the code of BooleanQuery to identify use cases where the scoring can be removed and where a constant score is needed? There are two cases where the two different types cause problems:

    - User (A) wants to use my contrib TrieRangeQuery/-Filter and just execute a Query that returns documents that match the range. The problem for this user is how to implement this: use a MatchAllDocsQuery and filter the results with TrieRangeFilter, or use ConstantScoreQuery to combine both? Which is faster?
    - User (B) wants to filter some documents using a normal Filter. If he uses the standard Query+Filter combination of Searcher.search(), he must first decide which part of the combination should be the filter and which should be the query. Maybe he has a TrieRangeQuery (the query variant, which uses a ConstantScore on the Filter) as the query and wants to combine it with another query. With new code that detects the type of both clauses, BooleanQuery could choose to execute the TermQueries as normal scoring queries and filter the results using the given Filter as a clause.

    Both tasks could easily be combined if Query and Filter were the same. User (A) would not need to create a constant score query on the Trie filter; he could just use it with Searcher.search() as a "Query". If he wants to add some normal term queries from a query parser to it, he would use a BooleanQuery to combine both. The BooleanQuery code would then find out that one of the clauses is a Filter and would *not* use ConstantScore code to filter the result, but just the normal filter code. For the user it is simpler: he would always create a TrieRangeQueryFilter combination and would let BooleanQuery choose which query execution strategy to use.
  • Uwe Schindler (JIRA) at Jan 11, 2009 at 4:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662797#action_12662797 ]

    Uwe Schindler commented on LUCENE-1345:
    ---------------------------------------

    An additional case: User (A) uses a BooleanQuery and just adds the Filter to it and nothing more (no TermQueries and so on). In this case, ConstantScore algorithm must be used! But for the end user the API is always identical.
  • Uwe Schindler (JIRA) at Jan 11, 2009 at 4:46 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662799#action_12662799 ]

    Uwe Schindler commented on LUCENE-1345:
    ---------------------------------------

    Here are some ideas on how to implement search() with Query and Filter:
    - User runs Searcher.search() using a Filter as the only parameter: as every Filter is also a ConstantScoreQuery, the query can be executed and returns score 1.0 for all matching documents.
    - User runs Searcher.search() using a Query as the only parameter: no change, all is the same as before.
    - User runs Searcher.search() using a BooleanQuery as parameter: if the BooleanQuery does not contain a Query that is a subclass of Filter (the new Filter), everything works as usual. If the BooleanQuery contains exactly one Filter and nothing else, the Filter is used as a constant score query. If the BooleanQuery contains clauses with both Queries and Filters, the new algorithm could be used: the queries are executed and the results are filtered with the filters.

    I hope this explains how I would implement the combined Filters and Queries.
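
    The three BooleanQuery cases listed above amount to a small dispatch on the clause types, which might be sketched like this. SketchQuery, SketchFilter, Strategy and StrategyChooser are hypothetical names for illustration only and are not part of Lucene or of the attached patch. The comment above only covers a BooleanQuery with exactly one Filter and nothing else; treating any filter-only BooleanQuery as constant score is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical model of the proposal, in which a filter is a kind of query.
class SketchQuery { }

class SketchFilter extends SketchQuery { }

enum Strategy {
    NORMAL_SCORING,    // no filter clauses: everything as usual
    CONSTANT_SCORE,    // only filter clauses: every match gets the same score
    SCORE_THEN_FILTER  // mixed: queries score, filters only restrict matches
}

class StrategyChooser {
    /** Picks an execution strategy from the runtime types of the clauses. */
    static Strategy choose(List<? extends SketchQuery> clauses) {
        long filters = clauses.stream()
                .filter(c -> c instanceof SketchFilter)
                .count();
        if (filters == 0) {
            return Strategy.NORMAL_SCORING;
        }
        if (filters == clauses.size()) {
            // Assumption: several filters and nothing else are handled
            // like the single-filter case described above.
            return Strategy.CONSTANT_SCORE;
        }
        return Strategy.SCORE_THEN_FILTER;
    }
}
```

    In this sketch the caller always builds the same kind of clause list, and the choice between the three execution paths is made in one place from the clause types.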

Discussion Overview
group: dev @
categories: lucene
posted: Jul 23, '08 at 9:42p
active: Jun 11, '09 at 3:11p
posts: 68
users: 7
website: lucene.apache.org
