FAQ
I see where I was confused - I was looking at the javadoc for TermsEnum,
which says you must call next or seek before getting the term - but I
was more confused than straightened out when I saw FilteredTermsEnum
said the enum must already be positioned at the first term - it didn't
dawn on me that they acted different in that regard - thats why I put
the nocommit - was a bit confused. Looking at the code, it makes sense
now though.

I do agree that its a bit confusing now that you have to call .empty
first now. Thats what led me down the path of trying .next != null -
didn't even occur to me there was a .empty. Knew something was screwy
though, even with the test passing.
--- lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNumericRangeQuery32.java (original)
+++ lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNumericRangeQuery32.java Wed Dec 2 23:27:49 2009
@@ -443,9 +443,8 @@
NumericRangeQuery<Integer> q = NumericRangeQuery.newIntRange("field4", 4,
lower, upper, true, true);
FilteredTermsEnum termEnum = q.getTermsEnum(searcher.getIndexReader());
- //nocommit: double check this merge 'fix'
int count = 0;
- if (termEnum.next() != null) {
+ if (!termEnum.empty()) {
do {
final TermRef t = termEnum.term();
if (t != null) {
@@ -457,7 +456,7 @@
break;
} while (termEnum.next() != null);
}
- assertFalse(termEnum.next() != null);
+ assertNotNull(termEnum.next());
System.out.println("TermEnum on 'field4' for range [" + lower + "," + upper
+ "] contained " + count + " terms.");

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Search Discussions

  • Uwe Schindler at Dec 3, 2009 at 9:00 am
    I already wrote an comment in the flex issue about that. Its not really
    good. Maybe we should change the FilteredTermsEnum also to require calling
    next() first, which makes it more useable like an iterator. I was always not
    so happy with the fact, that you need to have the strange do...while loop.

    For NRQ, the change is simple, just remove the next() call in the ctor. For
    other FilteredTermsEnums its similar, just move the seek code to the first
    next call.

    In my opinion, code should look like that:

    TermsEnum te = MTQ.getTermsEnum();
    TermRef term;
    while ((term = te.next()) != null) {
    ..
    }

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
    -----Original Message-----
    From: Mark Miller
    Sent: Thursday, December 03, 2009 12:56 AM
    To: java-dev@lucene.apache.org
    Subject: Re: svn commit: r886339 -
    /lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNume
    ricRangeQuery32.java

    I see where I was confused - I was looking at the javadoc for TermsEnum,
    which says you must call next or seek before getting the term - but I
    was more confused than straightened out when I saw FilteredTermsEnum
    said the enum must already be positioned at the first term - it didn't
    dawn on me that they acted different in that regard - thats why I put
    the nocommit - was a bit confused. Looking at the code, it makes sense
    now though.

    I do agree that its a bit confusing now that you have to call .empty
    first now. Thats what led me down the path of trying .next != null -
    didn't even occur to me there was a .empty. Knew something was screwy
    though, even with the test passing.
    ---
    lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNumer
    icRangeQuery32.java (original)
    +++
    lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNumer
    icRangeQuery32.java Wed Dec 2 23:27:49 2009
    @@ -443,9 +443,8 @@
    NumericRangeQuery<Integer> q =
    NumericRangeQuery.newIntRange("field4", 4,
    lower, upper, true, true);
    FilteredTermsEnum termEnum =
    q.getTermsEnum(searcher.getIndexReader());
    - //nocommit: double check this merge 'fix'
    int count = 0;
    - if (termEnum.next() != null) {
    + if (!termEnum.empty()) {
    do {
    final TermRef t = termEnum.term();
    if (t != null) {
    @@ -457,7 +456,7 @@
    break;
    } while (termEnum.next() != null);
    }
    - assertFalse(termEnum.next() != null);
    + assertNotNull(termEnum.next());
    System.out.println("TermEnum on 'field4' for range [" + lower + "," + upper
    + "] contained " + count + " terms.");

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler at Dec 3, 2009 at 9:06 am
    I forgot: As we already have a new API, we can change this behaviour for
    TermsEnums easily and it seems for the SegmentTermsEnums its already done. A
    lot of people on java-user always had the same problem, that you need to use
    a do-while-loop which is not intuitive.

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Uwe Schindler
    Sent: Thursday, December 03, 2009 10:00 AM
    To: java-dev@lucene.apache.org
    Subject: RE: svn commit: r886339 -
    /lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNume
    ricRangeQuery32.java

    I already wrote an comment in the flex issue about that. Its not really
    good. Maybe we should change the FilteredTermsEnum also to require calling
    next() first, which makes it more useable like an iterator. I was always
    not
    so happy with the fact, that you need to have the strange do...while loop.

    For NRQ, the change is simple, just remove the next() call in the ctor.
    For
    other FilteredTermsEnums its similar, just move the seek code to the first
    next call.

    In my opinion, code should look like that:

    TermsEnum te = MTQ.getTermsEnum();
    TermRef term;
    while ((term = te.next()) != null) {
    ..
    }

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
    -----Original Message-----
    From: Mark Miller
    Sent: Thursday, December 03, 2009 12:56 AM
    To: java-dev@lucene.apache.org
    Subject: Re: svn commit: r886339 -
    /lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNume
    ricRangeQuery32.java

    I see where I was confused - I was looking at the javadoc for TermsEnum,
    which says you must call next or seek before getting the term - but I
    was more confused than straightened out when I saw FilteredTermsEnum
    said the enum must already be positioned at the first term - it didn't
    dawn on me that they acted different in that regard - thats why I put
    the nocommit - was a bit confused. Looking at the code, it makes sense
    now though.

    I do agree that its a bit confusing now that you have to call .empty
    first now. Thats what led me down the path of trying .next != null -
    didn't even occur to me there was a .empty. Knew something was screwy
    though, even with the test passing.
    ---
    lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNumer
    icRangeQuery32.java (original)
    +++
    lucene/java/branches/flex_1458/src/test/org/apache/lucene/search/TestNumer
    icRangeQuery32.java Wed Dec 2 23:27:49 2009
    @@ -443,9 +443,8 @@
    NumericRangeQuery<Integer> q =
    NumericRangeQuery.newIntRange("field4", 4,
    lower, upper, true, true);
    FilteredTermsEnum termEnum =
    q.getTermsEnum(searcher.getIndexReader());
    - //nocommit: double check this merge 'fix'
    int count = 0;
    - if (termEnum.next() != null) {
    + if (!termEnum.empty()) {
    do {
    final TermRef t = termEnum.term();
    if (t != null) {
    @@ -457,7 +456,7 @@
    break;
    } while (termEnum.next() != null);
    }
    - assertFalse(termEnum.next() != null);
    + assertNotNull(termEnum.next());
    System.out.println("TermEnum on 'field4' for range [" + lower +
    ","
    + upper
    + "] contained " + count + " terms.");

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Michael McCandless at Dec 3, 2009 at 10:05 am

    On Thu, Dec 3, 2009 at 3:59 AM, Uwe Schindler wrote:
    I already wrote an comment in the flex issue about that. Its not really
    good. Maybe we should change the FilteredTermsEnum also to require calling
    next() first, which makes it more useable like an iterator. +1
    I was always not
    so happy with the fact, that you need to have the strange do...while loop.

    For NRQ, the change is simple, just remove the next() call in the ctor. For
    other FilteredTermsEnums its similar, just move the seek code to the first
    next call.

    In my opinion, code should look like that:

    TermsEnum te = MTQ.getTermsEnum();
    TermRef term;
    while ((term = te.next()) != null) {
    ..
    }
    Let's remove the .empty(), and let's switch to this "normal iterator"
    semantics -- I think I had originally added that back when TermsEnum
    didn't have a .term() method, ie, you had to keep the result returned
    from calling .next() or .seek(). But that proved too much of hassle
    (lots of places where we pass a TermsEnum down and the callee needs to
    figure out what term it's on), so I've since added .term().

    Actually TermEnum (the current API) already behaves like a normal
    iterator, right? It's only FilteredTermEnum (and, FilteredTermsEnum)
    that pre-next's itself on construction. So we should fix
    FilteredTermsEnum to not pre-next, and to remove its empty() method?

    Uwe do you want to take a stab at it? Or, I can... let me know.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler at Dec 3, 2009 at 10:31 am
    Hi Mike,
    Let's remove the .empty(), and let's switch to this "normal iterator"
    semantics -- I think I had originally added that back when TermsEnum
    didn't have a .term() method, ie, you had to keep the result returned
    from calling .next() or .seek(). But that proved too much of hassle
    (lots of places where we pass a TermsEnum down and the callee needs to
    figure out what term it's on), so I've since added .term(). +1
    Actually TermEnum (the current API) already behaves like a normal
    iterator, right? It's only FilteredTermEnum (and, FilteredTermsEnum)
    that pre-next's itself on construction. So we should fix
    FilteredTermsEnum to not pre-next, and to remove its empty() method?
    Not really, if you get the unitialized TermEnum from IR.terms(), it is like
    an iterator, but if you get it with IR.terms(Term), it is already placed on
    the first term. You have to then check, if it the current term is null (e.g.
    exhausted if you take the last term). You see this type of code everywhere
    in lucene, like in MTQ. In my opinion, it would be better here to always
    call next() first (and this was often done by users, then missing the first
    term). The new API does not have this problem, as you only get an terms
    iterator for a specific field that is per default always uninitialized and
    you have to call seek. But with trunk, that is not the case.

    Uwe do you want to take a stab at it? Or, I can... let me know.
    I think, you know the code better, but I could find out what to do.

    I would like to have an IllegalStateException if you call docFreq() or
    term(), too. I think this would help people migrating from old to new code
    (and rewriting their do-while code to while(next)).

    Uwe


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Michael McCandless at Dec 3, 2009 at 11:27 am

    On Thu, Dec 3, 2009 at 5:30 AM, Uwe Schindler wrote:

    Actually TermEnum (the current API) already behaves like a normal
    iterator, right?  It's only FilteredTermEnum (and, FilteredTermsEnum)
    that pre-next's itself on construction.  So we should fix
    FilteredTermsEnum to not pre-next, and to remove its empty() method?
    Not really, if you get the unitialized TermEnum from IR.terms(), it is like
    an iterator, but if you get it with IR.terms(Term), it is already placed on
    the first term. Oh I see.
    You have to then check, if it the current term is null (e.g.
    exhausted if you take the last term). You see this type of code everywhere
    in lucene, like in MTQ. In my opinion, it would be better here to always
    call next() first (and this was often done by users, then missing the first
    term).
    Well, terms(Term t) matches "termsEnum.seek(t)" in the flex API,
    which also (depending on the SeekStatus) puts you on the term, if it
    was present.
    The new API does not have this problem, as you only get an terms
    iterator for a specific field that is per default always uninitialized and
    you have to call seek. But with trunk, that is not the case.
    Right, but upon seek'ing, you must handle the SEEK_FOUND case like you
    would today with terms(Term t).
    Uwe do you want to take a stab at it?  Or, I can... let me know.
    I think, you know the code better, but I could find out what to do.
    I was thinking... the more people comfortable w/ the flex branch,
    before we land, the better. It's an immense change, so if more
    committers have a deeper knowledge of it (nudge nudge), that'd ease
    the transition.

    But if you don't want to tackle it now, that's fine, I can take it...
    I would like to have an IllegalStateException if you call docFreq() or
    term(), too. I think this would help people migrating from old to new code
    (and rewriting their do-while code to while(next)).
    Hmm, we'd need to do this for all TermsEnum methods? (eg ord, docs as
    well). This is making me a bit nervous -- these methods are called
    quite a bit -- I'd rather not pick up the small perf hit for
    legitimate usage of the API.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-dev @
categorieslucene
postedDec 2, '09 at 11:56p
activeDec 3, '09 at 11:27a
posts6
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase