Hi

I remember a while ago a discussion around the efficiency of TermDocs.seek
and how it is inefficient and it's better to call IndexReader.termDocs
instead (actually someone was proposing to remove seek entirely from the
interface because of that). I've looked at FieldCacheImpl's
ByteCache.createValue and noticed it calls termDocs.seek.

So is it 'safe' to call seek again? Has the implementation improved? I
checked SegmentTermDocs change history but didn't see anything related, nor
in FieldCacheImpl. I'm iterating a TermEnum and need to get the documents
associated with each term. Basically, more or less what FieldCacheImpl does.
So I thought to use the same methodology (I used to call reader.termDocs on
every term before I saw FieldCacheImpl's implementation). Since TermEnum
moves forward, I hope that termDocs.seek will move forward as well, and I
only do it within the same field.

BTW, if there is a better way to do what I'm trying to do (such as a better
API), I'd appreciate a hint.

Thanks,
Shai

  • Michael McCandless at Jan 17, 2010 at 10:25 am

    On Sun, Jan 17, 2010 at 5:01 AM, Shai Erera wrote:

    > I remember a while ago a discussion around the efficiency of TermDocs.seek
    > and how it is inefficient and it's better to call IndexReader.termDocs
    > instead (actually someone was proposing to remove seek entirely from the
    > interface because of that). I've looked at FieldCacheImpl's
    > ByteCache.createValue and noticed it calls termDocs.seek.

    Actually, I think the discussion was about TermEnum.skipTo, which has
    in fact been removed as of 3.0, not TermDocs.seek. I think
    TermDocs.seek is OK to call.

    > So is it 'safe' to call seek again? Has the implementation improved? I
    > checked SegmentTermDocs change history but didn't see anything related, nor
    > in FieldCacheImpl. I'm iterating a TermEnum and need to get the documents
    > associated with each term. Basically, more or less what FieldCacheImpl does.
    > So I thought to use the same methodology (I used to call reader.termDocs on
    > every term before I saw FieldCacheImpl's implementation). Since TermEnum
    > moves forward, I hope that termDocs.seek will move forward as well, and I
    > only do it within the same field.

    I think TermDocs.seek has no forward-only "constraint": whatever term
    you give it (whether it's before or after its current position), it
    will go there.

    > BTW, if there is a better way to do what I'm trying to do (such as a better
    > API), I'd appreciate a hint.

    Just to give a preview of the current flex API... you'd do it roughly
    like this (this is what FieldCacheImpl on the flex branch does):

    // represents all terms in the field
    Terms terms = reader.fields().terms(field);

    // assuming you want to skip the deleted docs...
    Bits skipDocs = reader.getDeletedDocs();

    if (terms != null) {
      // field exists
      TermsEnum termsEnum = terms.iterator();
      while (true) {
        final BytesRef term = termsEnum.next();
        if (term == null) {
          break;
        }
        DocsEnum docs = termsEnum.docs(skipDocs);
        while (true) {
          final int docID = docs.nextDoc();
          if (docID == DocsEnum.NO_MORE_DOCS) {
            break;
          }
          // do something with docID
        }
      }
    }

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Shai Erera at Jan 17, 2010 at 10:35 am
    Oh right, I confused TermEnum.skipTo with TermDocs.seek. Thanks for the
    reminder.

    BTW, the flex implementation looks really useful. I like it that I won't
    need to compare the field anymore. Looking forward to it.

    Thanks
    Shai
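
The pattern discussed in this thread (walk the terms of one field in sorted order, reposition a single reusable postings cursor on each term, and stop when the field changes) can be sketched without Lucene on the classpath. What follows is only a toy model built on the Java standard library: TermWalkSketch, PostingsCursor, docsByTerm and the "field\u0000text" key layout are made up for illustration and are not Lucene classes or Lucene's index format. The cursor's seek here is assumed to accept any postings list, mirroring the "no forward-only constraint" point above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for an inverted index; none of these types are Lucene classes. */
final class TermWalkSketch {

    /** Reusable postings cursor, loosely playing the role of TermDocs. */
    static final class PostingsCursor {
        private int[] docs = new int[0];
        private int pos = -1;

        /** Reposition on any postings list; this model has no forward-only rule. */
        void seek(int[] postings) {
            docs = postings;
            pos = -1;
        }

        /** Advance to the next document; returns false once the list is exhausted. */
        boolean next() {
            return ++pos < docs.length;
        }

        /** Current document ID; only valid after next() returned true. */
        int doc() {
            return docs[pos];
        }
    }

    /**
     * Collects the live (non-deleted) doc IDs per term text for one field.
     * Index keys are "field\u0000text", so terms sort by field first, then text.
     */
    static Map<String, List<Integer>> docsByTerm(
            TreeMap<String, int[]> index, String field, BitSet deleted) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        PostingsCursor cursor = new PostingsCursor(); // one cursor, reused for every term
        String prefix = field + '\u0000';
        // Start at the first term of the field and walk forward in sorted order.
        for (Map.Entry<String, int[]> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // ran past the last term of this field
            }
            cursor.seek(e.getValue());
            List<Integer> live = new ArrayList<>();
            while (cursor.next()) {
                if (!deleted.get(cursor.doc())) {
                    live.add(cursor.doc());
                }
            }
            result.put(e.getKey().substring(prefix.length()), live);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("color\u0000blue", new int[] {0, 2});
        index.put("color\u0000red", new int[] {1, 2, 3});
        index.put("size\u0000large", new int[] {0, 3});
        BitSet deleted = new BitSet();
        deleted.set(2); // pretend doc 2 was deleted
        System.out.println(docsByTerm(index, "color", deleted));
    }
}
```

Running main prints {blue=[0], red=[1, 3]}: the walk stops at the first "size" term, and doc 2 is skipped as deleted. The explicit prefix check is the field comparison that the pre-flex loop needs and that, as noted above, the flex API makes unnecessary.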

Discussion Overview
group: java-user
category: lucene
posted: Jan 17, '10 at 10:02a
active: Jan 17, '10 at 10:35a
posts: 3
users: 2
website: lucene.apache.org
