We are considering replacing the current random-access
IndexReader.isDeleted(int docID) method with an iterator & skipTo
(DocIdSet) access that would let you iterate through the deleted
docIDs, instead.

At the same time we would move to a new API to replace
IndexReader.document(int docID) that would no longer check whether the
document is deleted.

This is being discussed now under several Jira issues and on
java-dev.

Would this be a problem for any Lucene applications out there?

How is isDeleted used today (outside of Lucene)? Normally an
IndexSearcher would never return a deleted document, and so "in
theory" a deleted docID should never "escape" Lucene's APIs.

So I'm curious what applications in fact rely on isDeleted, and how
that method is being used...

Thanks,

Mike
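
Editor's sketch: the "iterator & skipTo" access proposed above could look roughly like the following. This is only an illustration under stated assumptions, not the API that was under discussion: the class DeletedDocsIterator, its methods next and skipTo, and the NO_MORE_DOCS sentinel are hypothetical names, and a java.util.BitSet stands in for however Lucene actually stores deletions.

```java
import java.util.BitSet;

/** Hypothetical stand-in for an iterator over deleted docIDs; not a Lucene class. */
class DeletedDocsIterator {
    /** Sentinel returned once the iterator is exhausted (an assumption of this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final BitSet deleted; // bit i set => docID i is deleted
    private int current = -1;     // last docID returned, -1 before the first call

    DeletedDocsIterator(BitSet deleted) {
        this.deleted = deleted;
    }

    /** Advances to the next deleted docID, or returns NO_MORE_DOCS when exhausted. */
    int next() {
        return skipTo(current + 1);
    }

    /** Advances to the first deleted docID >= target (forward only), or NO_MORE_DOCS. */
    int skipTo(int target) {
        if (current == NO_MORE_DOCS) {
            return NO_MORE_DOCS;
        }
        int from = Math.max(target, current + 1); // never move backwards
        int found = deleted.nextSetBit(from);
        current = (found < 0) ? NO_MORE_DOCS : found;
        return current;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(5);
        deleted.set(9);
        DeletedDocsIterator it = new DeletedDocsIterator(deleted);
        System.out.println(it.next());                 // 2
        System.out.println(it.skipTo(6));              // 9
        System.out.println(it.next() == NO_MORE_DOCS); // true
    }
}
```

The point of skipTo is that a caller walking docIDs in order can jump past long runs of live documents without testing each one individually.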


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • John Wang at Jan 25, 2009 at 12:04 am
    Mike:
    "We are considering replacing the current random-access
    IndexReader.isDeleted(int docID) method with an iterator & skipTo
    (DocIdSet) access that would let you iterate through the deleted
    docIDs, instead."

    This is exactly what we are doing. We do, however, have to build the
    internal DocIdSet ourselves from isDeleted calls. It would be great if this
    were provided through the API.

    I am also assuming MatchAllDocsQuery will be fixed to avoid the isDeleted call?

    -John
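
Editor's sketch: building a deleted-docs set from random-access isDeleted calls, as John describes, might be done along these lines. The names here are assumptions for illustration only: DeletedSetBuilder and build are hypothetical, the IntPredicate stands in for a call such as reader.isDeleted(docID), and a java.util.BitSet stands in for the DocIdSet John mentions.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical helper; not part of Lucene. */
class DeletedSetBuilder {
    /**
     * Scans docIDs 0..maxDoc-1 once and records the deleted ones.
     * isDeleted is a stand-in for a random-access check such as reader.isDeleted(docID).
     */
    static BitSet build(int maxDoc, IntPredicate isDeleted) {
        BitSet deleted = new BitSet(maxDoc);
        for (int docID = 0; docID < maxDoc; docID++) {
            if (isDeleted.test(docID)) {
                deleted.set(docID);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Pretend every third document has been deleted.
        BitSet deleted = build(10, docID -> docID % 3 == 0);
        System.out.println(deleted); // {0, 3, 6, 9}
    }
}
```

This costs one isDeleted call per document each time the set is built, which is the work an iterator provided by the API would make unnecessary.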
  • Michael McCandless at Jan 25, 2009 at 12:27 pm
    OK, interesting, thanks. What do you use the deletedDocs iterator for?

    Yes, MatchAllDocsQuery should soon be fixed to not use the
    synchronized IndexReader.isDeleted method internally:

    https://issues.apache.org/jira/browse/LUCENE-1316

    Mike

  • Ian Lea at Jan 26, 2009 at 10:56 am
    Hi Mike


    I've got some applications that use lucene purely as a place to store
    data, with no searching other than by product id, and have programs
    that get all the data out of the store by code like

    for (int i = 0; i < max; i++) {
        if (!reader.isDeleted(i)) {  // skip deleted docs
            Document doc = reader.document(i);
            ...
        }
    }

    The index gets regular updates and only occasional optimizes, so it
    normally does contain deleted docs.

    If the isDeleted() method were removed it would only be a minor
    inconvenience - I'd be happy to code to any new API calls, or change
    the method to call optimize first, or whatever.



    --
    Ian.
    ian.lea@gmail.com
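
Editor's sketch: if only an ascending iterator over deleted docIDs were available, a "dump everything" loop like Ian's could be adapted roughly as follows. This is a self-contained illustration, not Lucene code: LiveDocsWalk and liveDocs are hypothetical names, a PrimitiveIterator.OfInt stands in for the proposed deleted-docs iterator, and collecting docIDs into a list stands in for loading each document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PrimitiveIterator;

/** Hypothetical helper; not part of Lucene. */
class LiveDocsWalk {
    /**
     * Returns every docID in [0, maxDoc) that the iterator does not list as deleted.
     * deletedAscending must yield deleted docIDs in increasing order.
     */
    static List<Integer> liveDocs(int maxDoc, PrimitiveIterator.OfInt deletedAscending) {
        List<Integer> live = new ArrayList<>();
        // maxDoc doubles as "no further deletions" because no docID can equal it.
        int nextDeleted = deletedAscending.hasNext() ? deletedAscending.nextInt() : maxDoc;
        for (int docID = 0; docID < maxDoc; docID++) {
            if (docID == nextDeleted) {
                // Skip this document and fetch the next deletion, if any.
                nextDeleted = deletedAscending.hasNext() ? deletedAscending.nextInt() : maxDoc;
            } else {
                live.add(docID); // a real loop would load the document here
            }
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(4);
        System.out.println(liveDocs(6, deleted.stream().iterator())); // [0, 2, 3, 5]
    }
}
```

The loop does one integer comparison per document instead of one isDeleted call, so no per-document random-access check is needed.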


  • Michael McCandless at Jan 26, 2009 at 11:17 am
    OK, interesting. This case looks like a good fit for an iteration
    API to access deleted docs. (And a good case for column-stride
    fields, too!)

    Thanks for sharing, Ian,

    Mike

  • Uwe Schindler at Jan 26, 2009 at 11:39 am
    The same here:
    I often have code that needs to get *all* documents out of an IndexReader
    (excluding deleted docs). Currently this is coded like the example from Ian.

    This is often code that checks content of indexes or iterates over all
    documents without any Query.

    A better API may be good.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
  • Toke Eskildsen at Jan 26, 2009 at 12:02 pm

    On Mon, 2009-01-26 at 11:55 +0100, Ian Lea wrote:
        for (int i = 0; i < max; i++) {
            if (!reader.isDeleted(i)) {
                Document doc = reader.document(i);
                ...
            }
        }

    Hey! You've stolen our code! :-)

    While we don't use Lucene in the same way as you, we also perform
    iterations over all documents. An iterative approach to deleted
    documents, instead of the current random access, would be fine by us.
    It'll just take a minor refactoring to get it to work on our end.

    - Toke Eskildsen, http://statsbiblioteket.dk



Discussion Overview
group: java-user
category: lucene
posted: Jan 23, '09 at 8:26p
active: Jan 26, '09 at 12:02p
posts: 7
users: 5
website: lucene.apache.org
