I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter
wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I
get only a subset of the expected results, even accounting for deletes. The
index has 10 segments. In IndexSearcher.searchWithFilter, it looks like the
scorer is advancing to the filter's docId, which is an index-wide value,
while the scorer itself works with segment-relative values. If I optimize
the index, I get the expected results.
Does this look like a bug?

Peter
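A minimal Lucene-free sketch of the symptom, under stated assumptions: each segment is modeled only by its maxDoc, the "query" matches every doc in a segment (like MatchAllDocs), and the filter is one index-wide java.util.BitSet. PerSegmentFilterDemo and search are hypothetical names, not Lucene API. Probing the index-wide set with segment-relative IDs returns wrong hits once there is more than one segment, while slicing it per segment with BitSet.get(from, to) gives the expected ones. With a single segment, docBase is 0 and both paths agree, which would be consistent with optimizing making the problem go away.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model, not Lucene code: each segment is just its maxDoc, and the
// "scorer" matches every doc in the segment (like MatchAllDocsQuery).
final class PerSegmentFilterDemo {

  // Returns hits as index-wide doc IDs. With sliceFilter == false the
  // index-wide BitSet is probed with segment-relative IDs (the bug).
  static List<Integer> search(int[] segmentMaxDocs, BitSet topLevelFilter, boolean sliceFilter) {
    List<Integer> hits = new ArrayList<>();
    int docBase = 0;
    for (int maxDoc : segmentMaxDocs) {
      BitSet segmentFilter = sliceFilter
          ? topLevelFilter.get(docBase, docBase + maxDoc) // bits re-indexed from 0
          : topLevelFilter;                                // absolute IDs, wrong here
      for (int doc = 0; doc < maxDoc; doc++) {             // segment-relative IDs
        if (segmentFilter.get(doc)) {
          hits.add(docBase + doc);                         // report index-wide ID
        }
      }
      docBase += maxDoc;
    }
    return hits;
  }

  public static void main(String[] args) {
    BitSet filter = new BitSet();
    filter.set(1);
    filter.set(6);
    filter.set(12);
    int[] segments = {5, 5, 5};
    System.out.println(search(segments, filter, false)); // [1, 6, 11]
    System.out.println(search(segments, filter, true));  // [1, 6, 12]
  }
}
```

With three 5-doc segments and filter bits 1, 6 and 12, the unsliced path reports doc 11 (never in the filter) and misses doc 12.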

  • Michael McCandless at Dec 4, 2009 at 3:39 pm
    That doesn't sound good.

    Though, in searchWithFilter, we seem to ask for the Query's scorer
    and the Filter's docIdSetIterator using the same reader (which may be
    top-level, for the legacy case, or per-segment, for the normal case).
    So I'm not [yet] seeing where the issue is...

    Can you boil it down to a smallish test case?

    Mike
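    To make the "same reader" point concrete, here is a simplified sketch of the leapfrog that intersects a scorer with a filter iterator for one reader. It is a toy model loosely following that kind of loop, not Lucene's source: SortedIter is a hypothetical stand-in for DocIdSetIterator, and NO_MORE_DOCS mirrors DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE). It also illustrates Peter's hypothesis: if a segment's scorer is asked to advance to an absolute ID past its maxDoc, it simply runs out and those hits are lost.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a scorer/filter leapfrog over ONE reader.
final class Leapfrog {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel, as in Lucene

  // Hypothetical stand-in for a DocIdSetIterator over a sorted int[].
  static final class SortedIter {
    private final int[] docs;
    private int pos = -1;
    SortedIter(int[] docs) { this.docs = docs; }
    int nextDoc() {
      pos++;
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
    // First doc >= target (target is assumed to be beyond the current doc).
    int advance(int target) {
      int d;
      while ((d = nextDoc()) < target) { }
      return d;
    }
  }

  // Collects docs present in both sorted lists.
  static List<Integer> intersect(int[] scorerDocs, int[] filterDocs) {
    SortedIter scorer = new SortedIter(scorerDocs);
    SortedIter filter = new SortedIter(filterDocs);
    List<Integer> collected = new ArrayList<>();
    int filterDoc = filter.nextDoc();
    int scorerDoc = scorer.advance(filterDoc);
    while (true) {
      if (scorerDoc == filterDoc) {
        if (scorerDoc == NO_MORE_DOCS) break;  // both exhausted
        collected.add(scorerDoc);
        filterDoc = filter.nextDoc();
        scorerDoc = scorer.advance(filterDoc);
      } else if (scorerDoc > filterDoc) {
        filterDoc = filter.advance(scorerDoc); // filter catches up
      } else {
        scorerDoc = scorer.advance(filterDoc); // scorer catches up
      }
    }
    return collected;
  }
}
```

    For a 5-doc segment scored as {0, 1, 2, 3, 4}, intersecting with segment-relative filter docs {1, 3} yields [1, 3], while absolute IDs {6, 12} from later segments yield nothing: the scorer exhausts before reaching 6.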
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Peter Keegan at Dec 4, 2009 at 3:48 pm
    I think the Filter's docIdSetIterator is using the top-level reader for each
    segment, because the cardinality of the DocIdSet from which it's created is
    the same for all readers (and is what I expect to see at the top level).

    Peter
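    One way to turn this observation into a check, sketched without Lucene: for a segment-aware filter, each per-segment set should have no bit at or past that segment's maxDoc, and the per-segment cardinalities should add up to the top-level count. A filter that returns the same index-wide set for every reader fails both tests. SegmentFilter and looksSegmentAware are hypothetical names; SegmentFilter stands in for calling Filter.getDocIdSet with each segment reader.

```java
import java.util.BitSet;

// Hypothetical sanity check for per-segment filter output.
final class SegmentAwareCheck {
  // Stand-in for Filter.getDocIdSet(segmentReader); docBase/maxDoc describe the segment.
  interface SegmentFilter {
    BitSet docIdSet(int docBase, int maxDoc);
  }

  static boolean looksSegmentAware(SegmentFilter f, int[] segmentMaxDocs, int expectedTotal) {
    int total = 0;
    int docBase = 0;
    for (int maxDoc : segmentMaxDocs) {
      BitSet bits = f.docIdSet(docBase, maxDoc);
      if (bits.length() > maxDoc) return false; // a bit at or beyond this segment's maxDoc
      total += bits.cardinality();
      docBase += maxDoc;
    }
    return total == expectedTotal; // per-segment counts must sum to the top-level count
  }
}
```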
  • Simon Willnauer at Dec 4, 2009 at 4:02 pm
    Peter, which filter do you use? Do you respect the IndexReader's
    maxDoc() and the docBase?

    simon
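    What "respecting maxDoc() and the docBase" amounts to, as a small hypothetical helper (DocBaseMath is not a Lucene class): a segment's docBase is the sum of maxDoc() over all earlier segments, a segment-relative ID is the absolute ID minus that docBase, and it is only valid when it is below the segment's maxDoc(). The binary search picks the last docBase that is <= the absolute ID, so empty segments are skipped correctly.

```java
// Hypothetical helper for absolute <-> segment-relative doc ID arithmetic.
final class DocBaseMath {
  // starts[i] = docBase of segment i = sum of maxDoc of segments 0..i-1.
  static int[] docStarts(int[] segmentMaxDocs) {
    int[] starts = new int[segmentMaxDocs.length];
    int base = 0;
    for (int i = 0; i < segmentMaxDocs.length; i++) {
      starts[i] = base;
      base += segmentMaxDocs[i];
    }
    return starts;
  }

  // Index of the segment holding absolute doc n: the last i with starts[i] <= n.
  static int segmentOf(int n, int[] starts) {
    int lo = 0;
    int hi = starts.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
      if (starts[mid] <= n) lo = mid; else hi = mid - 1;
    }
    return lo;
  }
}
```

    For example, with segment sizes {5, 0, 5, 3} the docBases are {0, 5, 5, 10}, and absolute doc 12 lives in segment 3 at relative ID 12 - 10 = 2.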
  • Peter Keegan at Dec 4, 2009 at 4:27 pm
    The filter is just a java.util.BitSet. I use the top-level reader to create
    the filter, and call IndexSearcher.search(Query, Filter, HitCollector). So
    there is no 'docBase' at this level of the API.

    Peter
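    A sketch of one way to keep building the BitSet from the top-level reader and still answer per-segment requests: record each segment's docBase up front, then slice the index-wide BitSet when a segment asks. Assumptions: R is a placeholder for the segment reader type, segments are supplied in index order, and the searcher passes back the very same segment instances (they are matched by identity). TopLevelBitSetAdapter is a hypothetical name. In Lucene 2.9 the segment list could presumably come from the top-level reader's getSequentialSubReaders(), but that wiring is not shown here and would need checking.

```java
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Adapts ONE index-wide BitSet to per-segment, segment-relative lookups.
// R is a placeholder for the segment reader type; readers are matched by identity.
final class TopLevelBitSetAdapter<R> {
  private final BitSet topLevel;
  private final Map<R, int[]> ranges = new IdentityHashMap<>(); // reader -> {docBase, maxDoc}

  // segments and maxDocs must be given in index order (increasing docBase).
  TopLevelBitSetAdapter(BitSet topLevel, List<R> segments, int[] maxDocs) {
    this.topLevel = topLevel;
    int docBase = 0;
    for (int i = 0; i < segments.size(); i++) {
      ranges.put(segments.get(i), new int[] {docBase, maxDocs[i]});
      docBase += maxDocs[i];
    }
  }

  // Bit i of the result corresponds to absolute doc docBase + i.
  BitSet forSegment(R segment) {
    int[] r = ranges.get(segment);
    if (r == null) throw new IllegalArgumentException("unknown segment reader");
    return topLevel.get(r[0], r[0] + r[1]);
  }
}
```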
  • Simon Willnauer at Dec 4, 2009 at 5:54 pm

    ---------- Forwarded message ----------
    From: Simon Willnauer <simon.willnauer@googlemail.com>
    Date: Fri, Dec 4, 2009 at 6:53 PM
    Subject: Re: searchWithFilter bug?
    To: Peter Keegan <peterlkeegan@gmail.com>
    Peter, since search is per segment, you need to use the segment reader
    passed in during search to create your DocIdSet; if you use absolute
    docIDs, your filter will not work.
    Many filters don't need to be segment-aware, as they use the given
    reader to generate the DocIdSet, like MultiTermQueryWrapperFilter.
    DistanceFilter (contrib/spatial) and its subclasses keep state
    internally to work with per-segment search.

    Maybe this helps to understand:

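    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.OpenBitSet;

    // Maps sorted, index-wide (absolute) doc IDs onto each segment reader.
    // Stateful: assumes getDocIdSet() is called once per segment, in docBase
    // order, so a fresh instance is needed for every search.
    public final class SimpleDocIdSetFilter extends Filter {
      private int docBase;      // absolute ID of the current segment's first doc
      private final int[] docs; // absolute doc IDs to accept, sorted ascending
      private int index;        // next position in docs[] to examine

      public SimpleDocIdSetFilter(int[] docs) {
        this.docs = docs;
      }

      @Override
      public DocIdSet getDocIdSet(IndexReader reader) {
        final OpenBitSet set = new OpenBitSet(reader.maxDoc());
        final int limit = docBase + reader.maxDoc(); // exclusive end of segment
        for (; index < docs.length; index++) {
          final int docId = docs[index];
          if (docId >= limit) // belongs to a later segment
            break;
          set.set(docId - docBase); // absolute -> segment-relative
        }
        docBase = limit;
        return set.isEmpty() ? null : set; // null: nothing matches here
      }
    }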

    @Mike: maybe we should add a testcase / method in TestFilteredSearch
    that searches on more than one segment.

    simon
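    The cursor logic in the SimpleDocIdSetFilter above can be exercised without Lucene by porting it onto java.util.BitSet; CursorSlicer is a hypothetical name, and it keeps the same assumptions (docs[] sorted ascending, one call per segment, segments visited in docBase order). Because limit = docBase + maxDoc is exclusive, the boundary test has to be docId >= limit: with a strict docId > limit, a doc whose ID equals limit would land at bit maxDoc of the current segment instead of bit 0 of the next.

```java
import java.util.BitSet;

// Lucene-free port of the cursor logic, with java.util.BitSet instead of OpenBitSet.
final class CursorSlicer {
  private final int[] docs; // absolute doc IDs, sorted ascending
  private int docBase;      // absolute ID of the next segment's first doc
  private int index;        // next position in docs[]

  CursorSlicer(int[] docs) { this.docs = docs; }

  // Call once per segment, in order; returns segment-relative bits.
  BitSet nextSegment(int maxDoc) {
    BitSet set = new BitSet(maxDoc);
    int limit = docBase + maxDoc;  // exclusive end of this segment
    for (; index < docs.length; index++) {
      int docId = docs[index];
      if (docId >= limit) break;   // first doc of a later segment
      set.set(docId - docBase);    // absolute -> segment-relative
    }
    docBase = limit;
    return set;
  }
}
```

    With docs {0, 4, 5, 9, 10} and three 5-doc segments, the three calls return {0, 4}, {0, 4} and {0}. Since the class is stateful, each search needs a new instance.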

  • Michael McCandless at Dec 4, 2009 at 6:10 pm

    On Fri, Dec 4, 2009 at 12:53 PM, Simon Willnauer wrote:

    @Mike: maybe we should add a testcase / method in TestFilteredSearch
    that searches on more than one segment.
    I agree, we should -- wanna cough up a patch?

    Mike

  • Simon Willnauer at Dec 4, 2009 at 6:30 pm

    On Fri, Dec 4, 2009 at 7:09 PM, Michael McCandless wrote:
    On Fri, Dec 4, 2009 at 12:53 PM, Simon Willnauer wrote:
    @Mike: maybe we should add a testcase / method in TestFilteredSearch
    that searches on more than one segment.
    Working on it... will open an issue in a bit.
    I agree, we should -- wanna cough up a patch?

    Mike

Discussion Overview
group: java-user
categories: lucene
posted: Dec 4, '09 at 3:33p
active: Dec 4, '09 at 6:30p
posts: 8
users: 3
website: lucene.apache.org
