Hi,

I have a custom Query class that provides a long list of Lucene docIds (not for filtering purposes). It is one clause in a standard BooleanQuery, which also contains TermQuery instances.

I have a custom Scorer that goes along with the custom Query class.

What (if any) document ordering requirements does the Scorer class have for its skipTo(int docId) method?

In particular, I currently sort the docIds and return them in ascending order from my custom Query class. That can be expensive for large docId lists; is sorting necessary? It looks like skipTo() might expect the documents it gets to be in ascending order to behave correctly as part of a BooleanQuery, but I can't tell for sure from the documentation.

If my custom Scorer class does not return its documents in ascending order (e.g. 10, 80, 40, 60, 50), will whatever uses skipTo() potentially lose hits? If not, is there any performance concern with having the docIds unordered?



  • Paul Elschot at Oct 4, 2007 at 5:36 pm
    Dan,

    In Scorers, each time skipTo() or next() returns true for the second or
    later time, doc() returns a larger value than it did before.
    When a Scorer does not return its documents in increasing order, documents
    will be lost, which means that not all matching documents will be found
    by the search.
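To make the lost hits concrete, here is a minimal sketch of a conjunction (AND) that leapfrogs between two scorers with skipTo(). ListScorer and ConjunctionDemo are hypothetical in-memory stand-ins written for this illustration; they are not Lucene's classes, and the real conjunction code may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a scorer; not Lucene's Scorer class. */
class ListScorer {
    private final int[] docs;
    private int pos = -1;

    ListScorer(int... docs) { this.docs = docs; }

    /** Advances to the next entry; returns false when the list is exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Advances at least once, stopping at the first entry whose doc id is >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

class ConjunctionDemo {
    /** Leapfrog intersection of two scorers; correct only if both return ascending doc ids. */
    static List<Integer> intersect(ListScorer a, ListScorer b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) return hits;
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) return hits;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return hits;   // a is behind: catch up to b
            } else {
                if (!b.skipTo(a.doc())) return hits;   // b is behind: catch up to a
            }
        }
    }
}
```

With the sorted list 10, 40, 50, 60, 80 intersected against 40, 50, 60, this sketch finds all three common documents. With the unsorted list from the question (10, 80, 40, 60, 50), the first skipTo(40) lands on 80, the other scorer is then skipped past all of its entries, and no hits are reported.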

    For disjunctions (OR), the documents of two Scorers have to be merged,
    using next() to iterate over the documents of each Scorer.
    The merging is normally done on the fly in DisjunctionSumScorer, using a
    specialized priority queue ordered by the doc() values.
    No sorting of complete document lists is done at search time;
    that is done at indexing time. And since TermScorer uses the
    index directly, it will always return documents in order.
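The on-the-fly merge can be sketched roughly as follows. This uses java.util.PriorityQueue rather than Lucene's specialized queue, and DisjunctionDemo and Cursor are hypothetical names for this illustration only. It assumes every input list is already in ascending order, which is exactly what lets the merge avoid sorting the combined list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class DisjunctionDemo {
    /** Hypothetical cursor over one ascending doc id list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
        boolean next() { return ++pos < docs.length; }
    }

    /** Merges ascending doc id lists into one ascending list without duplicates. */
    static List<Integer> union(int[]... lists) {
        Comparator<Cursor> byDoc = Comparator.comparingInt(Cursor::doc);
        PriorityQueue<Cursor> queue = new PriorityQueue<>(byDoc);
        for (int[] docs : lists) {
            if (docs.length > 0) queue.add(new Cursor(docs));
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();              // cursor with the smallest current doc id
            int doc = top.doc();
            if (merged.isEmpty() || merged.get(merged.size() - 1) != doc) {
                merged.add(doc);                    // skip docs already emitted by another list
            }
            if (top.next()) queue.add(top);         // re-insert at its new position
        }
        return merged;
    }
}
```

Each step touches only the head of each list, so the cost per document is logarithmic in the number of lists, not in the number of documents.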

    The only exception to document ordering is BooleanScorer.next(),
    which is used by BooleanQuery for some cases of top
    level disjunctions, and then only when documents are allowed
    to be scored out of order. The reason for that is performance:
    BooleanScorer uses a faster data structure than a priority queue,
    but it does not implement skipTo().
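The thread does not describe BooleanScorer's data structure, so the following is only a rough illustration of how an array-based disjunction can be cheaper than a priority queue while giving up document order. It assumes a small fixed-size bucket array filled one window of doc ids at a time; BucketDisjunctionDemo, SIZE and maxDoc are hypothetical names and values chosen for this sketch, not BooleanScorer's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class BucketDisjunctionDemo {
    static final int SIZE = 16;          // hypothetical window size (a power of two)
    static final int MASK = SIZE - 1;

    /** ORs ascending doc id lists window by window; maxDoc is an exclusive upper bound on doc ids. */
    static List<Integer> union(int maxDoc, int[]... lists) {
        int[] pos = new int[lists.length];        // read position in each list
        boolean[] seen = new boolean[SIZE];       // one bucket per doc id in the current window
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start < maxDoc; start += SIZE) {
            int end = start + SIZE;
            int firstNew = hits.size();
            for (int i = 0; i < lists.length; i++) {
                // Drain list i up to the end of the window before moving to the next list.
                while (pos[i] < lists[i].length && lists[i][pos[i]] < end) {
                    int doc = lists[i][pos[i]++];
                    if (!seen[doc & MASK]) {
                        seen[doc & MASK] = true;
                        hits.add(doc);            // emitted in insertion order, not doc order
                    }
                }
            }
            // Clear only the buckets that were used in this window.
            for (int j = firstNew; j < hits.size(); j++) {
                seen[hits.get(j) & MASK] = false;
            }
        }
        return hits;
    }
}
```

Every matching document is still found, but within a window the documents of the first list come out before those of the second, so the overall output is not in ascending order. A consumer that calls skipTo() on such a stream cannot rely on it.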

    Regards,
    Paul Elschot





Discussion Overview

group: java-user @
category: lucene
posted: Oct 4, '07 at 7:12a
active: Oct 4, '07 at 5:36p
posts: 2
users: 2 (Dan Rich: 1 post, Paul Elschot: 1 post)
website: lucene.apache.org