I'm attempting to switch Solr to use the new Collector framework to
get per-segment sorting and have been hitting some issues.
The latest is a function query log(val) which produces both NaN and
-Infinity values, which kill the TopScoreDocCollector (invalid docids
are produced).

results = {org.apache.lucene.search.ScoreDoc[7]@2039}
[0] = {org.apache.lucene.search.ScoreDoc@2042}"doc=0 score=2.0"
[1] = {org.apache.lucene.search.ScoreDoc@2043}"doc=4 score=1.39794"
[2] = {org.apache.lucene.search.ScoreDoc@2044}"doc=3 score=1.0"
[3] = {org.apache.lucene.search.ScoreDoc@2045}"doc=5 score=0.69897"
[4] = {org.apache.lucene.search.ScoreDoc@2046}"doc=1 score=-2000000.0"
[5] = {org.apache.lucene.search.ScoreDoc@2047}"doc=2147483647 score=-Infinity"
[6] = {org.apache.lucene.search.ScoreDoc@2048}"doc=2147483647 score=-Infinity"

So either we need to clarify the valid values for score() or we need
to change how the queue does comparisons so that this works again.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Shai Erera at May 26, 2009 at 1:52 pm
    In LUCENE-1575 we decided to pre-populate HitQueue with sentinel values
    whose score is Float.NEGATIVE_INFINITY, as we assumed such scores would
    not be produced. TSDC (TopScoreDocCollector) instantiates HitQueue with
    pre-filling turned on.

    Is NEG_INF a valid score for you?
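A minimal, stand-alone sketch of why pre-filled sentinels and -Infinity scores collide. It does not use Lucene: SentinelTopN and its methods are hypothetical, and it assumes only what is described above, namely a queue that starts full of sentinel entries scored -Infinity (given doc=Integer.MAX_VALUE here, matching the doc=2147483647 entries in the ScoreDoc dump) and a hit that is admitted only if it beats the weakest entry. A -Infinity hit merely ties a sentinel, so it is dropped and the sentinel survives into the results. How NaN behaves depends on how the comparison is written, since NaN compares false under every ordering operator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N queue pre-filled with -Infinity sentinels (not Lucene's HitQueue). */
final class SentinelTopN {
    /** A (doc, score) pair, analogous to ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc=" + doc + " score=" + score; }
    }

    // Weakest hit at the head: lowest score first; on equal scores the larger doc id is weaker.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    private final PriorityQueue<Hit> pq = new PriorityQueue<>(WEAKEST_FIRST);

    SentinelTopN(int numHits) {
        if (numHits <= 0) throw new IllegalArgumentException("numHits must be > 0");
        // Pre-fill with sentinels that a "real" hit is assumed to always beat.
        for (int i = 0; i < numHits; i++) {
            pq.add(new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY));
        }
    }

    void collect(int doc, float score) {
        Hit weakest = pq.peek();
        // Admit only hits that beat the weakest entry. -Infinity merely ties a sentinel,
        // so it is rejected. NaN fails every comparison: written as !(score > x) it is
        // rejected too, whereas a test written as (score <= x) would admit it.
        if (!(score > weakest.score)) {
            return;
        }
        pq.poll();
        pq.add(new Hit(doc, score));
    }

    /** Results best-first, including any sentinels that were never displaced. */
    List<Hit> topDocs() {
        List<Hit> out = new ArrayList<>(pq);
        out.sort(WEAKEST_FIRST.reversed());
        return out;
    }
}
```

With three slots, one hit scoring 2.0, one scoring -Infinity and one scoring NaN, topDocs() returns doc 0 followed by two sentinel entries with doc=2147483647 and score=-Infinity.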
  • Yonik Seeley at May 26, 2009 at 3:46 pm

    On Tue, May 26, 2009 at 9:52 AM, Shai Erera wrote:
    > We've decided in 1575 to pre-populate HitQueue with sentinel values with
    > score = Float.NEGATIVE_INFINITY, as we assumed these scores will not be
    > produced. TSDC instantiates HitQueue with pre-filling turned on.
    >
    > Is NEG_INF a valid score for you?

    It was.... people are free to create whatever functions they want. We
    never explicitly spelled out what happens when functions return -Inf
    or NaN, but everything still worked. Now we actually lose documents,
    and they are replaced with invalid docids.

    To work around it, I've temporarily checked the score for NaN or
    -Infinity and replaced it with -MaxVal. I guess that will become
    permanent if we decide that those are not valid scores for scorers to
    produce.
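A sketch of that workaround as a stand-alone helper. ScoreSanitizer is a hypothetical name, and "-MaxVal" is assumed here to mean -Float.MAX_VALUE, the most negative finite float. That value is strictly greater than a -Infinity sentinel, so a sanitized hit can still enter a queue pre-filled with -Infinity entries.

```java
/** Hypothetical helper mirroring the NaN / -Infinity workaround described above. */
final class ScoreSanitizer {
    private ScoreSanitizer() {}

    /**
     * Replaces NaN and -Infinity with -Float.MAX_VALUE (the assumed meaning of
     * "-MaxVal"); every other score, +Infinity included, passes through unchanged.
     */
    static float sanitize(float score) {
        if (Float.isNaN(score) || score == Float.NEGATIVE_INFINITY) {
            return -Float.MAX_VALUE;
        }
        return score;
    }
}
```

One side effect: all NaN and -Infinity hits now tie at -Float.MAX_VALUE, so their relative order falls back to whatever tie-break the queue uses.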

    -Yonik
    http://www.lucidimagination.com

  • Michael McCandless at May 26, 2009 at 4:16 pm

    On Tue, May 26, 2009 at 9:48 AM, Yonik Seeley wrote:
    > I'm attempting to switch Solr to use the new Collector framework to
    > get per-segment sorting and have been hitting some issues.

    What other issues are you hitting?

    > The latest is a function query log(val) which produces both NaN and
    > -Infinity values, which kill the TopScoreDocCollector (invalid docids
    > are produced).
    >> Is NEG_INF a valid score for you?
    > It was.... people are free to create whatever functions they want. We
    > never explicitly spelled out what happens when functions return -Inf
    > or NaN, but everything still worked. Now we actually lose documents
    > and they are replaced with invalid docids.

    Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
    (NaN, -Inf, in fact any non-positive score) were skipped entirely.

    > To work around it, I've temporarily checked the score for NaN or
    > -Infinity and replaced it with -MaxVal. I guess that will become
    > permanent if we decide that those are not valid scores for scorers to
    > produce.

    I think we should state that NaN/Inf/-Inf are not valid scores?
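The 2.4.x behavior described above needs only a single comparison, because NaN compares false against everything. The sketch below is not the TopDocCollector source: PositiveScoreGuard is a hypothetical name, and it models only the stated rule that a hit is collected when its score is strictly greater than zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a collector that skips non-positive scores. */
final class PositiveScoreGuard {
    private PositiveScoreGuard() {}

    /** Returns the doc ids (array indexes) whose scores pass a "score > 0" guard. */
    static List<Integer> collected(float[] scores) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            // One comparison excludes 0, negatives, -Infinity and NaN (NaN > 0 is false).
            if (scores[doc] > 0.0f) {
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

For scores {2.0, 0.0, -1.0, -Infinity, NaN, 0.5}, collected() returns [0, 5].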

    Mike

  • Yonik Seeley at May 26, 2009 at 4:29 pm

    On Tue, May 26, 2009 at 12:16 PM, Michael McCandless wrote:
    > What other issues are you hitting?

    I hit an NPE when using old-style sort comparators.
    The main thing is that I didn't anticipate having to rewrite all the
    custom sort comparators Solr has.
    There are multiple test cases still failing... not sure at this point
    if they are all related to custom sort comparators or not.

    > Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
    > (NaN, -Inf, in fact any non-positive score) were skipped entirely.

    OK... I never actually checked if they were included or not - Solr
    used a priority queue directly due to historical Lucene limitations.

    -Yonik
    http://www.lucidimagination.com

  • Yonik Seeley at May 26, 2009 at 4:31 pm
    FYI, the upgrading work is going on in
    https://issues.apache.org/jira/browse/SOLR-1111

    -Yonik
    http://www.lucidimagination.com
  • Shai Erera at May 26, 2009 at 5:35 pm
    I also think we should state that for TSDC, those scores are invalid.

    In fact, we also changed TopDocs to return maxScore=NaN if max-score is not
    tracked, so I think in a sense we've already said that NaN is not a valid
    value.
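Since NaN now doubles as the "not tracked" marker for the max score (the value returned by TopDocs.getMaxScore()), code reading it has to test with Float.isNaN: an == or != comparison against Float.NaN can never detect it. A sketch with a hypothetical helper class that takes the raw float rather than a TopDocs, to stay self-contained:

```java
/** Hypothetical helper: a NaN max score is read as "max score was not tracked". */
final class MaxScoreCheck {
    private MaxScoreCheck() {}

    /** True when maxScore carries a real value rather than the NaN marker. */
    static boolean isTracked(float maxScore) {
        return !Float.isNaN(maxScore);
    }

    /** Broken variant kept for contrast: NaN != NaN is true, so this never reports "not tracked". */
    static boolean brokenIsTracked(float maxScore) {
        return maxScore != Float.NaN;
    }
}
```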

    But anyway, given that I don't believe those scores are common, and given
    the enhancements we can make to TSDC by pre-populating the queue with
    those sentinels, I think we should state it for TSDC. If someone does
    need to use those scores, they can write their own version of TSDC that
    does not pre-populate anything, or that uses different sentinels.
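A sketch of such a "write your own" top-N collector that never pre-fills its queue, so -Infinity is an ordinary low score. It uses plain java.util instead of Lucene's Collector API; NoSentinelTopN is hypothetical, and ranking NaN below every number (-Infinity included) is a choice made for this sketch, not something decided in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N collector without sentinels. */
final class NoSentinelTopN {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc=" + doc + " score=" + score; }
    }

    /** Orders scores ascending; NaN ranks below every number, -Infinity included. */
    static int compareScores(float a, float b) {
        boolean aNaN = Float.isNaN(a);
        boolean bNaN = Float.isNaN(b);
        if (aNaN || bNaN) {
            return aNaN == bNaN ? 0 : (aNaN ? -1 : 1);
        }
        return Float.compare(a, b);
    }

    // Weakest hit at the head: lowest score first; on equal scores the larger doc id is weaker.
    private static final Comparator<Hit> WEAKEST_FIRST = (x, y) -> {
        int c = compareScores(x.score, y.score);
        return c != 0 ? c : Integer.compare(y.doc, x.doc);
    };

    private final int numHits;
    private final PriorityQueue<Hit> pq = new PriorityQueue<>(WEAKEST_FIRST);

    NoSentinelTopN(int numHits) {
        if (numHits <= 0) throw new IllegalArgumentException("numHits must be > 0");
        this.numHits = numHits;
    }

    void collect(int doc, float score) {
        Hit hit = new Hit(doc, score);
        if (pq.size() < numHits) {
            pq.add(hit);                 // still filling: every hit is admitted
        } else if (WEAKEST_FIRST.compare(hit, pq.peek()) > 0) {
            pq.poll();                   // evict the current weakest hit
            pq.add(hit);
        }
    }

    /** Results best-first. */
    List<Hit> topDocs() {
        List<Hit> out = new ArrayList<>(pq);
        out.sort(WEAKEST_FIRST.reversed());
        return out;
    }
}
```

The price is the size check on every collected hit, which is exactly the branch that pre-filled sentinels let TSDC avoid.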

    I don't think this warrants an issue, though; just a small addition to
    TSDC's javadoc and to CHANGES.

    Shai
  • Michael McCandless at May 26, 2009 at 5:44 pm
    OK I'll commit these changes...

    Mike
  • Michael McCandless at May 26, 2009 at 5:44 pm

    On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley wrote:
    > On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
    > wrote:
    >> What other issues are you hitting?
    > I hit an NPE when using old-style sort comparators.

    Do you have the traceback (or remember the gist)?

    > The main thing is that I didn't anticipate having to rewrite all the
    > custom sort comparators Solr has.

    Because they were loading the FieldCache entry for the entire
    MultiReader (vs normal field sorting which'd load per segment)?
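With per-segment search, a collector sees segment-relative doc ids plus each segment's docBase (in Lucene 2.9 this arrives through Collector.setNextReader(reader, docBase)). A comparator holding values loaded once for the whole MultiReader has to add docBase before indexing, while a per-segment one indexes its own segment's array directly. The sketch models both views with plain int arrays standing in for FieldCache entries; PerSegmentLookup and its methods are hypothetical.

```java
/** Hypothetical model of top-level vs per-segment value lookup. */
final class PerSegmentLookup {
    private PerSegmentLookup() {}

    /** docBase of each segment: the total doc count of all segments before it. */
    static int[] docBases(int[][] segmentValues) {
        int[] bases = new int[segmentValues.length];
        int base = 0;
        for (int s = 0; s < segmentValues.length; s++) {
            bases[s] = base;
            base += segmentValues[s].length;
        }
        return bases;
    }

    /** Top-level view: one array for the whole index, as if loaded on the MultiReader. */
    static int[] topLevel(int[][] segmentValues) {
        int total = 0;
        for (int[] seg : segmentValues) total += seg.length;
        int[] all = new int[total];
        int[] bases = docBases(segmentValues);
        for (int s = 0; s < segmentValues.length; s++) {
            System.arraycopy(segmentValues[s], 0, all, bases[s], segmentValues[s].length);
        }
        return all;
    }

    /** Lookup as a top-level comparator must do it: rebase the segment-relative doc first. */
    static int lookupTopLevel(int[] topLevelValues, int docBase, int segmentDoc) {
        return topLevelValues[docBase + segmentDoc];
    }

    /** Lookup as a per-segment comparator does it: index the segment's own array. */
    static int lookupPerSegment(int[] segmentValues, int segmentDoc) {
        return segmentValues[segmentDoc];
    }
}
```

For segments holding {5, 7}, {9} and {1, 3, 4}, docBases() returns {0, 2, 3} and both lookups agree for every (segment, doc) pair; dropping the docBase in the top-level lookup reads the wrong document's value.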

    How many custom sort comparators does Solr have?

    > There are multiple test cases still failing... not sure at this point
    > if they are all related to custom sort comparators or not.
    >> Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
    >> (NaN, -Inf, in fact any non-positive score) were skipped entirely.
    > OK... I never actually checked if they were included or not - Solr
    > used a priority queue directly due to historical Lucene limitations.

    But are you now migrating away from Solr's private PQ? (Else you
    wouldn't have hit problems w/ Lucene's new TSDC).

    Mike

  • Yonik Seeley at May 26, 2009 at 5:59 pm

    On Tue, May 26, 2009 at 1:44 PM, Michael McCandless wrote:
    > On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
    > wrote:
    >> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
    >> wrote:
    >>> What other issues are you hitting?
    >> I hit an NPE when using old-style sort comparators.
    > Do you have the traceback (or remember the gist)?

    I remember it...

        case SortField.CUSTOM:
          assert factory == null && comparatorSource != null;
          return comparatorSource.newComparator(field, numHits, sortPos, reversed);

    comparatorSource was null.

    >> The main thing is that I didn't anticipate having to rewrite all the
    >> custom sort comparators Solr has.
    > Because they were loading the FieldCache entry for the entire
    > MultiReader (vs normal field sorting which'd load per segment)?

    I think that could be it... I didn't trouble myself too long analyzing
    it (or the NPE above) - I just figured we should bite the bullet and
    start using non-deprecated classes.

    > How many custom sort comparators does Solr have?

    I guess it's only 3 or 4... plus some code to use them to correctly
    merge results in distributed search.

    [...]
    > But are you now migrating away from Solr's private PQ? (Else you
    > wouldn't have hit problems w/ Lucene's new TSDC).

    Yep.
    BTW, Kudos to everyone who worked on the new Collector classes... esp
    TopFieldCollector.create() - nice powerful stuff, easy to chain with
    other collectors, etc.
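One reading of "easy to chain" is a wrapper that forwards each hit to several other collectors, for example a top-N collector plus a counter. A stand-alone sketch with a hypothetical HitSink interface in place of Lucene's Collector; a real wrapper around a Lucene 2.9 Collector would also have to forward setScorer() and setNextReader(), which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal collector contract (not Lucene's Collector class). */
interface HitSink {
    void collect(int doc, float score);
}

/** Forwards every hit to each wrapped sink, in the order given. */
final class ChainedSink implements HitSink {
    private final List<HitSink> sinks;

    ChainedSink(HitSink... sinks) {
        this.sinks = Arrays.asList(sinks);
    }

    @Override
    public void collect(int doc, float score) {
        for (HitSink sink : sinks) {
            sink.collect(doc, score);
        }
    }
}

/** Example sink: counts hits. */
final class CountingSink implements HitSink {
    int count;

    @Override
    public void collect(int doc, float score) {
        count++;
    }
}

/** Example sink: records doc ids in arrival order. */
final class DocListSink implements HitSink {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
    }
}
```

Each wrapped sink sees exactly the same hit stream, so new per-hit work can be added without touching the collectors already in the chain.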

  • Michael McCandless at May 26, 2009 at 6:22 pm

    On Tue, May 26, 2009 at 1:58 PM, Yonik Seeley wrote:
    >>>> What other issues are you hitting?
    >>> I hit an NPE when using old-style sort comparators.
    >> Do you have the traceback (or remember the gist)?
    > I remember it...
    >     case SortField.CUSTOM:
    >       assert factory == null && comparatorSource != null;
    >       return comparatorSource.newComparator(field, numHits, sortPos, reversed);
    >
    > comparatorSource was null.

    Hmm -- IndexSearcher tries to detect when SortComparatorSource is
    used, and drive the search with the toplevel reader, so that code is
    not supposed to be reached. Do you remember what tickled it?

    >>> The main thing is that I didn't anticipate having to rewrite all the
    >>> custom sort comparators Solr has.
    >> Because they were loading the FieldCache entry for the entire
    >> MultiReader (vs normal field sorting which'd load per segment)?
    > I think that could be it... I didn't trouble myself too long analyzing
    > it (or the NPE above) - I just figured we should bite the bullet and
    > start using non-deprecated classes.

    OK

    >> How many custom sort comparators does Solr have?
    > I guess it's only 3 or 4... plus some code to use them to correctly
    > merge results in distributed search.

    OK

    >> But are you now migrating away from Solr's private PQ? (Else you
    >> wouldn't have hit problems w/ Lucene's new TSDC).
    > Yep.

    Super!

    > BTW, Kudos to everyone who worked on the new Collector classes... esp
    > TopFieldCollector.create() - nice powerful stuff, easy to chain with
    > other collectors, etc.

    Shai did all the hard work ;) (And, still is...)

    Mike

  • Yonik Seeley at May 26, 2009 at 6:42 pm

    On Tue, May 26, 2009 at 2:22 PM, Michael McCandless wrote:
    > Hmm -- IndexSearcher tries to detect when SortComparatorSource is
    > used, and drive the search with the toplevel reader, so that code is
    > not supposed to be reached. Do you remember what tickled it?

    Solr's search code is now using the IndexSearcher.search(query,
    luceneFilter, collector) method.
    Since it doesn't pass a Sort (it's part of the collector instead), I
    guess this logic is bypassed.
    From a back compat point of view, this is fine of course since
    "Collector" didn't previously exist.

    -Yonik
    http://www.lucidimagination.com

  • Michael McCandless at May 26, 2009 at 6:50 pm

    On Tue, May 26, 2009 at 2:41 PM, Yonik Seeley wrote:
    > On Tue, May 26, 2009 at 2:22 PM, Michael McCandless
    > wrote:
    >> Hmm -- IndexSearcher tries to detect when SortComparatorSource is
    >> used, and drive the search with the toplevel reader, so that code is
    >> not supposed to be reached. Do you remember what tickled it?
    > Solr's search code is now using the IndexSearcher.search(query,
    > luceneFilter, collector) method.
    > Since it doesn't pass a Sort (it's part of the collector instead), I
    > guess this logic is bypassed.
    > From a back compat point of view, this is fine of course since
    > "Collector" didn't previously exist.

    Ahhh OK, phew :)

    Mike


Discussion Overview
group: java-dev
categories: lucene
posted: May 26, '09 at 1:48p
active: May 26, '09 at 6:50p
posts: 13
users: 3
website: lucene.apache.org
