hey

I've got a filter that's storing document IDs with a geo distance for
spatial lucene, using the bitset position as the doc id.
However, with a MultiSegmentReader that's no longer going to work.

What's the most appropriate way to go from bitset position to doc id now?

Thanks
Patrick

  • Uwe Schindler at Apr 28, 2009 at 8:42 pm
    What is the problem exactly? Maybe you use the new Collector API, where the
    search is done for each segment, so caching does not work correctly?



    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    _____

    From: patrick o'leary
    Sent: Tuesday, April 28, 2009 10:31 PM
    To: java-dev@lucene.apache.org
    Subject: ReadOnlyMultiSegmentReader bitset id vs doc id



  • Mark Miller at Apr 28, 2009 at 9:12 pm
    You might check out this Solr exchange:
    http://www.lucidimagination.com/search/document/b2ccc68ca834129/lucene_2_9_migration_issues_multireader_vs_indexreader_document_ids

    There are a few suggestions throughout.


    --
    - Mark

    http://www.lucidimagination.com



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Patrick o'leary at Apr 28, 2009 at 9:45 pm
    Think I may have found it. The filter runs multiple times, once for each
    segment reader, and I was generating a new map to hold distances each time,
    so only the distances from the last segment reader were stored.

    Currently it looks like those per-segment searches are done serially (well,
    in Solr they are). I presume the end goal is to make them multi-threaded?
    If so, I'll need to make my map synchronized.
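
A minimal sketch of the mistake described above and one way around it, in plain Java with no Lucene dependency. The names `DistanceStore` and `onSegment` are hypothetical; `onSegment` only stands in for one per-segment run of the filter, and the keys are assumed to be ids that are already unique across segments.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a filter that records one distance per document. */
final class DistanceStore {
    // Created once per search and shared by every per-segment call.
    private final Map<Integer, Double> distances = new HashMap<>();

    /** Stand-in for one per-segment filter run. */
    void onSegment(Map<Integer, Double> segmentDistances) {
        // The bug described above amounts to creating a fresh map here on
        // every call, which discards what earlier segments recorded.
        // Adding to the shared map keeps the results from all segments.
        distances.putAll(segmentDistances);
    }

    int size() {
        return distances.size();
    }

    Double get(int docId) {
        return distances.get(docId);
    }
}
```

If the per-segment calls ever did run on several threads, swapping the `HashMap` for `java.util.concurrent.ConcurrentHashMap` or wrapping it with `Collections.synchronizedMap` would be one option; whether that is needed depends on how the searcher drives the filter.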

  • Mark Miller at Apr 28, 2009 at 11:17 pm
    I'm not sure that we could parallelize it. Currently, it's a serial
    process (as you say): the queue collects across readers by adjusting
    the values in the queue to sort correctly against the current reader.
    That approach doesn't appear to be easily parallelized.

  • Patrick o'leary at Apr 29, 2009 at 4:17 am
    OK, finally, with some pointers from Ryan, I figured out the last problem.
    So, as a note to anyone else who might encounter the same problems with
    multireader:

    A) A directory can contain multiple segments, with a reader for each segment
    B) Searches are replayed within each reader in a serial fashion **
    C) If you're using FieldCache / BitSet or anything else tied to document
    position within a reader, and you need the docId:
    -- document id = (sum of previous readers' maxDoc values) + bitset position

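    e.g.
    int offset;      // sum of maxDoc() over all readers seen so far, current one included
    int nextOffset;  // sum of maxDoc() over the readers before the current one

    public DocIdSet getDocIdSet(IndexReader reader) {

        OpenBitSet bitset = new OpenBitSet(reader.maxDoc());
        offset += reader.maxDoc();
        for (int i = 0; i < reader.maxDoc(); i++) {
            .....
            .... filter stuff ....
            ....
            if ( good ) {
                bitset.set( i );             // i is the position within this segment

                int docId = i + nextOffset;  // top-level document id
                ...........
            }
        }

        nextOffset = offset;  // the next reader's ids start where this one's end
        .......
    }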
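
The offset arithmetic can also be checked on its own, without Lucene. The following is only a sketch under the assumptions stated in this thread: the filter is invoked once per segment, in index order, and the tracker is created fresh for each search. `DocBaseTracker` and `onSegment` are hypothetical names, and the `maxDoc` values are passed in as plain integers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps per-segment bitset positions to top-level doc ids. */
final class DocBaseTracker {
    // Sum of maxDoc over the segments that have already been processed.
    private int docBase = 0;

    /**
     * Stand-in for one per-segment filter run.
     *
     * @param maxDoc    the maxDoc value of the current segment
     * @param positions bitset positions that matched within this segment
     * @return the corresponding top-level document ids
     */
    List<Integer> onSegment(int maxDoc, int[] positions) {
        List<Integer> docIds = new ArrayList<>();
        for (int pos : positions) {
            // document id = (sum of previous maxDocs) + position in segment
            docIds.add(docBase + pos);
        }
        // Advance only after the current segment has been fully handled.
        docBase += maxDoc;
        return docIds;
    }
}
```

With segments of 10, 5 and 7 documents, position 0 in the second segment maps to doc id 10 and position 3 in the third segment maps to doc id 18.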


    K, works. Time for sleep.

    P

