Explicitly set start and rows per shard for more efficient bulk queries across distributed Solr
-----------------------------------------------------------------------------------------------

Key: SOLR-659
URL: https://issues.apache.org/jira/browse/SOLR-659
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.3
Reporter: Brian Whitman
Priority: Minor
Fix For: 1.3
Attachments: shards.start_rows.patch

The default behavior for start and rows in distributed Solr (SOLR-303) is to set start to 0 on all shards and rows to start+rows on each shard. This ensures the right results are returned for any start and rows setting. During "bulk queries" (where start is increased incrementally and rows is held constant), however, the client needs finer control over the per-shard start and rows parameters, because retrieving many thousands of documents becomes intractable as start grows.

Attaching a patch that adds &shards.start and &shards.rows parameters. When they are used, the logic that sets rows to start+rows on each shard is overridden, and each shard receives exactly the start and rows given in shards.start and shards.rows. The client will receive up to shards.rows * nShards results and should set rows accordingly. This makes bulk queries across distributed Solr possible.
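To make the cost difference concrete, here is a minimal Java sketch (not part of the patch or of Solr) that counts how many documents the shards send to the aggregating node while a client pages through `total` results `rows` at a time. The class and method names are hypothetical, and the counts are upper bounds that assume every shard holds enough matching documents.

```java
/**
 * Hypothetical helper, not Solr code: upper bounds on the number of documents
 * all shards return while a client pages through `total` results in blocks of `rows`.
 * Assumes every shard holds at least start+rows matching documents.
 */
public final class ShardPagingCost {

    // Default SOLR-303 behavior: for a request with (start, rows), every shard
    // is sent start=0 and rows=start+rows, so deeper pages cost more.
    public static long defaultDocsFetched(int nShards, int rows, int total) {
        long fetched = 0;
        for (int start = 0; start < total; start += rows) {
            fetched += (long) nShards * (start + rows);
        }
        return fetched;
    }

    // With shards.start=start and shards.rows=rows, every shard returns at most
    // `rows` documents per request, however deep the request is.
    public static long explicitDocsFetched(int nShards, int rows, int total) {
        long fetched = 0;
        for (int start = 0; start < total; start += rows) {
            fetched += (long) nShards * rows;
        }
        return fetched;
    }

    public static void main(String[] args) {
        // 4 shards, blocks of 100, start values 0, 100, ..., 900
        System.out.println(defaultDocsFetched(4, 100, 1000));  // 22000
        System.out.println(explicitDocsFetched(4, 100, 1000)); // 4000
    }
}
```

Under these assumptions the default cost grows quadratically with the number of pages, while the explicit one grows linearly. The two modes do not return the same documents, though: shards.start offsets each shard independently, which is the scoring-precision trade-off Shalin raises later in the thread.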




--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Brian Whitman (JIRA) at Jul 25, 2008 at 2:07 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Brian Whitman updated SOLR-659:
    -------------------------------

    Attachment: shards.start_rows.patch

    Attaching patch.
  • Brian Whitman (JIRA) at Jul 25, 2008 at 2:18 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616903#action_12616903 ]

    Brian Whitman commented on SOLR-659:
    ------------------------------------

    An example of a bulk query using this patch. Without this patch such bulk queries will eventually time out or cause exceptions in the server as too much data is passed back and forth.

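    {code:java}
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    // Hypothetical wrapper class; query() and getNumberOfHosts() stand in for the original helpers.
    public abstract class BulkQueryClient {
        // Sends q to the aggregating Solr node.
        protected abstract QueryResponse query(SolrQuery q);

        // Number of shards listed in the &shards parameter.
        protected abstract int getNumberOfHosts();

        public SolrDocumentList blockQuery(SolrQuery q, int blockSize, int maxResults) {
            SolrDocumentList allResults = new SolrDocumentList();
            if (blockSize > maxResults) { blockSize = maxResults; }
            for (int i = 0; i < maxResults; i += blockSize) {
                // Set rows on the main query to the most results that could ever come back:
                // blockSize * the number of shards
                q.setRows(blockSize * getNumberOfHosts());
                // Don't set a start on the main query...
                q.setStart(0);
                // ...but do set start and rows on the individual shards.
                q.set("shards.start", String.valueOf(i));
                q.set("shards.rows", String.valueOf(blockSize));
                // Perform the query.
                SolrDocumentList block = query(q).getResults();
                // No shard returned anything: the results are exhausted.
                if (block.isEmpty()) { break; }
                // Append each returned document (up to blockSize * getNumberOfHosts()) to the main result
                for (SolrDocument s : block) {
                    allResults.add(s);
                    // Stop once we've reached the requested limit (>=, so at most maxResults are kept)
                    if (allResults.size() >= maxResults) { return allResults; }
                }
            }
            return allResults;
        }
    }
    {code}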
  • Mike Klaas (JIRA) at Jul 31, 2008 at 6:10 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Mike Klaas updated SOLR-659:
    ----------------------------

    Fix Version/s: (was: 1.3)

    IMO it is too late in the release process for new features.
  • Otis Gospodnetic (JIRA) at Sep 8, 2008 at 2:51 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Otis Gospodnetic updated SOLR-659:
    ----------------------------------

    Fix Version/s: 1.4

    This looks simple enough. I haven't tried it. Brian, do you have a unit test you could attach?

    Or would it make more sense to have a custom QueryComponent for something like this? (I don't know yet)

  • Brian Whitman (JIRA) at Feb 8, 2009 at 5:55 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Brian Whitman updated SOLR-659:
    -------------------------------

    Attachment: SOLR-659.patch

    New patch syncs w/ trunk
  • Shalin Shekhar Mangar (JIRA) at Mar 20, 2009 at 9:03 am
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683803#action_12683803 ]

    Shalin Shekhar Mangar commented on SOLR-659:
    --------------------------------------------

    If I understand this correctly, it makes bulk queries cheaper at the expense of less precise scoring. But if I'm paging through some results and you modify shards.start and shards.rows, then I'll get inconsistent results. Is that correct?

    bq. The client will receive up to shards.rows * nShards results and should set rows accordingly. This makes bulk queries across distributed solr possible.

    I do not understand that. Why will the client get more than rows? Or by client, did you mean the solr server to which the initial request is sent?
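    A simplified model of the merge step may help with this question. In the Java sketch below (hypothetical names; not Solr's actual merging code), each shard answers with at most shards.rows documents, the aggregating node sorts their union by score, and then keeps the top rows. The merged pool therefore holds up to shards.rows * nShards documents, and a client that sets rows lower than that sees only part of it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of the aggregating node's merge step; not Solr code. */
public final class ShardMerge {
    public record Doc(String id, double score) {}

    // Each inner list is one shard's response of at most shards.rows documents.
    // Returns the top `rows` documents of the union, highest score first.
    public static List<Doc> merge(List<List<Doc>> shardResponses, int rows) {
        List<Doc> all = new ArrayList<>();
        for (List<Doc> shard : shardResponses) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble(Doc::score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(rows, all.size())));
    }

    public static void main(String[] args) {
        // Three shards, each answering with shards.rows = 2 documents.
        List<List<Doc>> responses = List.of(
            List.of(new Doc("a1", 0.9), new Doc("a2", 0.5)),
            List.of(new Doc("b1", 0.8), new Doc("b2", 0.4)),
            List.of(new Doc("c1", 0.7), new Doc("c2", 0.3)));
        System.out.println(merge(responses, 6).size()); // 6 = shards.rows * nShards
        System.out.println(merge(responses, 4).size()); // 4: the two lowest-scoring documents are dropped
    }
}
```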
  • Yonik Seeley (JIRA) at Aug 28, 2009 at 9:29 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Yonik Seeley reassigned SOLR-659:
    ---------------------------------

    Assignee: Yonik Seeley
  • Yonik Seeley (JIRA) at Aug 28, 2009 at 9:30 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748986#action_12748986 ]

    Yonik Seeley commented on SOLR-659:
    -----------------------------------

    I agree this makes sense to enable efficient bulk operations, and also fits in with a past idea I had about mapping shards.param=foo to param=foo during a sub-request.

    I'll give it a couple of days and commit if there are no objections.
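    The idea of mapping shards.param=foo to param=foo during a sub-request can be sketched in a few lines of Java. This is a hypothetical illustration that uses a plain Map for the request parameters; it is not necessarily how the committed patch works. Every parameter with the shards. prefix is copied onto the sub-request with the prefix stripped, overriding any top-level value of the same name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the shards.param=foo -> param=foo idea; not Solr code. */
public final class ShardParamMapper {
    private static final String PREFIX = "shards.";

    // Builds the parameters for one shard sub-request from the top-level request.
    public static Map<String, String> subRequestParams(Map<String, String> request) {
        Map<String, String> sub = new HashMap<>(request);
        for (Map.Entry<String, String> e : request.entrySet()) {
            String key = e.getKey();
            // The bare "shards" list does not match the prefix and is left untouched.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                // e.g. shards.start=100 becomes start=100, replacing any top-level start
                sub.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        return sub;
    }

    public static void main(String[] args) {
        Map<String, String> request = new HashMap<>();
        request.put("q", "*:*");
        request.put("start", "0");
        request.put("rows", "300");
        request.put("shards.start", "100");
        request.put("shards.rows", "100");
        Map<String, String> sub = subRequestParams(request);
        System.out.println(sub.get("start") + " " + sub.get("rows")); // 100 100
    }
}
```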
  • Yonik Seeley (JIRA) at Sep 7, 2009 at 6:32 pm
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Yonik Seeley resolved SOLR-659.
    -------------------------------

    Resolution: Fixed

    Thanks Brian, I just committed this.
  • johnson.hong (JIRA) at Oct 23, 2009 at 7:02 am
    [ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769114#action_12769114 ]

    johnson.hong commented on SOLR-659:
    -----------------------------------

    This is really helpful for bulk queries, but how should pagination of the query results be handled?
    E.g., in the first query I set shards.start to 0 and shards.rows to 30. It may return 50 documents, of which I show 30 and discard the other 20. How do I then get the next 30 documents?

Discussion Overview
group: solr-dev @
categories: lucene
posted: Jul 25, '08 at 2:07p
active: Oct 23, '09 at 7:02a
posts: 11
website: lucene.apache.org...
