Hi!

I'm using *SOLR 4.4.0* for searching in my project.
I am now facing a problem with atomic updates across multiple cores.
From the wiki:
curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
  {
   "id" : "TestDoc1",
   "title" : {"set":"test1"},
   "revision" : {"inc":3},
   "publisher" : {"add":"TestPublisher"}
  },
  {
   "id" : "TestDoc2",
   "publisher" : {"add":"TestPublisher"}
  }
]'

As far as I understand, this means that a document, for example the one with id
*TestDoc1*, will be looked up for updating *in only one core*.
And if there is no document with id *TestDoc1*, the document will be
created.
Can I somehow specify a *list of cores* to search, and then
update the matching document with a specific id?

It would be something like the *shards* parameter in a *select* query.
From the wiki:
#now do a distributed search across both servers with your browser or curl
curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'

Or is this planned for the future?

Thanks in advance.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Best regards,
Batalova Kseniya


  • Jack Krupansky at Jun 2, 2015 at 4:01 pm
    What exactly is the problem? And why do you care about cores, per se -
    other than to send the update to the core/collection you are trying to
    update? You should specify the core/collection name in the URL.

    You should also be using the Solr reference guide rather than the (old)
    wiki:

    https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


    -- Jack Krupansky
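
As a rough illustration of putting the core name in the URL, here is a minimal Python sketch that builds (but does not send) an atomic update request for one named core. The core name `core1` and the helper `build_atomic_update` are made up for this example; the field names and modifiers come from the wiki snippet in the question.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_atomic_update(base_url, core, doc_id, changes):
    """Return a urllib Request for an atomic update against one named core.

    `changes` maps field name -> (modifier, value),
    e.g. {"title": ("set", "test1")}.
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in changes.items():
        doc[field] = {modifier: value}
    # The core (or collection) name is part of the path: /solr/<core>/update
    url = f"{base_url.rstrip('/')}/{quote(core)}/update"
    body = json.dumps([doc]).encode("utf-8")
    return Request(url, data=body, headers={"Content-type": "application/json"})


# "core1" is a hypothetical core name; nothing is sent until urlopen(req) is called.
req = build_atomic_update(
    "http://localhost:8983/solr", "core1", "TestDoc1",
    {"title": ("set", "test1"), "revision": ("inc", 3)},
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send the update to that one core only; a commit is still needed before the change becomes visible to searches.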
  • Ксения Баталова at Jun 3, 2015 at 12:09 pm
    Hi!

    Thanks for your quick reply.

    The problem is that my whole index consists of several parts (several
    cores), and when updating I don't know in advance which part (which core)
    holds the document with the specified id.

    For example, I have two cores (*Core1* and *Core2*) and I want to
    update the document with id *Id1*, but I don't know which core it is in.

    So I have to run two select queries against my cores to find out where
    it is, and then send the update query to the right core.

    What am I doing wrong?

    As a reminder, I'm using SOLR 4.4.0.

    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    Best regards,
    Batalova Kseniya
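
The two-step approach described above (ask each core, then update the one that answers) might be sketched in Python as follows. This is only an illustration under assumptions: the helper names are invented, the unique key field is assumed to be `id`, and the lookup uses the standard `/select` handler with `rows=0` so only the hit count comes back. The fetcher is injectable so the logic can be tried without a running Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, core, doc_id):
    """URL that asks one core whether it holds a document with this id."""
    # Quote the id as a phrase so characters such as ':' are not parsed as query syntax.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = urlencode({"q": f'id:"{escaped}"', "fl": "id", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def fetch_json(url):
    """Default fetcher: perform the HTTP GET and decode the JSON response."""
    with urlopen(url) as resp:
        return json.load(resp)


def find_core(base_url, cores, doc_id, fetch=fetch_json):
    """Return the first core reporting numFound > 0 for doc_id, or None."""
    for core in cores:
        result = fetch(select_url(base_url, core, doc_id))
        if result["response"]["numFound"] > 0:
            return core
    return None


# Offline demonstration with a stand-in fetcher that pretends Id1 lives in Core2.
def fake_fetch(url):
    found = 1 if "/Core2/select" in url else 0
    return {"response": {"numFound": found, "docs": []}}


print(find_core("http://localhost:8983/solr", ["Core1", "Core2"], "Id1", fetch=fake_fetch))
```

Once `find_core` returns a name, the atomic update goes to `/solr/<that core>/update`. If it returns `None`, it is probably safer not to send the update at all, since an atomic update for an unknown id would create a new document in whichever core receives it.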


  • Upayavira at Jun 3, 2015 at 2:23 pm
    If you are using stand-alone Solr instances, then it is your
    responsibility to decide which node a document resides in, and thus to
    which core you will send your update request.

    If, however, you used SolrCloud, it would handle that for you: deciding
    which node should contain a document, and directing the update there, all
    behind the scenes.

    Upayavira
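
With stand-alone cores, one way to take on that responsibility is to choose the core from the document id itself, so that indexing and later updates always agree on the target. The Python sketch below is a stand-in for illustration only: it uses an MD5-based hash and an invented helper `core_for`, and it is not the routing algorithm SolrCloud uses internally.

```python
import hashlib


def core_for(doc_id, cores):
    """Pick a core deterministically from the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and map it onto the core list.
    index = int.from_bytes(digest[:8], "big") % len(cores)
    return cores[index]


cores = ["Core1", "Core2"]
# The same id always maps to the same core, at index time and at update time.
assert core_for("Id1", cores) == core_for("Id1", cores)
print(core_for("Id1", cores))
```

This only helps if the same function was used when the documents were first indexed, and changing the number of cores remaps most ids, so an index that was filled without such a rule would need to be rebuilt.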
  • Jack Krupansky at Jun 3, 2015 at 2:39 pm
    Explain a little about why you have separate cores, and how you decide
    which core a new document should reside in. Your scenario still seems a bit
    odd, so help us understand.


    -- Jack Krupansky
  • Ксения Баталова at Jun 3, 2015 at 6:08 pm
    Jack,

    The decision to use several cores was made to increase indexing and
    searching performance (determined experimentally).

    In my project the index is about 300-500 million documents (each document
    has a rather complex structure), and it may grow larger.

    So, during indexing, documents are added to different cores by a number
    of threads.

    In other words, each thread collects the necessary information for a list
    of documents and sends a create-documents request to a specific core.

    At that point it doesn't matter (and can't be determined) which
    document ends up in which core.

    And now we need to update this index (with atomic updates).

    Something like this.

    _ _

    Batalova Kseniya


  • Erick Erickson at Jun 3, 2015 at 6:17 pm
    I have to ask then why you're not using SolrCloud with multiple shards? It
    seems to me that that gives you the indexing throughput you need (be sure to
    use CloudSolrServer from your client). At 300M complex documents, you
    pretty much certainly will need to shard anyway so in some sense you're
    re-inventing the wheel here.

    You can host multiple shards on the same machine, and these _are_ separate
    Solr cores under the covers, so your problem with atomic updates disappears.

    Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
    voted on even now and should be out in a week or so barring problems).

    Best,
    Erick
  • Ксения Баталова at Jun 3, 2015 at 6:24 pm
    Upayavira,

    I'm using stand-alone Solr instances.

    I haven't learned SolrCloud yet.

    Please give me some advice on when SolrCloud is better than stand-alone
    Solr instances, or when it is worth choosing SolrCloud.

    _ _ _

    Batalova Kseniya


  • Erick Erickson at Jun 3, 2015 at 7:28 pm
    Basically, I think about using SolrCloud whenever you have to split
    your corpus into more than one core (shard in SolrCloud terms). Or
    when you require fault tolerance in terms of machines going up and
    down.

    Despite the name, it does _not_ require AWS or similar, and you can
    run "SolrCloud" on a single machine, that is, host multiple shards on a
    single physical machine to take advantage of the many CPU cores often
    available on modern hardware. Or you can host your "SolrCloud" in your
    own data center. Or, really, anywhere that you have one or more
    machines available that can talk to each other.

    I _really_ recommend you look at this option before pursuing your
    original question. It's vastly easier to let SolrCloud handle your
    routing, queries etc. than to re-invent all that yourself.

    Best,
    Erick
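
To make the single-machine case a little more concrete, here is a hedged Python sketch that builds a Collections API `CREATE` URL for a collection with several shards on one node. The collection name `docs` and the helper are invented for the example, and the parameter names (`numShards`, `replicationFactor`, `maxShardsPerNode`) are given as I understand the Solr 4.x/5.x Collections API; check them against the reference guide for the version in use.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor=1, max_shards_per_node=None):
    """Build a Collections API CREATE URL (the request itself is not sent here)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    if max_shards_per_node is not None:
        # Assumed to be required when one node has to host more than one shard.
        params["maxShardsPerNode"] = max_shards_per_node
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical: four shards of a collection called "docs", all on one node.
url = create_collection_url("http://localhost:8983/solr", "docs", 4,
                            max_shards_per_node=4)
print(url)
```

After the collection exists, updates (including atomic updates) can be sent to `/solr/docs/update` and SolrCloud decides which shard receives each document.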
  • Jack Krupansky at Jun 3, 2015 at 8:19 pm
    BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster
    would make a lot more sense since a cloud is typically a very large
    collection of machines and more of a place than a specific configuration,
    while a Solr deployment is more typically a more modest number of machines,
    a cluster. It just seems totally out of sync with the current popular
    conception of a cloud, and it helps confuse people as to when and where
    people can use it. I think it must have occurred after the end of my tenure
    at Lucid (October 2011), because my recollection is that it was then just
    known as "distributed".

    -- Jack Krupansky
  • Shawn Heisey at Jun 3, 2015 at 10:46 pm

    On 6/3/2015 2:19 PM, Jack Krupansky wrote:
    BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster
    would make a lot more sense since a cloud is typically a very large
    collection of machines and more of a place than a specific configuration,
    while a Solr deployment is more typically a more modest number of machines,
    a cluster. It just seems totally out of sync with the current popular
    conception of a cloud, and it helps confuse people as to when and where
    people can use it. I think it must have occurred after the end of my tenure
    at Lucid (October 2011), because my recollection is that it was then just
    known as "distributed".

    This all happened before I was paying attention to any development stuff
    on Solr.

    The earliest mention I have found so far is this:

    https://issues.apache.org/jira/browse/SOLR-1873

    Here's the first revision of the SolrCloud wiki page that I can access:

    http://wiki.apache.org/solr/SolrCloud?action=recall&rev=1

    I can't find anything about the origins. I'd like to search the dev
    list for history, but I can't find anyplace where this list is
    searchable for the correct (2009-2010) timeframe.

    Possible origins that I have thought of:

    1) *Very* large clusters were envisioned. There are real SolrCloud
    installs consisting of hundreds of machines and billions of documents.
    That certainly qualifies for the "cloud" moniker.

    2) Somebody was interested in leveraging a hot buzzword, to help
    generate excitement and support for a new feature.

    Thanks,
    Shawn
  • Ксения Баталова at Jun 4, 2015 at 5:06 pm
    Erick,

    Thank you so much. It became a bit clearer.

    It was decided to upgrade Solr to 5.2 and use SolrCloud in our next release.

    I think I'll write here about it again later :)

    _ _

    Batalova Kseniya


    I have to ask then why you're not using SolrCloud with multiple shards? It
    seems to me that that gives you the indexing throughput you need (be sure to
    use CloudSolrServer from your client). At 300M complex documents, you
    pretty much certainly will need to shard anyway so in some sense you're
    re-inventing the wheel here.

    You can host multiple shards on the same machine, and these _are_ separate
    Solr cores under the covers so your problem with atomic updates disappears.

    Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
    voted on even now and should be out in a week or so barring problems).

    Best,
    Erick
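The point about sharding can be illustrated with a small sketch. SolrCloud decides for itself which shard owns a document, so the client never has to track it. A hand-rolled equivalent for plain multi-core setups is to derive the core from the document id at indexing time, so the same computation finds it again at update time. The snippet below is only an illustration of that idea under assumed core names; it uses CRC32 for simplicity and is not how Solr's own router is implemented.

```python
import zlib

# Hypothetical core names; the thread only mentions "several cores".
CORES = ["Core1", "Core2", "Core3"]


def core_for_id(doc_id, cores=CORES):
    """Map a document id to one core, the same way on every call.

    CRC32 is used here only because it is stable across runs and machines
    (Python's built-in hash() is not stable for strings).
    """
    return cores[zlib.crc32(doc_id.encode("utf-8")) % len(cores)]
```

If both the indexing threads and the updater call `core_for_id`, the updater knows the target core without any select queries. The mapping changes if the number of cores changes, which is one of the details SolrCloud takes care of.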
    On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова wrote:
    Jack,

    The decision to use several cores was made to increase indexing and
    searching performance (determined experimentally).

    In my project the index holds about 300-500 million documents (each
    document has a rather complex structure), and it may grow larger.

    So, during indexing, documents are added to different cores by
    a number of threads.

    In other words, each thread collects the necessary information for a list
    of documents and sends a create-documents request to a specific core.

    At that point it doesn't matter (and it can't be determined) which
    document ends up in which core.

    And now it is necessary to update (atomic update) this index.

    Something like this..

    _ _

    Batalova Kseniya


    Explain a little about why you have separate cores, and how you decide
    which core a new document should reside in. Your scenario still seems a bit
    odd, so help us understand.


    -- Jack Krupansky
    On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова wrote:

    Hi!

    Thanks for your quick reply.

    The problem is that my whole index consists of several parts (several
    cores), and while updating I don't know in advance which part holds the
    id being updated (that is, which core the document with the specified id
    is in).

    For example, I have two cores (*Core1* and *Core2*) and I want to
    update the document with id *Id1*, and I don't know where this document
    is.

    So, I have to run two select queries against my cores to find out where it is,

    and then send an update query to the right core.

    What am I doing wrong?

    As a reminder, I'm using SOLR 4.4.0.

    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    Best regards,
    Batalova Kseniya
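The two-step approach described above (ask each core whether it holds the id, then send the atomic update to the core that does) might look roughly like this. It is a sketch only: the base URL, the core names and the injectable `fetch` callable are assumptions made for illustration, and the id is assumed not to contain double quotes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed values for illustration only; the thread does not give real names.
SOLR_BASE = "http://localhost:8983/solr"
CORES = ["Core1", "Core2"]


def select_url(core, doc_id):
    """Build a select URL that only checks whether doc_id exists in core."""
    params = urlencode({"q": 'id:"%s"' % doc_id, "rows": 0, "wt": "json"})
    return "%s/%s/select?%s" % (SOLR_BASE, core, params)


def http_get_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_core(doc_id, cores=CORES, fetch=http_get_json):
    """Return the first core whose index contains doc_id, or None."""
    for core in cores:
        result = fetch(select_url(core, doc_id))
        if result["response"]["numFound"] > 0:
            return core
    return None


def atomic_update_request(core, doc_id, changes):
    """Build (but do not send) an atomic update request for one document.

    changes maps field name -> modifier, e.g. {"title": {"set": "test1"}}.
    """
    body = json.dumps([dict({"id": doc_id}, **changes)]).encode("utf-8")
    return Request(
        "%s/%s/update?commit=true" % (SOLR_BASE, core),
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

A caller would run `core = find_core("Id1")` and, if a core is returned, pass `atomic_update_request(core, "Id1", {"title": {"set": "test1"}})` to `urlopen`. The lookup and the update are separate requests, so nothing here protects against the document changing in between.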


  • Erick Erickson at Jun 4, 2015 at 5:25 pm
    NP. It's something of a step when moving to SolrCloud to "let go" of the
    details you've had to (painfully) pay attention to, but worth it. The price is,
    of course, learning to do things a new way ;)...

    Best,
    Erick
  • Ксения Баталова at Jun 4, 2015 at 6:17 pm
    Hope I'll succeed :)

    Anyway, the solr-user community surprised me in a good way.

    Thanks again.

    _ _

    Batalova Kseniya



Discussion Overview
group: solr-user @
categories: lucene
posted: Jun 2, '15 at 2:26p
active: Jun 4, '15 at 6:17p
posts: 14
users: 5
website: lucene.apache.org...
