Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple
collections configured with sharding and replication.

We recently backed up our Solr indexes using the built-in backup
functionality. After the cluster was restored from the backup, we
noticed that atomic updates of documents occasionally fail with the
error message 'missing required field [...]'. The exceptions are
thrown on hosts that do not store the document to be updated. From
this we deduce that there is a problem with finding the right host by
the hash of the uniqueKey. Indeed, our investigation so far has shown
that for at least one collection in the new cluster, the shards are
now assigned different hash ranges. We checked the hash ranges by
querying /admin/collections?action=CLUSTERSTATUS. Below are the shard
hash ranges of one collection that we debugged.

   Old cluster:
     shard1_0 80000000 - aaa9ffff
     shard1_1 aaaa0000 - d554ffff
     shard2_0 d5550000 - fffeffff
     shard2_1 ffff0000 - 2aa9ffff
     shard3_0 2aaa0000 - 5554ffff
     shard3_1 55550000 - 7fffffff

   New cluster:
     shard1 80000000 - aaa9ffff
     shard2 aaaa0000 - d554ffff
     shard3 d5550000 - ffffffff
     shard4 0 - 2aa9ffff
     shard5 2aaa0000 - 5554ffff
     shard6 55550000 - 7fffffff

   Note that the shard names differ because the old cluster's shards were
   split.

As you can see, the boundary between shard3 and shard4 differs from the
old cluster (fffeffff/ffff0000 there vs. ffffffff/0 now). This change
of hash ranges matches the symptoms we are currently experiencing.
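
A minimal sketch of this check (host and collection name are
placeholders; it assumes the default JSON response format and the
Python 'requests' package):

    # Sketch: dump each shard's hash range from CLUSTERSTATUS.
    import requests

    SOLR = "http://localhost:8983/solr"  # placeholder host
    COLLECTION = "mycollection"          # placeholder collection name

    resp = requests.get(
        SOLR + "/admin/collections",
        params={"action": "CLUSTERSTATUS", "collection": COLLECTION,
                "wt": "json"},
    )
    resp.raise_for_status()
    shards = resp.json()["cluster"]["collections"][COLLECTION]["shards"]
    for name, info in sorted(shards.items()):
        print(name, info["range"])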

We found the JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750,
in which David Smiley comments:

   shard hash ranges aren't restored; this error could be disastrous

It seems that this is what happened to us. We would appreciate
suggestions on how to recover from this problem.

Best,
Gary

  • Erick Erickson at Jun 15, 2016 at 5:24 pm
    Simplest, though a bit risky, is to manually edit the znode and
    correct the entry. There are various tools out there, including one
    that ships with ZooKeeper (see the ZK documentation).

    Or you can use the zkcli scripts (the ZooKeeper ones) to get the znode
    down to your local machine, edit it there, and then push it back up
    to ZK.

    I'd do all this with my Solr nodes shut down, then ensure that my ZK
    ensemble was consistent after the update etc....
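
    A sketch of that fetch-edit-push cycle, using the Python kazoo
    client as an alternative to the zkcli scripts. The znode path
    assumes a stateFormat=2 collection (per-collection
    /collections/<name>/state.json); older collections keep their state
    in the shared /clusterstate.json. Ensemble and collection name are
    placeholders:

        # Sketch: pull a collection's state.json from ZooKeeper, edit
        # it locally, then push it back. Run with all Solr nodes down.
        from kazoo.client import KazooClient

        ZK_HOSTS = "zk1:2181,zk2:2181,zk3:2181"        # placeholder ensemble
        PATH = "/collections/mycollection/state.json"  # placeholder collection

        zk = KazooClient(hosts=ZK_HOSTS)
        zk.start()
        data, stat = zk.get(PATH)
        with open("state.json", "wb") as f:
            f.write(data)  # fix the 'range' values in this file by hand
        # After editing, push the file back; the version check guards
        # against concurrent changes:
        # zk.set(PATH, open("state.json", "rb").read(), version=stat.version)
        zk.stop()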

    Best,
    Erick
  • Gary Yao at Jun 16, 2016 at 12:26 pm
    Hi Erick,

    I should add that our Solr cluster is in production and new documents
    are constantly being indexed. The new cluster has been up for three
    weeks now; we discovered the problem only now because, in our use
    case, atomic updates and RealTime Gets are mostly performed on new
    documents. It is almost certain that the index already contains
    documents that were distributed to the shards according to the new
    hash ranges, so just changing the hash ranges back in ZooKeeper would
    leave the index in an inconsistent state.
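
    For illustration, a sketch of how one could check where an id routes
    under a given set of ranges (here the new cluster's). It assumes
    simple ids (no '!' composite-routing separator) and that Python's
    mmh3 package reproduces Solr's MurmurHash3 (x86, 32-bit, seed 0)
    over the UTF-8 bytes of the id; worth verifying against
    org.apache.solr.common.util.Hash before trusting the results:

        # Sketch: compute which shard a uniqueKey should route to under
        # the new cluster's ranges (as reported by CLUSTERSTATUS).
        import mmh3

        RANGES = {
            "shard1": ("80000000", "aaa9ffff"),
            "shard2": ("aaaa0000", "d554ffff"),
            "shard3": ("d5550000", "ffffffff"),
            "shard4": ("0", "2aa9ffff"),
            "shard5": ("2aaa0000", "5554ffff"),
            "shard6": ("55550000", "7fffffff"),
        }

        def to_signed32(hex_str):
            v = int(hex_str, 16)
            return v - (1 << 32) if v >= (1 << 31) else v

        def expected_shard(doc_id):
            h = mmh3.hash(doc_id.encode("utf-8"))  # signed 32-bit, seed 0
            for shard, (lo, hi) in RANGES.items():
                # A range wrapping past 0x7fffffff (lo > hi after
                # signing, like the old shard2_1) would need a
                # two-interval check; none of the new ranges wrap.
                if to_signed32(lo) <= h <= to_signed32(hi):
                    return shard
            return None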

    Is there any way to recover from this without having to re-index all
    documents?

    Best,
    Gary

  • Erick Erickson at Jun 16, 2016 at 11:46 pm
    In essence, no. The data is, at best, in the wrong shard and, at
    worst, nowhere.
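
    One possible salvage pass, building on the routing sketch above and
    subject to the same hashing assumptions: query each shard's core
    directly with distrib=false, collect the ids that hash outside that
    shard's range, then re-index those documents through the
    collection's normal update path before deleting them from the wrong
    core. Core URL and paging are illustrative only:

        # Sketch: find misrouted ids on one shard by querying its core
        # directly (distrib=false) and checking each id's expected
        # shard. expected_shard() is the helper from the earlier sketch.
        import requests

        CORE = "http://host1:8983/solr/mycoll_shard3_replica1"  # placeholder

        params = {"q": "*:*", "fl": "id", "distrib": "false",
                  "rows": "500", "wt": "json"}  # use cursorMark for a full scan
        docs = requests.get(CORE + "/select",
                            params=params).json()["response"]["docs"]
        misrouted = [d["id"] for d in docs
                     if expected_shard(d["id"]) != "shard3"]
        print(len(misrouted), "misrouted ids in this page")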

