Grokbase Groups HBase user July 2011
data loss due to regionserver going down
Hi everyone. I'd like to run the following data loss scenario by you to see
if we are doing something obviously wrong with our setup here.

Setup:
- cdh3u0
- Hadoop 0.20.2
- HBase 0.90.1
- 1 master node running as NameNode & JobTracker
- ZooKeeper quorum
- 2 child nodes, each running as DataNode, TaskTracker, and RegionServer
- dfs.replication is set to 1

First, I inserted some data into HBase a few hours ago.
Then, after a while, I rebooted one of the region servers and waited until
the master responded to that. However, when I checked the table using the
HBase shell (with the "count" command), I noticed that a huge amount of
data was missing.
After I restarted the regionserver I had rebooted and checked again, I
found that some of the missing data had come back, but some data still
could not be found.
Finally, after I disabled and then re-enabled the table, I found that all
the data was in the cluster and nothing was lost.

This is problematic since we are supposed to replicate at 1x, so in theory
at least one other node should be able to serve the data that the downed
regionserver can't.

Questions:

- How do you explain this weird situation?
- Is there a way to recover such lost data?

Any tips here are definitely appreciated. I'll be happy to provide more
information as well.


  • Chris Tarnas at Jul 27, 2011 at 4:17 pm
    Replication of 1x means no replication. 2x would mean the data exists in two locations (what it looks like you want). Running with a replication of 1x is a very bad idea and is pretty much a guaranteed way to get data loss.

    -chris
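    A quick way to check what replication factor HDFS actually recorded for
    a given file is the shell's stat command; a sketch, with a hypothetical
    file path:

        # %r prints the replication factor stored for the file
        hadoop fs -stat %r /hbase/yourtable/somefile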
  • 吴限 at Jul 27, 2011 at 4:23 pm
    Thanks for your reply. But actually, later I did another experiment
    similar to the one I explained earlier.
    Step 1: I inserted some data into HBase.
    Step 2: I shut down one of the region servers.
    Step 3: I checked the table and found some data had been lost.
    Step 4: I disabled the table and then enabled it again.
    Step 5: I checked again and found nothing lost.

    If some of the data didn't exist on the other region server, then how
    do you explain this?

    Hope to get your reply. Thanks!

  • Stack at Jul 27, 2011 at 4:29 pm
    This I cannot explain. Check the blocks directory on the two servers;
    maybe the blocks were all under one datanode only.
    St.Ack
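    Following Stack's suggestion, HDFS's fsck tool can show which datanode
    holds each block; a sketch, assuming the /hbase root directory given
    later in this thread:

        # List every file under /hbase with its blocks and datanode locations
        hadoop fsck /hbase -files -blocks -locations

    If each block reports only a single location, any one datanode going
    down makes some blocks unreachable until it comes back.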


  • Chris Tarnas at Jul 27, 2011 at 4:41 pm
    That is strange behavior. How long did you wait between Step 2 and 3, and what are the results of running

    hbase hbck

    at step 3?

    -chris
  • 吴限 at Jul 27, 2011 at 4:47 pm
    Just by repeatedly checking http://master:60010.
    Before Step 2, the region server table showed:

        Address                 Start Code       Load
        server4.yun.com:60030   1311785159202    requests=0, regions=10, usedHeap=32, maxHeap=995
        server5.yun.com:60030   1311768553647    requests=18, regions=7, usedHeap=117, maxHeap=995
        Total: servers: 2                        requests=18, regions=17

    Then at Step 2, I shut down server4 and waited until the page showed:

        Address                 Start Code       Load
        server5.yun.com:60030   1311768553647    requests=18, regions=17, usedHeap=117, maxHeap=995
        Total: servers: 2                        requests=18, regions=17

    Then I continued with the following steps.

  • Suraj Varma at Jul 27, 2011 at 5:29 pm
    When you shut down the region server, check the master logs to see if
    the master has detected this condition.
    I've seen weird things happen if DNS is not set up correctly - so check
    whether the master (logs & UI) correctly detects that the region server
    is down after step 2.

    --Suraj
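    Both checks sketched below; the log path and the exact log wording vary
    by installation, so treat these patterns as assumptions:

        # Did the master react to server4 going down? (hypothetical log path)
        grep -i server4 /var/log/hbase/*master*.log | grep -i -e expir -e dead -e split

        # Do forward and reverse DNS agree for each node?
        hostname -f
        getent hosts server4.yun.com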


  • Buttler, David at Jul 27, 2011 at 4:17 pm
    When replication is set to 1, that means there is only one copy of the data. If you take a node offline, any data on that node will be unavailable. In your scenario, try upping the replication factor to 2.

    Dave
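    As a sketch, that is a one-line change to the dfs.replication property
    the original poster sets in hbase-site.xml (quoted later in this
    thread); note it only affects files written after the change:

        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>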

  • Stack at Jul 27, 2011 at 4:18 pm

    On Wed, Jul 27, 2011 at 8:58 AM, 吴限 wrote:

    > Setup:
    > - cdh3u0
    > - Hadoop 0.20.2

    You are using the hadoop from cdh3u0?

    > - dfs.replication is set to 1

    You will lose data if a machine goes away. You have two machines but
    only one instance of each data block; think of it as half of your data
    on one node and the rest on the other. If you kill one machine, half
    your data is gone.

    > After I restarted the regionserver I had rebooted and checked again,
    > I found that some of the missing data had come back, but some data
    > still could not be found.

    I wonder what was going on here that we didn't see it all restored.

    > This is problematic since we are supposed to replicate at 1x, so in
    > theory at least one other node should be able to serve the data that
    > the downed regionserver can't.

    No. The behavior you describe would come with a replication of 2, not 1.

    St.Ack
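    One way to see this split directly is the dfsadmin report, which lists
    per-datanode usage; with two datanodes and replication 1 you would
    expect roughly half the DFS space consumed on each:

        # Print capacity and DFS-used for every datanode in the cluster
        hadoop dfsadmin -report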
  • 吴限 at Jul 27, 2011 at 4:24 pm
    Yep. Is there anything wrong with that?


  • 吴限 at Jul 27, 2011 at 4:33 pm
    Dear Stack, thanks for your reply.
    First, I don't know if there is something wrong with cdh3u0.
    And thanks for reminding me about the replication property, which I
    didn't quite understand before but do now. I'll try to correct this
    mistake.
    But the situations I described really do happen with replication set to
    1, and that's why I find them so strange.
    I just started trying HBase a month ago, and there are a lot of things
    I don't quite understand yet.
    Hope to get a reply. Thanks!
  • 吴限 at Jul 27, 2011 at 4:36 pm
    Here is my hbase-site.xml:
    <configuration>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://server3.yun.com:54310/hbase</value>
        <description>The directory shared by region servers.</description>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>server3.yun.com</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>


  • Jeff Whiting at Jul 27, 2011 at 11:33 pm
    Replication needs to be higher than 1. If you have a node running both
    a DataNode and an HRegionServer and then shut it down, you WILL lose
    all the data that the DataNode was holding, because no one else on the
    cluster has it. HBase relies on HDFS for the replication of data and
    does NOT have its own data replication mechanism, unlike Cassandra or
    Voldemort. If you set the HDFS replication factor to 3, then when you
    shut down your node, two other nodes will have the data and HBase will
    be able to serve that data for you.

    You can think of each DataNode as a hard drive. Having a replication
    factor of 1 means the data is only on one hard drive, and if you unplug
    that hard drive the data is lost. Having a replication factor greater
    than 1 is like having multiple hard drives in a RAID 1 (mirrored)
    array: if you unplug one of the drives, the data is still on the other
    ones and nothing is lost.

    ~Jeff
    --
    Jeff Whiting
    Qualtrics Senior Software Engineer
    jeffw@qualtrics.com
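    One practical detail to add to Jeff's advice: raising dfs.replication
    only applies to files written afterwards. Data HBase has already
    flushed stays at a single replica until it is re-replicated explicitly,
    for example with the standard HDFS shell:

        # Recursively raise the replication of existing files under /hbase to 3
        hadoop fs -setrep -R 3 /hbase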
  • Nico Guba at Jul 28, 2011 at 5:51 am
    Very interesting. What is a good value where there is not too much of a trade-off in performance?

    I'd imagine that setting this too high could create a very 'chatty' cluster.
  • Xian Woo at Jul 28, 2011 at 12:52 pm
    Thanks, everybody. I really appreciate the help with my question.
    Indeed, the situation I came across is too complicated and strange for
    me, so I've decided to reinstall HBase and change the related
    configuration files. Hope it will go better this time. Thanks again!
    Best wishes,
    Woo.

  • Stack at Jul 28, 2011 at 4:03 pm
    Running with one replica is unusual -- and there is little motivation
    for running with this configuration, since it means data loss -- so few
    have experience with it.
    St.Ack


Discussion Overview
group: user@hbase.apache.org
categories: hbase, hadoop
posted: Jul 27, '11 at 3:58p
active: Jul 28, '11 at 4:03p
posts: 16
users: 7
website: hbase.apache.org
