Hi,
Our current cluster runs 22 data nodes, each with 4 TB.
We will be installing new data nodes in this existing cluster, but each will have 8 TB of storage capacity.
I am wondering how the namenode will distribute the blocks. It is my understanding that the replica placement policy chooses data nodes at random, so an even distribution is expected. Eventually, then, the smaller nodes will fill up while the larger nodes will only be at 50%, at which point the small nodes will become unusable.
Am I correct?
Is there any recommended practice in this case? Would running the balancer periodically help?
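The reasoning in the question can be checked with a toy simulation. This sketch assumes the simplification the question describes: every replica goes to a node picked uniformly at random, ignoring rack awareness and the free-space checks the real HDFS placement policy performs. Capacities are in abstract block units, and the 10 new nodes are a hypothetical number, since the question does not say how many 8 TB nodes will be added. `simulate_random_placement` is an illustrative helper, not part of Hadoop.

```python
import random


def simulate_random_placement(n_small, n_large, small_cap, large_cap,
                              replication=3, seed=0):
    """Toy model: place each block's replicas on distinct nodes chosen
    uniformly at random, ignoring racks and free space, until a block would
    land on a full node. Returns the mean utilization of the small-node
    group and of the large-node group at that moment."""
    rng = random.Random(seed)
    caps = [small_cap] * n_small + [large_cap] * n_large
    used = [0] * len(caps)
    while True:
        # Distinct target nodes for this block's replicas.
        targets = rng.sample(range(len(caps)), replication)
        if any(used[t] >= caps[t] for t in targets):
            break  # a chosen node is already full: stop the experiment
        for t in targets:
            used[t] += 1
    small_util = sum(used[:n_small]) / (n_small * small_cap)
    large_util = sum(used[n_small:]) / (n_large * large_cap)
    return small_util, large_util


if __name__ == "__main__":
    # 22 existing "4 TB" nodes and a hypothetical 10 new "8 TB" nodes.
    small, large = simulate_random_placement(22, 10, 4000, 8000)
    print(f"small nodes: {small:.1%} full, large nodes: {large:.1%} full")
```

Under these assumptions the small nodes should end up close to full while the large nodes sit a little under 50%, which is the outcome the question predicts for purely random placement.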


  • Ayon Sinha at Jan 20, 2011 at 5:27 pm
    We did the same exercise a few months back. The balancer takes a while to
    finish, but it balances based on the percentage of disk usage on each node,
    so you will end up with usage between, say, 45% and 55% on all nodes.
    Sometimes the balancer does not balance well initially. When that happened,
    we increased the replication factor to 4 and kept it that way for a few days
    while running the balancer. Then we brought the replication factor back down
    to 3 and let the balancer run.
    -Ayon




    ________________________________
    From: David Ginzburg <ginzman@hotmail.com>
    To: HDFS USER mail list <hdfs-user@hadoop.apache.org>
    Sent: Thu, January 20, 2011 12:42:17 AM
    Subject: Adding new data nodes to existing cluster, with different storage capacity

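Ayon's description fits how the HDFS balancer's threshold is defined: a node counts as balanced when its utilization (used / capacity) is within the threshold of the cluster-wide utilization, so large and small nodes converge on the same percentage rather than the same number of bytes. The sketch below is a toy model, not the balancer's actual algorithm. `classify` and `greedy_balance` are hypothetical helpers, the real balancer pairs over- and under-utilized nodes and moves blocks with rack awareness, and the scenario (22 nodes of 4000 units at 90% plus 10 empty nodes of 8000 units) is made up for illustration. The 0.10 threshold mirrors the balancer's default of 10%.

```python
def classify(used, caps, threshold):
    """Return the cluster-wide utilization and the indices of nodes above
    and below the band [avg - threshold, avg + threshold]."""
    avg = sum(used) / sum(caps)
    over = [i for i, (u, c) in enumerate(zip(used, caps)) if u / c > avg + threshold]
    under = [i for i, (u, c) in enumerate(zip(used, caps)) if u / c < avg - threshold]
    return avg, over, under


def greedy_balance(used, caps, threshold=0.10, chunk=10, max_steps=100_000):
    """Toy balancer: repeatedly move `chunk` units from the most utilized
    node to the least utilized one until every node is inside the band.
    Returns the new usage list and the total number of units moved."""
    used = list(used)  # work on a copy; the caller's list is untouched
    moved = 0
    for _ in range(max_steps):
        _, over, under = classify(used, caps, threshold)
        if not over and not under:
            return used, moved
        src = max(range(len(used)), key=lambda i: used[i] / caps[i])
        dst = min(range(len(used)), key=lambda i: used[i] / caps[i])
        amount = min(chunk, used[src])
        used[src] -= amount
        used[dst] += amount
        moved += amount
    raise RuntimeError("did not converge; try a smaller chunk")


if __name__ == "__main__":
    caps = [4000] * 22 + [8000] * 10
    used = [3600] * 22 + [0] * 10  # old nodes 90% full, new nodes empty
    balanced, moved = greedy_balance(used, caps)
    utils = [u / c for u, c in zip(balanced, caps)]
    print(f"cluster average: {sum(balanced) / sum(caps):.1%}, moved {moved} units")
    print(f"small nodes: {min(utils[:22]):.1%} to {max(utils[:22]):.1%}")
    print(f"large nodes: {min(utils[22:]):.1%} to {max(utils[22:]):.1%}")
```

Because the band is defined in percentages, the balanced state in this model leaves the 8000-unit nodes holding more data in absolute terms than the 4000-unit nodes, which is the behavior Ayon describes.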
  • David Ginzburg at Jan 23, 2011 at 1:09 pm
    Thank you

    Date: Thu, 20 Jan 2011 09:26:36 -0800
    From: ayonsinha@yahoo.com
    Subject: Re: Adding new data nodes to existing cluster, with different storage capacity
    To: hdfs-user@hadoop.apache.org

Discussion Overview
group: hdfs-user @
categories: hadoop
posted: Jan 20, '11 at 8:42a
active: Jan 23, '11 at 1:09p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

David Ginzburg: 2 posts
Ayon Sinha: 1 post


site design / logo © 2022 Grokbase