Hi,

On our 15-node cluster (1Gb Ethernet and 4x1TB disks per node) I noticed
that distcp does a much better job at rebalancing than the dedicated
balancer does. We needed to decommission 11 nodes, so prior to
rebalancing we had 4 used and 11 empty nodes. The 4 used nodes had about
25% usage each. Most of our files are of average size: we have about
500K files in 280K blocks and 800K blocks total (blocksize is 64MB).
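
(For scale: 15 nodes x 4TB is roughly 60TB of raw capacity, and the 4
used nodes at ~25% held roughly 4TB raw, i.e. about 2TB of logical data
at the replication factor of 2 mentioned below.)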

So I changed dfs.balance.bandwidthPerSec to 800100100 and restarted the
cluster. I started the balancer tool and noticed that it moved about
200GB in 1 hour. (I grepped the balancer log for "Need to move".)
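
(For reference, a minimal sketch of that setting in hdfs-site.xml; the
property name and value are from this thread, the rest is the standard
Hadoop config layout. In this Hadoop generation the datanodes have to be
restarted to pick up the change, which is presumably why the cluster was
restarted:

  <property>
    <!-- per-datanode bandwidth cap for balancer block moves, in bytes/sec -->
    <name>dfs.balance.bandwidthPerSec</name>
    <!-- ~763 MB/s; the shipped default is 1048576, i.e. 1 MB/s -->
    <value>800100100</value>
  </property>
)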

After stopping the balancer I started a distcp. This tool copied 900GB
in just 45 minutes; with an average replication of 2, its total
throughput was around 2.4 TB/hour. Fair enough, it is not purely
rebalancing because the 4 overused nodes also get new blocks, but it
still performs much better. Munin confirms the much higher disk/Ethernet
throughput of the distcp.
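
(The distcp invocation is not shown in the thread; it would have been
something like the sketch below, with illustrative paths. distcp runs as
a MapReduce job, so many map tasks write blocks in parallel across all
15 nodes, whereas the balancer throttles every datanode to
dfs.balance.bandwidthPerSec and a handful of concurrent block moves:

  hadoop distcp /data /data-copy   # paths are hypothetical
)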

Are these characteristics to be expected? Either way, can the balancer
be boosted even more? (Aside from the dfs.balance.bandwidthPerSec
property.)

Ferdy.

  • Mathias Herberts at May 5, 2011 at 12:57 pm
Did you explicitly start a balancer, or did you decommission the nodes
using dfs.hosts.exclude and a dfsadmin -refreshNodes?
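
(For readers unfamiliar with the flow Mathias refers to, a minimal
sketch, with a hypothetical hostname and exclude-file path: list the
nodes to retire in the file named by dfs.hosts.exclude, then have the
namenode re-read it:

  echo "dn05.example.com" >> /etc/hadoop/conf/dfs.exclude
  hadoop dfsadmin -refreshNodes   # namenode starts draining those nodes
)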
  • Ferdy Galema at May 5, 2011 at 1:43 pm
The decommissioning was performed solely with refreshNodes, but that's
somewhat irrelevant because the balancing tests were performed after I
re-added the 11 empty nodes. (FYI, the drives were reformatted with
another unix fs.) Though I did notice that the decommissioning showed
about the same metrics as the balancer test afterwards; not very fast,
that is.
  • Ferdy Galema at May 5, 2011 at 3:27 pm
I figured out what caused the slow balancing. Starting the balancer
with too small a threshold decreases the speed dramatically:

./start-balancer.sh -threshold 0.01
2011-05-05 17:17:04,132 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 1.26 GBbytes in this iteration
2011-05-05 17:17:36,684 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 1.26 GBbytes in this iteration
2011-05-05 17:18:09,737 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 1.26 GBbytes in this iteration
2011-05-05 17:18:41,977 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 1.26 GBbytes in this iteration

    as opposed to:

./start-balancer.sh
2011-05-05 17:19:01,676 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 40 GBbytes in this iteration
2011-05-05 17:21:36,800 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 30 GBbytes in this iteration
2011-05-05 17:24:13,191 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Will move 30 GBbytes in this iteration

I'd expect the threshold to affect only the stopping criterion, not the
speed. Perhaps a bug?
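
(To make the threshold concrete: the balancer treats a datanode as over-
or under-utilized when its usage deviates from the cluster-average usage
by more than the threshold, in percentage points. Here the average is
about 4 x 25% / 15, roughly 6.7%, so with the default threshold of 10
only the four ~25% nodes count as over-utilized, while with 0.01
practically every node falls outside the band. Why the tiny threshold
also pins the per-iteration move size at a constant 1.26GB is not clear
from the logs; this note is an editor's reading of the balancer's
documented semantics, not of its source.)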
  • Eric Fiala at May 5, 2011 at 3:47 pm
Ferdy - that is interesting.
I would expect a lower threshold = more data to move around (or at least
the same as the default 10%).

Try with a whole integer; we regularly run the balancer with
-threshold 1 (to balance to within 1%). Maybe the decimal is throwing a
wrench at hadoop.

    EF
  • Ferdy Galema at May 5, 2011 at 6:55 pm
I actually tried 1% right after I ran the balancer with the default
threshold. The data moved was the same as with the default. So in short,
this is what I tried:

default: fast (lots of data moved; 30 to 40GB every iteration)
1%: fast (same as above)
0.01%: slow (it moves only 1.26GB per iteration, the exact same amount
for hours)

    At the moment the cluster is already fully balanced.
