Hi,

I'm running a 37-datanode HDFS cluster. Twelve of the nodes have 20TB of capacity
each, and the other 25 have 24TB each. Unfortunately, several nodes contain much
more data than the others, and their usage keeps growing rapidly. 'dstat' shows:

dstat -ta 2
-----time----- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
  date/time   |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
24-06 00:42:43|  1   1  95   2   0   0|  25M   62M|   0     0 |   0   0.1| 3532  5644
24-06 00:42:45|  7   1  91   0   0   0|  16k  176k|8346B 1447k|   0     0 | 1201   365
24-06 00:42:47|  7   1  91   0   0   0|  12k  172k|9577B 1493k|   0     0 | 1223   334
24-06 00:42:49| 11   3  83   1   0   1|  26M   11M|  78M   66M|   0     0 |  12k   18k
24-06 00:42:51|  4   3  90   1   0   2|  17M  181M| 117M   53M|   0     0 |  15k   26k
24-06 00:42:53|  4   3  87   4   0   2|  15M  375M| 117M   55M|   0     0 |  16k   26k
24-06 00:42:55|  3   2  94   1   0   1|  15M   37M|  80M   17M|   0     0 |  10k   15k
24-06 00:42:57|  0   0  98   1   0   0|  18M   23M|7259k 5988k|   0     0 | 1932  1066
24-06 00:42:59|  0   0  98   1   0   0|  16M  132M| 708k  106k|   0     0 | 1484   491
24-06 00:43:01|  4   2  91   2   0   1|  23M   64M|  76M   41M|   0     0 | 8441   13k
24-06 00:43:03|  4   3  88   3   0   1|  17M  207M|  91M   48M|   0     0 |  11k   16k
From the dstat output, we can see that write throughput is much higher than read
throughput. I've started a balancer process, with dfs.balance.bandwidthPerSec set
to ... bytes. From the balancer log I can see that the balancer is working, but the
balancing cannot keep up with the writes.
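
For reference, a minimal sketch of the balancer setup I am describing (the bandwidth
value below is only an illustration, not the exact value I used). On this version,
dfs.balance.bandwidthPerSec goes into hdfs-site.xml on the datanodes and is given in
bytes per second:

  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <!-- illustrative only: roughly 100 MB/s per datanode -->
    <value>104857600</value>
  </property>

The balancer itself is then started with a utilization threshold (percent deviation
from the cluster average), e.g.:

  hadoop balancer -threshold 10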

For now, the only way I can stop the runaway growth is to stop the datanode, set
dfs.datanode.du.reserved to 300GB, and then start the datanode again. The growth
continues until the used space reaches the 300GB reservation line, and only then
does it stop.
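
For completeness, the change looks roughly like this in hdfs-site.xml (the value is
in bytes; 322122547200 is 300 * 1024^3, and as far as I understand it is applied per
volume):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- reserve about 300GB for non-DFS use -->
    <value>322122547200</value>
  </property>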

The output of 'hadoop dfsadmin -report' for the overloaded nodes shows:

Name: 10.150.161.88:50010
Decommission Status : Normal
Configured Capacity: 20027709382656 (18.22 TB)
DFS Used: 14515387866480 (13.2 TB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 5512321516176(5.01 TB)
DFS Used%: 72.48%
DFS Remaining%: 27.52%
Last contact: Wed Jun 29 21:03:01 CST 2011


Name: 10.150.161.76:50010
Decommission Status : Normal
Configured Capacity: 20027709382656 (18.22 TB)
DFS Used: 16554450730194 (15.06 TB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 3473258652462(3.16 TB)
DFS Used%: 82.66%
DFS Remaining%: 17.34%
Last contact: Wed Jun 29 21:03:02 CST 2011

while a normal datanode looks like this:

Name: 10.150.161.65:50010
Decommission Status : Normal
Configured Capacity: 23627709382656 (21.49 TB)
DFS Used: 5953984552236 (5.42 TB)
Non DFS Used: 1200643810004 (1.09 TB)
DFS Remaining: 16473081020416(14.98 TB)
DFS Used%: 25.2%
DFS Remaining%: 69.72%
Last contact: Wed Jun 29 21:03:01 CST 2011


Name: 10.150.161.80:50010
Decommission Status : Normal
Configured Capacity: 23627709382656 (21.49 TB)
DFS Used: 5982565373592 (5.44 TB)
Non DFS Used: 1202701691240 (1.09 TB)
DFS Remaining: 16442442317824(14.95 TB)
DFS Used%: 25.32%
DFS Remaining%: 69.59%
Last contact: Wed Jun 29 21:03:02 CST 2011

Any hints on this issue? We are using 0.20.2-cdh3u0.

Thanks and regards,

Mao Xu-Feng

  • Edward Capriolo at Jun 29, 2011 at 3:24 pm
    We have run into this issue as well. Since Hadoop writes round-robin, different
    disk sizes really screw things up royally, especially if you are running at
    high capacity. We have found that decommissioning hosts for stretches of time
    is more effective than the balancer in extreme situations. Another hokey trick
    exploits the fact that a node that launches a job always uses itself as the
    first replica: launch jobs from your bigger machines, and data is more likely
    to be saved there. A super hokey solution is moving blocks around with rsync!
    (Block reports later happen and deal with this; I do not suggest it.)
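
    As a rough sketch of the decommissioning route (the exclude-file path below is
    just an example; use whatever dfs.hosts.exclude already points at in your
    cluster):

      <property>
        <name>dfs.hosts.exclude</name>
        <value>/etc/hadoop/conf/dfs.exclude</value>
      </property>

    Add the overloaded datanode to that file and tell the namenode to re-read it:

      echo "10.150.161.76" >> /etc/hadoop/conf/dfs.exclude
      hadoop dfsadmin -refreshNodes

    Once its blocks have been re-replicated elsewhere, remove the entry and run
    -refreshNodes again to bring the node back into service.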

    Hadoop really does need something more intelligent than round-robin writing for
    heterogeneous systems; there might be a JIRA open on this somewhere. But if you
    are on 0.20.x, you have to work with it.

    Edward
  • 茅旭峰 at Jun 30, 2011 at 4:23 am
    Thanks Edward! It seems we can only live with this issue.
