Hello Hadoop Gurus,
We are running a 4-node cluster. We just upgraded the RAM to 48 GB per node and
have allocated around 33-34 GB per node for Hadoop processes, leaving the
remaining 14-15 GB for the OS and as a buffer. There are no other processes
running on these nodes.
Most of the lighter jobs run successfully, but one big job is destabilizing the
cluster: one node starts swapping, runs out of swap space, and goes offline. We
tracked the processes on that node and noticed that it ends up with more
Hadoop Java processes than expected.
The other 3 nodes were running 10 or 11 processes, while this node ends up with
36. After killing the job we find these processes still show up, and we have to
kill them manually.
We have tried reducing swappiness to 6 but saw the same results. It also looks
like Hadoop stays well within its allocated memory limits and still starts
swapping.

Some other suggestions we have seen are:
1) Increase swap size. The current size is 6 GB. The most quoted advice is 'tons
of swap', but we are not sure what that translates to in numbers. Should it be
16 or 24 GB?
2) Increase the overcommit ratio. Not sure if this helps, as a few blog comments
mentioned it didn't (sysctl sketch below).
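
For reference, the swappiness and overcommit settings are plain Linux sysctl
knobs; a minimal sketch of where they live (the values shown are just the ones
under discussion, not recommendations):

# /etc/sysctl.conf -- persistent across reboots
vm.swappiness = 6            # how eagerly the kernel swaps (0-100)
vm.overcommit_ratio = 50     # only consulted when vm.overcommit_memory = 2

# apply the file without rebooting
sysctl -p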

Any other hadoop or linux config suggestions are welcome.

Thanks.

-Adi


  • Michel Segel at May 11, 2011 at 5:44 pm
    You have to do the math...
    If you have 2 GB per mapper and run 10 mappers per node, that means 20 GB of memory.
    Then you have the TT and DN running, which also take memory...
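
    A back-of-the-envelope version of that math, treating the 2 GB / 10-mapper figures above purely as assumed examples:

    # 10 map slots x 2 GB heap, plus roughly 1 GB each for the TaskTracker and DataNode
    echo $(( 10 * 2 + 1 + 1 ))   # => 22 (GB), before any non-heap JVM overhead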

    What did you set as the number of mappers/reducers per node?

    What do you see in ganglia or when you run top?

    Sent from a remote device. Please excuse any typos...

    Mike Segel
  • Adi at May 11, 2011 at 6:11 pm
    By our calculations Hadoop should not exceed 70% of memory.
    Allocated per node: 48 map slots (24 GB), 12 reduce slots (6 GB), and 1 GB
    each for the DataNode, TaskTracker, and JobTracker, totalling 33-34 GB.
    The queues are capped at 90% of their allocated capacity, so generally 10%
    of the slots are always kept free.

    The cluster was running a total of 33 mappers and 1 reducer, so around 8-9
    mappers per node with a 3 GB max limit each, and they were utilizing around
    2 GB each.
    Top was showing 100% memory utilized, which our sysadmin says is OK because
    Linux uses memory the processes aren't using for file caching.
    No swapping on 3 nodes.
    Then node4 just started swapping after the number of processes shot up
    unexpectedly. The main mystery is the excess number of processes on the node
    that went down: 36 as opposed to the expected 11. The other 3 nodes were
    successfully executing the mappers without any memory/swap issues.

    -Adi
  • Ted Dunning at May 11, 2011 at 6:17 pm
    How is it that 36 processes are not expected if you have configured 48 + 12
    = 60 slots available on the machine?
  • Adi at May 11, 2011 at 6:31 pm
    Actually, per node it is 56 + 12 = 68 slots (slots, not mappers/reducers).
    With this job's configuration it was using 6 slots per mapper (resulting in
    8-9 mappers) and 6 slots per reducer (1 reducer).
    There was a mistake in my earlier mail: the map slots are 56, not 48, but
    the total memory allocation for Hadoop still comes to around 35-36 GB.
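
    A rough sketch of the concurrency those numbers imply (these are just this job's settings, not a general recipe):

    # 56 map slots and 12 reduce slots per node; this job takes 6 slots per task
    echo $(( 56 / 6 ))   # => 9 concurrent map tasks per node at most
    echo $(( 12 / 6 ))   # => 2 concurrent reduce tasks per node at most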

    -Adi
  • Allen Wittenauer at May 11, 2011 at 7:52 pm
    On May 11, 2011, at 11:11 AM, Adi wrote:

    > By our calculations Hadoop should not exceed 70% of memory.
    > Allocated per node: 48 map slots (24 GB), 12 reduce slots (6 GB), and 1 GB
    > each for the DataNode, TaskTracker, and JobTracker, totalling 33-34 GB.

    It sounds like you are only taking the heap size into consideration. There is more memory allocated than just the heap...

    > The queues are capped at 90% of their allocated capacity, so generally 10%
    > of the slots are always kept free.

    But that doesn't translate into how free the nodes are, as you've discovered. Individual nodes should be configured based on the assumption that *all* slots will be used.

    > The cluster was running a total of 33 mappers and 1 reducer, so around 8-9
    > mappers per node with a 3 GB max limit each, and they were utilizing around
    > 2 GB each.
    > Top was showing 100% memory utilized, which our sysadmin says is OK because
    > Linux uses memory the processes aren't using for file caching.

    Well, yes and no. What is the breakdown of that 100%? Is any of it actually allocated to the buffer cache, or is it all user space?

    > No swapping on 3 nodes.
    > Then node4 just started swapping after the number of processes shot up
    > unexpectedly. The main mystery is the excess number of processes on the node
    > that went down: 36 as opposed to the expected 11. The other 3 nodes were
    > successfully executing the mappers without any memory/swap issues.

    Likely speculative execution or something else. But again: don't build machines with the assumption that only x% of the slots will get used. There is no guarantee in the system that free slots will be balanced across all nodes... especially when you take node failure into consideration.
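
    One way to see the gap between the configured heap and a task JVM's actual footprint is to compare -Xmx with resident set size; a rough sketch, assuming the task JVMs run as the hadoop user:

    # resident memory (RSS, in KB) of every java process owned by the hadoop user
    ps -u hadoop -o pid,rss,args | grep '[j]ava' | sort -k2 -n
    # note: -Xmx only caps the Java heap; thread stacks, direct buffers, permgen
    # and the JIT code cache sit outside it, so RSS can run well above that limit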
  • Adi at May 11, 2011 at 9:48 pm
    Thanks for your comments, Allen. I have added mine inline.

    > On May 11, 2011, at 11:11 AM, Adi wrote:
    >
    >> By our calculations Hadoop should not exceed 70% of memory.
    >> Allocated per node: 48 map slots (24 GB), 12 reduce slots (6 GB), and 1 GB
    >> each for the DataNode, TaskTracker, and JobTracker, totalling 33-34 GB.
    >
    > It sounds like you are only taking the heap size into consideration. There
    > is more memory allocated than just the heap...

    Our heap size for mappers is half of the memory allocated to each mapper, but
    you're right, we should allow more for the DN/TT/JT.

    >> The queues are capped at 90% of their allocated capacity, so generally 10%
    >> of the slots are always kept free.
    >
    > But that doesn't translate into how free the nodes are, as you've
    > discovered. Individual nodes should be configured based on the assumption
    > that *all* slots will be used.

    >> The cluster was running a total of 33 mappers and 1 reducer, so around 8-9
    >> mappers per node with a 3 GB max limit each, and they were utilizing around
    >> 2 GB each.
    >> Top was showing 100% memory utilized, which our sysadmin says is OK because
    >> Linux uses memory the processes aren't using for file caching.
    >
    > Well, yes and no. What is the breakdown of that 100%? Is any of it actually
    > allocated to the buffer cache, or is it all user space?
    Here's the breakdown.
    Tasks: 260 total, 1 running, 259 sleeping, 0 stopped, 0 zombie
    Cpu(s): 12.7%us, 1.2%sy, 0.0%ni, 83.6%id, 2.2%wa, 0.0%hi, 0.3%si, 0.0%st
    Mem: 49450772k total, 49143288k used, 307484k free, 16912k buffers
    Swap: 5242872k total, 248k used, 5242624k free, 7076564k cached
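
    Reading that another way: of the memory shown as used, roughly 7 GB is page cache and only about 40 GB is actually held by processes. A quick way to compute that, assuming the classic procps 'free' layout with separate buffers and cached columns:

    # memory actually held by processes = used - buffers - cached
    free -k | awk '/^Mem:/ {printf "apps: ~%d MB\n", ($3 - $6 - $7) / 1024}'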


    >> No swapping on 3 nodes.
    >> Then node4 just started swapping after the number of processes shot up
    >> unexpectedly. The main mystery is the excess number of processes on the node
    >> that went down: 36 as opposed to the expected 11. The other 3 nodes were
    >> successfully executing the mappers without any memory/swap issues.
    >
    > Likely speculative execution or something else. But again: don't build
    > machines with the assumption that only x% of the slots will get used. There
    > is no guarantee in the system that free slots will be balanced across all
    > nodes... especially when you take node failure into consideration.
    I will look into this.

    Meanwhile, I am now running the job again with three nodes and observing
    that the completed tasks still show up in the process tree. In the earlier
    run I was setting the initial heap size equal to the max heap size, which I
    have disabled now, so my hunch is that this job will run out of memory a
    little later.

    Hadoop is showing 8/10 tasks running per node, and each node right now has
    25-30 Java processes.
    I grepped for a completed attempt and it still shows up in the ps listing.

    Non-Running Tasks: Task Attempt attempt_201104280947_1266_m_000033_0 - Status SUCCEEDED

    $ ps -ef | grep hadoop | grep attempt_201104280947_1266_m_000033_0
    hadoop 17315 5018 26 16:38 ? 00:13:39 /usr/java/jre1.6.0_21/bin/java
    -Djava.library.path=/usr/local/hadoop/bin/../lib/native/Linux-amd64-64:/usr/local/hadoop/cluster/mapred/local/taskTracker/auser/jobcache/job_201104280947_1266/attempt_201104280947_1266_m_000033_0/work
    -Xmx1536M
    -Djava.io.tmpdir=/usr/local/hadoop/cluster/mapred/local/taskTracker/auser/jobcache/job_201104280947_1266/attempt_201104280947_1266_m_000033_0/work/tmp
    -classpath
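
    For reference, a quick way to count live task JVMs on a node and compare against the slots a job should be occupying (assuming the child JVMs keep the attempt id in their command line, as above):

    # count task-attempt JVMs currently running on this node
    ps -ef | grep '[a]ttempt_' | wc -l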

    Any ideas?

    -Adi
