Hi,

I've been running our app on EC2 using the small instances and it's been
mostly fine. Very occasionally a task will die due to a heap out of memory
exception. So far these failed tasks have successfully been restarted by
Hadoop on other nodes and the job has run to completion.

I want to know how to avoid those occasional out of memory problems.

I tried increasing mapred.child.java.opts from -Xmx550m to -Xmx768m, but
this caused more, and much quicker, out-of-memory exceptions. Can someone
help me understand why?

I then reduced it to -Xmx400m and it is running ok so far.

My application is a custom multithreaded MapRunnable app, and I often have
hundreds of threads operating at the same time.

Cheers,
John
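
For reference, a minimal sketch of where that setting normally lives, assuming the usual mapred-site.xml (or per-job JobConf) configuration; the heap value below is only illustrative:

<!-- heap cap applied to every map and reduce task child JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>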


  • Raymond Jennings III at Nov 11, 2009 at 4:51 pm
    Is there a way to set up directories in dfs for individual users and set the permissions so that only that user can read and write them, such that if I do a "hadoop dfs -ls" I would see "/user/user1 /user/user2", etc., with each directory readable and writable only by the respective user? I don't want to format an entire dfs filesystem for each user, just let each user have one sub-directory off the main /users dfs directory that only they (and root) can read and write to.

    Right now, if I run a MapReduce app as any user but root, I am unable to save the intermediate files in dfs.

    Thanks!
  • Allen Wittenauer at Nov 11, 2009 at 7:01 pm

    On 11/11/09 8:50 AM, "Raymond Jennings III" wrote:

    A) Don't run Hadoop as root. All of your user submitted code will also run
    as root. This is bad. :)

    B) You should be able to create user directories:

    hadoop dfs -mkdir /user/username
    hadoop dfs -chown username /user/username
    ...

    C) If you are attempting to run Pig (and some demos), it has a dependency on
    a world-writable /tmp. :(

    hadoop dfs -mkdir /tmp
    hadoop dfs -chmod a+w /tmp

    D) If you are on Solaris, whoami isn't in the default path. This confuses
    the hell out of Hadoop so you may need to hack all your machines to make
    Hadoop happy here.
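
    A minimal sketch of B for a single user, assuming permission checking (dfs.permissions) is enabled and the commands are run as the HDFS superuser; "user1" is only a placeholder. Mode 700 leaves the directory readable and writable only by its owner (the superuser can still get in), which is what the original question asked for:

    hadoop dfs -mkdir /user/user1
    hadoop dfs -chown user1 /user/user1
    hadoop dfs -chmod 700 /user/user1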
  • Raymond Jennings III at Nov 11, 2009 at 8:15 pm
    Ah okay, I was looking at the options for hadoop and it only shows "fs" and not "dfs" - now I realize they are one and the same. Thanks!

  • Edward Capriolo at Nov 11, 2009 at 4:59 pm

    John,

    If you look at the description of mapred.child.java.opts:

    "Java opts for the task tracker child processes. The following symbol,
    if present, will be interpolated: @taskid@ is replaced by current TaskID."

    Thus -Xmx400m serves as a limit on the maximum heap each task tracker
    child (each map or reduce task JVM) can consume.

    Now you have to look at:
    mapred.tasktracker.map.tasks.maximum
    mapred.tasktracker.reduce.tasks.maximum

    If you have set these variables too high, each node will spawn more
    tasks than it can handle memory-wise, and those tasks will die.

    What you have to do here is look hard at how much memory you have on
    your machine, how many map and reduce tasks can run on the machine,
    and anything else that may be running on the machine. Then you have to
    set -Xmx low enough that the total fits.

    Many things affect this. For example, if you raise

    tasktracker.http.threads

    the task tracker will have more threads and will probably consume more memory.

    Edward
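
To make the budgeting above concrete: on an EC2 small instance with roughly 1.7 GB of RAM, two map slots plus two reduce slots at -Xmx768m can claim about 3 GB of heap before the TaskTracker, DataNode, and OS are counted, while the same four slots at -Xmx400m come to about 1.6 GB, still tight but much closer to fitting. A rough configuration sketch, with slot counts and heap size purely illustrative (set in mapred-site.xml on each worker node):

<!-- (map slots + reduce slots) * child -Xmx, plus the TaskTracker,
     DataNode, and OS, has to fit in the node's physical memory -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>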
