Hi there,

I'm running some streaming jobs on ec2 (ruby parsing scripts) and in
my most recent test I managed to spike the load on my large instances
to 25 or so. As a result, I lost communication with one instance. I
think I took down sshd. Whoops.

My question is, has anyone got strategies for managing resources used
by the processes spawned by streaming jar? Ideally I'd like to run my
ruby scripts under nice.

I can hack something together with wrappers, but I'm thinking there
might be a configuration option to handle this within Streaming jar.
Thanks for any suggestions!

--
Chris Anderson
http://jchris.mfdz.com
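
The wrapper idea mentioned above could look roughly like the following. This is a minimal sketch, not something taken from the thread: the names `limit_child` and `run_limited`, the nice increment of 10, and the 1 GiB address-space cap are all assumptions chosen for illustration. It lowers the child's scheduling priority and caps its virtual memory before running the real command, leaving stdin and stdout untouched so streaming records still flow through.

```python
import os
import resource
import subprocess
import sys


def limit_child(nice_increment, max_vmem_bytes):
    """Build a function that applies the limits in the child, just before exec."""
    def apply():
        # Raise the niceness (lower the priority) of the child process.
        os.nice(nice_increment)
        # Never ask for more than the hard limit already in force.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_vmem_bytes
        if hard != resource.RLIM_INFINITY:
            cap = min(cap, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))
    return apply


def run_limited(argv, nice_increment=10, max_vmem_bytes=1024 ** 3):
    """Run argv with the limits applied and return its exit code.

    stdin, stdout and stderr are inherited from the wrapper.
    """
    completed = subprocess.run(
        argv, preexec_fn=limit_child(nice_increment, max_vmem_bytes)
    )
    return completed.returncode


# Hypothetical usage as a mapper: limit_wrapper.py ruby parse.rb
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_limited(sys.argv[1:]))
```

Note that these limits apply per process: each process the script forks gets its own copy of the cap, so this does not bound the total used by a whole process tree.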

  • Joydeep Sen Sarma at Jun 26, 2008 at 2:14 pm
    Memory limits were handled in this jira:
    https://issues.apache.org/jira/browse/HADOOP-2765

    However - in our environment - we spawn all streaming tasks through a
    wrapper program (with the wrapper being defaulted across all users). We
    can control resource usage from this wrapper (and also do some level of
    job control).

    This does require some minor code changes, and we would be happy to
    contribute them - but the approach doesn't quite fit in well with the
    approach in 2765, which passes each one of these resource limits as a
    hadoop config param.

    -----Original Message-----
    From: [email protected] On Behalf Of Chris
    Anderson
    Sent: Wednesday, June 25, 2008 7:00 PM
    To: [email protected]
    Subject: process limits for streaming jar

  • Allen Wittenauer at Jun 26, 2008 at 2:34 pm

    On 6/26/08 10:14 AM, "Joydeep Sen Sarma" wrote:
    > However - in our environment - we spawn all streaming tasks through a
    > wrapper program (with the wrapper being defaulted across all users). We
    > can control resource uses from this wrapper (and also do some level of
    > job control).

    This is essentially what we're doing via torque (and therefore hod).

    > This does require some minor code changes and would be happy to contrib
    > - but the approach doesn't quite fit in well with the approach in 2765
    > that passes each one of these resource limits as a hadoop config param.

    As I said in HADOOP-3280, I'm still convinced that having Hadoop set
    these types of restrictions directly is the wrong approach. It is almost
    always going to be better to use the OS controls that are specific to the
    installation. Enabling OS-specific features means that, at a maximum,
    hadoop should likely be calling a script rather than doing ulimits or
    having the equivalent of #ifdef code everywhere. [After all, what if I want
    to use Solaris-specific features like projects or privileges?]

    But I suspect most of the tunables that people will care about can
    likely be managed at the OS level before hadoop is even involved.
  • Vinod KV at Jun 27, 2008 at 10:01 am

    Allen Wittenauer wrote:
    > This is essentially what we're doing via torque (and therefore hod).

    When we move away from HOD (and thus torque) to using the Hadoop
    resource manager (HADOOP-3421) and scheduler (HADOOP-3412) interfaces, we
    will need to move this resource management functionality into Hadoop.

    > As I said in HADOOP-3280, I'm still convinced that having Hadoop set
    > these types of restrictions directly is the wrong approach. It is almost
    > always going to be better to use the OS controls that are specific to the
    > installation. Enabling OS specific features means that, at a maximum,
    > hadoop should likely be calling a script rather than doing ulimits or
    > having the equivalent of #ifdef code everywhere. [After all, what if I
    > want to use Solaris-specific features like projects or privileges?]
    >
    > But I suspect most of the tunables that people will care about can
    > likely be managed at the OS level before hadoop is even involved.

    Please see HADOOP-3581 (Prevent memory intensive user tasks from taking
    down nodes). This issue is aimed at a general solution for putting
    aggregate memory limits on the tasks and any subprocesses that the tasks
    might launch.

    Your comments don't seem to be in line with what I proposed on that
    JIRA issue. Having Hadoop just call a script, rather than doing
    ulimits itself, does look nice, but with ulimits alone Hadoop
    cannot have complete control over what the tasks do. For example, as
    stated on the JIRA, runaway tasks that fork themselves repeatedly could
    wreak havoc, disturbing the normal functioning of the Hadoop
    daemons and possibly bringing down the nodes themselves.

    The current solution, HADOOP-3280, doesn't prevent this; it only limits
    the memory usable by a single process, not by its subprocesses. Nor
    will ulimits via limits.conf suffice: as far as I checked, they just
    limit the vmem usable per process, per user.

    Not to sound like I'm imitating Torque a bit too much, but even Torque
    resorts to controlling the process tree itself rather than depending
    directly on OS-specific tools - this is done by specific code for each
    platform.

    We need a consensus on all of this. And I might be missing something
    too. Can you please comment on HADOOP-3581?

    Thanks,
    -Vinod
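
To make the per-process versus per-tree distinction above concrete, here is a rough sketch of summing virtual memory over a task and all of its descendants. It is not the HADOOP-3581 implementation; the function names are hypothetical, and the `/proc/<pid>/stat` reader assumes Linux (the field positions used are ppid as field 4 and vsize as field 23).

```python
import os


def read_proc_table(proc_root="/proc"):
    """Linux only: map pid -> (ppid, vsize_bytes) using /proc/<pid>/stat."""
    table = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name (field 2) may contain spaces or parentheses,
        # so split only what follows the last ')'.
        fields = data[data.rfind(")") + 2:].split()
        table[int(entry)] = (int(fields[1]), int(fields[20]))
    return table


def tree_vmem(table, root_pid):
    """Sum vsize over root_pid and every process descended from it."""
    children = {}
    for pid, (ppid, _) in table.items():
        children.setdefault(ppid, []).append(pid)
    total, stack, seen = 0, [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in table:
            continue
        seen.add(pid)
        total += table[pid][1]
        stack.extend(children.get(pid, []))
    return total


def over_limit(table, root_pid, limit_bytes):
    """True when the whole tree exceeds the aggregate limit."""
    return tree_vmem(table, root_pid) > limit_bytes
```

A monitor could call `over_limit(read_proc_table(), task_pid, limit)` periodically. One limitation of walking parent links like this: a process whose parent has exited is reparented and drops out of the tree.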
  • Chris Anderson at Jun 27, 2008 at 3:58 pm
    Having experimented some more, I've found that the simple solution is
    to limit the resource usage by limiting the # of map tasks and the
    memory they are allowed to consume.

    I'm specifying the constraints on the command line like this:

    -jobconf mapred.tasktracker.map.tasks.maximum=2 -jobconf mapred.child.ulimit=1048576

    The configuration parameters seem to take: in the job.xml available
    from the web console, I see these lines:

    mapred.child.ulimit 1048576
    mapred.tasktracker.map.tasks.maximum 2

    The problem is that when there are a large number of map tasks to
    complete, Hadoop doesn't seem to obey the map.tasks.maximum. Instead,
    it is spawning 8 map tasks per tasktracker (even when I change the
    mapred.tasktracker.map.tasks.maximum in hadoop-site.xml to 2, on the
    master). The cluster was booted with the setting at 8. Do I need to
    change hadoop-site.xml on all the slaves, and restart the task
    trackers, in order to make the limit apply? That seems unlikely - I'd
    really like to manage this parameter on a per-job level.

    Thanks for any input!

    Chris

    --
    Chris Anderson
    http://jchris.mfdz.com
  • Rick Cox at Jun 27, 2008 at 4:16 pm

    On Fri, Jun 27, 2008 at 08:57, Chris Anderson wrote:

    > The problem is that when there are a large number of map tasks to
    > complete, Hadoop doesn't seem to obey the map.tasks.maximum. Instead,
    > it is spawning 8 map tasks per tasktracker (even when I change the
    > mapred.tasktracker.map.tasks.maximum in hadoop-site.xml to 2, on the
    > master). The cluster was booted with the setting at 8. Do I need to
    > change hadoop-site.xml on all the slaves, and restart the task
    > trackers, in order to make the limit apply? That seems unlikely - I'd
    > really like to manage this parameter on a per-job level.

    Yes, mapred.tasktracker.map.tasks.maximum is configured per
    tasktracker on startup. It can't be configured per job because it's
    not a job-scope parameter (if there are multiple concurrent jobs, they
    have to share the task limit).

    rick
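
Given that the tasktracker slot count cannot travel with the job, a per-job command line would carry only job-scoped settings. The sketch below builds streaming arguments with one `-jobconf` flag per property; it assumes `mapred.child.ulimit` is read from the job configuration and is expressed in KB, and the helper names, script name, and paths are hypothetical. The slot count itself stays in hadoop-site.xml on each tasktracker node.

```python
def gib_to_kb(gib):
    """Convert GiB to KB, the unit assumed here for mapred.child.ulimit."""
    return gib * 1024 * 1024


def streaming_args(jobconf, mapper, reducer, input_path, output_path):
    """Return streaming arguments with one -jobconf flag per property."""
    args = []
    for key, value in sorted(jobconf.items()):
        args += ["-jobconf", "%s=%s" % (key, value)]
    args += ["-input", input_path, "-output", output_path]
    args += ["-mapper", mapper, "-reducer", reducer]
    return args


# Hypothetical job: cap each child process at 1 GiB of virtual memory.
example = streaming_args(
    {"mapred.child.ulimit": gib_to_kb(1)},
    mapper="ruby parse.rb",
    reducer="cat",
    input_path="logs/in",
    output_path="logs/out",
)
```

These arguments would follow the `hadoop jar` invocation of whichever streaming jar the installation provides.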

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 26, '08 at 2:00a
active: Jun 27, '08 at 4:16p
posts: 6
users: 5
website: hadoop.apache.org...
irc: #hadoop