Allen Wittenauer wrote:
This is essentially what we're doing via Torque (and therefore HOD).
When we move away from HOD (and thus Torque) to the Hadoop resource
manager (HADOOP-3421) and scheduler (HADOOP-3412) interfaces, we will
need to move this resource management functionality into Hadoop.
As I said in HADOOP-3280, I'm still convinced that having Hadoop set
these types of restrictions directly is the wrong approach. It is almost
always going to be better to use the OS controls that are specific to the
installation. Enabling OS-specific features means that, at a maximum,
Hadoop should likely be calling a script rather than doing ulimits or
having the equivalent of #ifdef code everywhere. [After all, what if I
want to use Solaris-specific features like projects or privileges?]
But I suspect most of the tunables that people will care about can
likely be managed at the OS level before hadoop is even involved.
Please see HADOOP-3581 (Prevent memory intensive user tasks from taking
down nodes). This issue is aimed at a general solution for putting
aggregate memory limits on the tasks and any subprocesses that tasks
might launch.
Your comments don't seem to be in line with what I proposed on that
JIRA issue. Having Hadoop just call a script, rather than setting
ulimits itself, does look nice, but with ulimits alone Hadoop cannot
have complete control over what the tasks do. For example, as stated on
the JIRA, runaway tasks that fork repeatedly could wreak havoc,
disrupting the Hadoop daemons and possibly bringing down the nodes
themselves.
The current solution, HADOOP-3280, doesn't prevent this: it only limits
the memory usable by a single process, not by its subprocesses. Nor will
ulimits via limits.conf suffice: as far as I checked, they only limit
the vmem usable per process, per user.
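
To make the per-process versus per-tree distinction concrete, here is a
minimal sketch in Python. It is only an illustration, not code from
HADOOP-3581 or Torque. The names tree_vmem and over_limit, and the
snapshot format (pid mapped to parent pid and vmem in bytes), are
assumptions made up for this example; on Linux such a snapshot could be
built from /proc, but that part is left out.

```python
from collections import defaultdict


def tree_vmem(procs, root_pid):
    """Sum vmem over root_pid and all of its descendants.

    procs is a hypothetical snapshot: {pid: (ppid, vmem_bytes)}.
    """
    # Index children by parent pid so the tree can be walked top-down.
    children = defaultdict(list)
    for pid, (ppid, _vmem) in procs.items():
        children[ppid].append(pid)

    total = 0
    seen = set()
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        # Skip unknown pids and guard against cycles in a bad snapshot.
        if pid in seen or pid not in procs:
            continue
        seen.add(pid)
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


def over_limit(procs, root_pid, limit_bytes):
    """True if the whole process tree exceeds the aggregate limit."""
    return tree_vmem(procs, root_pid) > limit_bytes
```

With a task at pid 100 that forks 101, which in turn forks 102, each
process can stay under a per-process ulimit while the tree as a whole
goes well past the memory the node can spare. A check like over_limit
catches that case; a per-process limit does not.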
At the risk of imitating Torque a bit too much: even Torque resorts to
controlling the process tree itself rather than depending directly on
OS-specific tools, and it does this with specific code for each platform.
We need a consensus on all of this. And I might be missing something
too. Can you please comment on HADOOP-3581?
Thanks,
-Vinod