Map and Reduce tasks should run as the user who submitted the job
-----------------------------------------------------------------

Key: HADOOP-4490
URL: https://issues.apache.org/jira/browse/HADOOP-4490
Project: Hadoop Core
Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 0.20.0


Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.

For security and accounting purposes the tasks should be run as the job-owner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Arun C Murthy (JIRA) at Oct 22, 2008 at 8:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Arun C Murthy updated HADOOP-4490:
    ----------------------------------

    Component/s: security
    mapred
  • Vinod K V (JIRA) at Oct 23, 2008 at 3:29 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642042#action_12642042 ]

    Vinod K V commented on HADOOP-4490:
    -----------------------------------

    Duplicate of HADOOP-4451.
  • Hemanth Yamijala (JIRA) at Oct 23, 2008 at 3:53 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642043#action_12642043 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I think HADOOP-4451 is either related to or a duplicate of this. Correct?
  • Hemanth Yamijala (JIRA) at Oct 31, 2008 at 6:45 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Assignee: (was: Arun C Murthy)
  • Hemanth Yamijala (JIRA) at Nov 19, 2008 at 1:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala reassigned HADOOP-4490:
    ----------------------------------------

    Assignee: Hemanth Yamijala
  • Hemanth Yamijala (JIRA) at Nov 19, 2008 at 3:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649065#action_12649065 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    Before beginning discussions on approach, I wanted to summarize my understanding of this task, and also start discussion on a few points that I have some questions on.

    The following are some salient points:
    # We want to run tasks as the user who submitted the job, rather than as the user running the daemon.
    # I think we also don't want to run the daemon as a privileged user (such as root) in order to solve this requirement. Right ?
    # The directories and files used by the task should have appropriate permissions. Currently these directories and files are mostly created by the daemons, but used by the task. A few are used/accessed by the daemons also. Some of these directories and files are the following:
    ## mapred.local.dir/taskTracker/archive - directories containing distributed cache archives
    ## mapred.local.dir/taskTracker/jobcache/$jobid/ - Include work (which is a scratch space), jars (containing the job jars), job.xml.
    ## mapred.local.dir/taskTracker/jobcache/$jobid/$taskid - Include job.xml, output (intermediate files), work (current working dir) and temp (work/tmp) directories for the task.
    ## mapred.local.dir/taskTracker/pids/$taskid - Written by the shell launching the task, but read by the daemons.
    # What should 'appropriate' permissions mean ? I guess read/write/execute (on directories) for the owner of the job is required. What should the permissions be for others ? If the task is the only consumer, then the permissions for others can be turned off. However, there are cases where the daemon / other processes might read the files. For instance:
    ## The distributed cache files can be shared across jobs.
    ## Jetty seems to require read permissions on the intermediate files to serve them to the reducers.
    In the above cases, can we make these world readable ?
    ## Task logs are currently generated under ${hadoop.log.dir}/userlogs/$taskid. These are served from the TaskLogServlet of the TaskTracker.
    # Apart from launching the task itself, we may need some other actions to be performed as the job owner. For instance:
    ## Killing of a task
    ## Maybe setting up and cleaning up of the directories / files
    ## Running the debug script - {{mapred.map|reduce.task.debug.script}}

    Is there anything that I am missing ? Comments on the questions of shared directories / files - distributed cache, intermediate outputs, log files ?
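
    A minimal sketch of the strictest answer to the permissions question above (owner-only access), assuming a POSIX file system and Java NIO. The class and method names are hypothetical and not part of Hadoop. Changing ownership to the job owner would additionally require privileges and is not shown.

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.PosixFilePermission;
    import java.nio.file.attribute.PosixFilePermissions;
    import java.util.Set;

    // Hypothetical helper for illustration only; not existing Hadoop code.
    final class TaskDirPermissions {
        // rwx for the owner, nothing for group and others.
        static final Set<PosixFilePermission> OWNER_ONLY =
            PosixFilePermissions.fromString("rwx------");

        // Creates parent/taskId (if missing) and restricts it to its owner.
        static Path createOwnerOnlyDir(Path parent, String taskId) throws IOException {
            Path dir = Files.createDirectories(parent.resolve(taskId));
            // Set the bits explicitly, since the process umask may have
            // altered the permissions applied at creation time.
            Files.setPosixFilePermissions(dir, OWNER_ONLY);
            return dir;
        }
    }
    ```

    With permissions like these, anything the daemon must read (intermediate output, task logs) would have to be handed over by some other mechanism, which is the open question in the list above.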
  • Craig Macdonald (JIRA) at Nov 21, 2008 at 12:23 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649658#action_12649658 ]

    Craig Macdonald commented on HADOOP-4490:
    -----------------------------------------

    I think that (2) depends on how (1) is proposed to be addressed. If you assume that (1) is addressed by using seteuid() or the su command such that processes actually run on the system as the appropriate user, then (2) is extremely difficult without being run as root.

    If (1) is addressed just by setting the UGI in some way, then this has disadvantages compared to seteuid/su, which facilitates secured access to non-HDFS resources (e.g. NFS in smaller environments).


  • Hemanth Yamijala (JIRA) at Nov 28, 2008 at 2:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651568#action_12651568 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I had some offline discussions with Arun and Sameer and here are some initial thoughts on approach. A lot of details still need to be fleshed out, but I am posting this to get some early feedback.

    We do want to run the daemons as non-privileged users, and yet go with a setuid based approach to run tasks as a regular user. One approach that was proposed to do this is as follows:
    - We create a setuid executable, say a taskcontroller, that will be owned by root.
    - This executable can take the following arguments - <user> <command> <command arguments>.
    - <user> will be the job owner.
    - <command> will be an action that needs to be performed, such as LAUNCH_JVM, KILL_TASK, etc.
    - <command arguments> will depend on the command. For example, LAUNCH_JVM would take the arguments currently used to launch a JVM via the ShellCommandExecutor.
    - The tasktracker will launch this executable with the appropriate command and arguments when needed.
    - As the executable is setuid, it will start as root and will quickly drop privileges using setuid to run as the user.
    - The arguments will then be used to execute the required action, for example launching a JVM or killing a task.
    - Before dropping privileges, if needed, the executable could set up directories with appropriate ownership, etc.
    - Naturally this would be platform specific. Hence, we can define a TaskController class that defines APIs to encapsulate these actions. For example, something like:
    {code}
    abstract class TaskController {
      // Launch a task as the job owner.
      abstract void launchTask();
      // Kill a running task as the job owner.
      abstract void killTask(Task t);
      // etc...
    }
    {code}
    - This could be extended by a LinuxTaskController that converts the generic arguments into something that can be passed to the executable, for example a process ID.
    - One specific point is about the directory / file permissions. Sameer was of the opinion that the permissions should be quite strict, that is, world readable rights are not allowed. There are cases where the task as well as the daemon may need to access files. To handle this, one suggestion is to first set the permissions to the user, and then change the ownership to the daemon after the task is done.

    The points above specify a broad approach. Please comment on whether this seems reasonable, reasonable in parts, or completely off the mark. *smile* Based on feedback, I would start implementing a prototype to flesh out the details.
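
    The plugin idea in this comment can be sketched as follows. This is only an illustration under stated assumptions: the class names, the method buildLaunchCommand, and the executable path are hypothetical, and the argument order simply follows the <user> <command> <command arguments> layout proposed above.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the plugin abstraction; not the actual Hadoop classes.
    abstract class TaskControllerSketch {
        // Returns the argument vector that would be executed to launch the task JVM.
        abstract List<String> buildLaunchCommand(String jobOwner, List<String> jvmArgs);
    }

    // Existing behaviour: run the JVM command directly, as the daemon user.
    final class DefaultTaskControllerSketch extends TaskControllerSketch {
        @Override
        List<String> buildLaunchCommand(String jobOwner, List<String> jvmArgs) {
            return new ArrayList<>(jvmArgs);
        }
    }

    // Proposed behaviour: prefix the JVM command with the setuid executable,
    // the job owner and the LAUNCH_JVM command.
    final class LinuxTaskControllerSketch extends TaskControllerSketch {
        private final String taskControllerExe; // location is an assumption

        LinuxTaskControllerSketch(String taskControllerExe) {
            this.taskControllerExe = taskControllerExe;
        }

        @Override
        List<String> buildLaunchCommand(String jobOwner, List<String> jvmArgs) {
            List<String> argv = new ArrayList<>();
            argv.add(taskControllerExe);
            argv.add(jobOwner);
            argv.add("LAUNCH_JVM");
            argv.addAll(jvmArgs);
            return argv;
        }
    }
    ```

    The returned list could be handed to java.lang.ProcessBuilder as is. Keeping the two controllers behind one abstract class lets a cluster choose between them through configuration.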
  • Owen O'Malley (JIRA) at Nov 28, 2008 at 8:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651616#action_12651616 ]

    Owen O'Malley commented on HADOOP-4490:
    ---------------------------------------

    +1 for a setuid program.

    It should be written in C, not Java to ensure it has enough access to the platform to actually be secure. In particular, it has to clear both real and effective user ids.

    I'd like to see the proposed list of commands for the setuid program.

    No user-specified strings should be included on the command line, to avoid special character attacks.

    I agree with Sameer that we should have very tight permissions on the map output and task directories. One of the subcommands should probably be to move the outputs from somewhere like $task/output to somewhere like $tt/output/$job/$task.

    Having a plugin that lets us switch between the current pure-java implementation that doesn't change user ids and a setuid implementation sounds reasonable. We should continue to support the non-user-switch by default for clusters run by a single non-root user.

  • Hemanth Yamijala (JIRA) at Dec 1, 2008 at 2:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652015#action_12652015 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    Thanks for the comments, Owen.

    bq. It should be written in C, not Java to ensure it has enough access to the platform to actually be secure. In particular, it has to clear both real and effective user ids.
    Yes, I had that in mind. Specifically, I was planning to do something like setuid(getpwnam(user_name)->pw_uid). Since this would be done by a program running as superuser (the setuid exe), it would clear both the real and effective uids.

    bq. I'd like to see the proposed list of commands for the setuid program.
    Sure, I will work on that and post the list here. In order to be reasonably complete, I think I should have a version that's working. So, I will start prototyping on the lines I described above.
  • Andrzej Bialecki (JIRA) at Dec 1, 2008 at 3:34 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652028#action_12652028 ]

    Andrzej Bialecki commented on HADOOP-4490:
    -------------------------------------------

    What about Cygwin / Windows users?
  • Hemanth Yamijala (JIRA) at Dec 3, 2008 at 6:27 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Fix Version/s: (was: 0.20.0)

    Resetting the fix version, as this will not make the feature freeze.
  • Steve Loughran (JIRA) at Dec 4, 2008 at 11:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653267#action_12653267 ]

    Steve Loughran commented on HADOOP-4490:
    ----------------------------------------

    You are about to take on one of the big problems they hit in the grid world: identity. All the grid tools (condor, platform, etc) have lots of effort put in at the OS level to create new users on target machines, manage the disk and cpu usage limits of that user, etc. But you also need to propagate identity over the wire, which gets you into SAML and other things. Right now the JobTracker trusts you to be who you say you are, so having caller authentication would be a prerequisite to doing back-end user switching.

    If you are interested in running pure Java apps under different rights, this could be done via a security manager. Every task would be started with an explicit security manager/policy that limited what it could do, file and network operations would be checked against the policy. This would be portable and easier to test. It also eliminates the need to run the TT as root, to keep the unix user database in sync with the hadoop user list, etc.



  • Hemanth Yamijala (JIRA) at Dec 4, 2008 at 3:17 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653327#action_12653327 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    Steve, I do agree that the security manager approach is simpler and portable. However, I think the requirement is also to support features like streaming. Given that, I think the security manager approach would not work. Am I right ?

    Also, I agree that authentication and authorization is a pre-requisite for this. It is being handled by the other tasks under the jira HADOOP-4487. Here, I am focussing only on the mechanisms to make tasks run as the users who submitted the jobs - a small part of the larger framework.
  • Hemanth Yamijala (JIRA) at Dec 4, 2008 at 5:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653375#action_12653375 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I have been able to make some progress and get a wordcount job to run as the job submitter. The design follows the basic approach mentioned above, minus the plugin abstraction, which I have yet to create.

    - Created a setuid C executable.
    - This executable currently takes the following commands:
    -- SETUP_DIRS <list of directories>:
    This command sets up task specific directories to be owned by the user. The general approach I followed for handling directory permissions is that the root directories, such as hadoop.tmp.dir/mapred/local/taskTracker/jobcache/jobid would be owned by the tasktracker daemon, which creates task directories under it when needed. Then the taskcontroller exe will change the ownership and permissions of the task directory and sub folders to the user.
    -- RUN_TASK <path to a file containing the M/R task to execute>
    The file is a temp file created under the user's work directory itself - executable by the user
    -- MOVE_FILES <source directory> <destination directory>
    This command is used to copy the intermediate output and task logs from the task directories to a system specific directory owned by the daemon. The servlets serving this data are modified to read from the system specific directory.
    - These are called from the JvmManager class at appropriate places.

    A couple of things came up when doing this:

    - Task logs: Currently task logs can be viewed when the task is still executing. Further the task logs are read by the TaskLogServlet, which is running in the daemon context. We want the task logs to be owned by the user. I still need to figure out how to achieve this. Currently, I am only able to access task logs after they are done, by executing the MOVE_FILES command. Any ideas are welcome.

    - JVM Reuse: Currently, I've only handled this with one JVM per task. Need to check the approach when JVM reuse is in the picture.

    - Still need to work on cleanup and kill actions, as well as the distributed cache.

    The code I have is very raw and needs lots of polishing even as a first draft. Will try to do so in a couple of days. Any comments on the approach so far ?
  • Steve Loughran (JIRA) at Dec 8, 2008 at 12:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654392#action_12654392 ]

    Steve Loughran commented on HADOOP-4490:
    ----------------------------------------

    Hemanth, you are right: for streaming/pipes stuff a second identity is needed. What some of the grid toolkits have done in the past is have some low-privilege user for running work; there isn't a 1:1 mapping of grid users to user accounts. Instead the worker is allowed access to the relevant files of a user for a while, then at the end of the job, that data goes away. This eliminates some of the account management problems, though it forces you to make sure that the worker doesn't have access to any old/shared data on the same filesystem.
  • Owen O'Malley (JIRA) at Dec 8, 2008 at 5:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654477#action_12654477 ]

    Owen O'Malley commented on HADOOP-4490:
    ---------------------------------------
    From what I've heard, running under a security manager kills performance, which is pretty much a non-starter. Especially given that we need unix-level security anyways. In our environment, running as the real user is important. If I run a job, I should not be able to look at or kill your job's data or tasks, even if we are sharing a machine. Of course this feature needs to be optional, since:
    1. It requires that all cluster users have accounts on all slave nodes in the cluster
    2. It requires native code that may not work on all platforms
    3. It requires root access, which not all Hadoop admins have.
  • Owen O'Malley (JIRA) at Dec 8, 2008 at 5:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654482#action_12654482 ]

    Owen O'Malley commented on HADOOP-4490:
    ---------------------------------------

    Hemanth,
    This sounds good, but from a security standpoint, I think that it would be better to make the tasks more specific. So something like:

    CREATE_TASK_DIR owen task_20080101_0001_m_000001_1
    MOVE_TASK_OUTPUT owen task_20080101_0001_m_000001_1
    REMOVE_TASK_DIR owen task_20080101_0001_m_000001_1

    It would also be good to have the task tracker root directories in a separate config file that can be owned by root. My goal is to make this executable as limited as possible. It should also block root as the user. What I do *not* want to see is having this work:

    MOVE_FILES root /tmp/foo /etc/passwd
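
    The checks asked for here can be sketched as follows. The sketch is in Java purely for illustration, since the real checks would live in the setuid C executable. The class name, the regular expressions, and the idea of passing the root directory to the constructor are assumptions; the task id shape is taken from the example commands above.

    ```java
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.regex.Pattern;

    // Hypothetical validation logic; not existing Hadoop code.
    final class TaskCommandValidator {
        // Shape of the task ids in the examples above, e.g.
        // task_20080101_0001_m_000001_1. The exact pattern is an assumption.
        private static final Pattern TASK_ID =
            Pattern.compile("task_\\d+_\\d+_[mr]_\\d+_\\d+");
        // Conservative user-name pattern (an assumption): no slashes, dots or spaces.
        private static final Pattern USER = Pattern.compile("[a-z_][a-z0-9_-]*");

        // Would be read from a root-owned config file, never from the command line.
        private final Path taskTrackerRoot;

        TaskCommandValidator(Path taskTrackerRoot) {
            this.taskTrackerRoot = taskTrackerRoot.normalize();
        }

        // Returns the task directory for a valid request and throws otherwise.
        Path resolveTaskDir(String user, String taskId) {
            // A real implementation would also reject uid 0 after the user lookup,
            // since comparing the name alone does not cover aliases of root.
            if (!USER.matcher(user).matches() || user.equals("root")) {
                throw new IllegalArgumentException("user not allowed: " + user);
            }
            if (!TASK_ID.matcher(taskId).matches()) {
                throw new IllegalArgumentException("malformed task id");
            }
            // The id cannot contain '/' or '..', so the result stays under the root.
            return taskTrackerRoot.resolve(taskId);
        }
    }
    ```

    Because the caller supplies only a user name and a task id, a request such as the MOVE_FILES example above has no way to name an arbitrary path.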

  • Doug Cutting (JIRA) at Dec 8, 2008 at 6:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654496#action_12654496 ]

    Doug Cutting commented on HADOOP-4490:
    --------------------------------------
    Steve: have some low-privilege user for running work; there isn't a 1:1 mapping of grid users to user accounts
    Owen: running as the real user is important. If I run a job, I should not be able to look at or kill your job's data or tasks
    Might it be possible to have a pool of low-privileged users, to remove the requirement that every user has an account on every machine? Or maybe that requirement's not that onerous, with PAM/LDAP?

  • Allen Wittenauer (JIRA) at Dec 8, 2008 at 7:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654530#action_12654530 ]

    Allen Wittenauer commented on HADOOP-4490:
    ------------------------------------------

    The user who submits the job should be the user who runs the code on the compute nodes, because of issues in the environment outside Hadoop. For example, it is possible to submit a job that writes junk data to the low-privilege user's home dir. Without tracking who submitted that job, ops would never know who to go bonk on the head.

    ... and then there is streaming.

    I can think of instances where it might be useful to have generic accounts run stuff. In those instances, it is still much better to have that handled outside Hadoop. [Either through setuid scripts, roles, sudo, kinit a special keytab prior to job submit, whatever.] Let the OS/tool/ops team/whatever deal with the accounting in those situations.
  • Hemanth Yamijala (JIRA) at Dec 9, 2008 at 2:11 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654802#action_12654802 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    bq. CREATE_TASK_DIR owen task_20080101_0001_m_000001_1

    Owen, sure this makes sense. Just one point is that I might need the job id along with the task id. But that's still within the same spirit.

    I will also make the other changes you have suggested like blocking root user.

    I've also added a CLEANUP_TASK_DIR command to the executable, which can now clean up the directories after a task has completed. This is called from the CleanupQueue thread in the task tracker.


  • Hemanth Yamijala (JIRA) at Dec 10, 2008 at 2:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655207#action_12655207 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I had an offline discussion with Devaraj regarding the implementation, and we also went over the impact this would have when clubbed with JVM reuse.

    A few comments from him that I am documenting here:
    - Task directories under the tasktracker system or root directory, to which files (such as intermediate outputs) are copied after task completion, should be on the same disk as the original user's task directories. This prevents cross-disk copies.
    - Regarding the problem of serving log outputs which I've mentioned [here|#action_12653375], we discussed that one approach could be to have a command in the executable that reads the data and returns it to the TaskLogServlet on demand. This would happen reasonably rarely and does not affect any other functionality, so it seems the performance overhead can be ignored.
    - Another comment was to reduce the number of times the executable is launched. For example, *without* JVM reuse, I can set up the directories, run the task, and then move the outputs with a single launch of the executable. This is possible because all actions are per task, and there is one JVM per task. Hence the lifecycle of the task fits well with the setuid changes.

    With JVM reuse, though, the last point becomes problematic. We can easily set up the directories before the task and move the output after it. However, each of these needs a separate launch of the executable - three times, actually. The performance impact this would have (and whether it would offset the advantage of JVM reuse) is something to measure and see.
  • Hemanth Yamijala (JIRA) at Dec 11, 2008 at 3:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655673#action_12655673 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I now have a version that also runs with JVM reuse enabled. The main change needed to get this to work was to correct the way I was figuring out the current task ids in the JvmManager class.

    Also added a KILL_TASK command. This will look as follows:

    KILL_TASK user_name job_id task_attempt_id

    This will be called from JvmRunner.kill(), which in turn is called whenever a TT gets a kill task action. Since the JVM process is running as the job owner (different from the TT), we can't directly destroy the JVM process. Instead, what I've done is the following:
    - Write a (hidden) .pid file into the task directory when a task is executed. This is owned by the job owner and not readable by anyone else. The pid file contains the JVM's pid.
    - When the JVM needs to be killed, we call the taskcontroller executable with the job_id and task_id.
    - The taskcontroller drops privileges to the job owner, then reads the pid file and gets the pid of the jvm.
    - Then the taskcontroller issues a kill(pid, SIGTERM) to kill the jvm.
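
    The last two steps above can be sketched in Python as follows. This is only an illustration, not the taskcontroller code: the function names, the injectable send_signal parameter (there so the logic can be exercised without killing anything) and the guard against pids 0, 1 and negative values are assumptions added here.

```python
import os
import signal


def read_pid(pid_file):
    """Parse the JVM pid stored in the task's pid file."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    if pid <= 1:
        # kill(0, ...) and kill(-1, ...) address groups of processes and
        # pid 1 is init, so none of these is a plausible task JVM.
        raise ValueError("refusing to signal pid %d" % pid)
    return pid


def kill_task(pid_file, send_signal=os.kill):
    """Send SIGTERM to the JVM whose pid is recorded in pid_file.

    Assumes the caller has already dropped privileges to the job owner,
    so the kernel only permits signalling that user's processes.
    """
    pid = read_pid(pid_file)
    send_signal(pid, signal.SIGTERM)
    return pid
```

    Reading the file only after dropping privileges means the OS, rather than the tool, enforces that a job owner cannot signal another user's process.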

    Any concerns with this approach ?

    Currently other than distributed cache, all other aspects of the task life cycle are functioning. I'll probably upload a single writeup (as Arun had done for HADOOP-4348) that will capture all the information in comments above for easy reference.

    And of course, follow that up with the first patch. *smile*
  • Owen O'Malley (JIRA) at Dec 11, 2008 at 6:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655721#action_12655721 ]

    Owen O'Malley commented on HADOOP-4490:
    ---------------------------------------

    That sounds reasonable. Please ensure that the pid file is owned by the user given on the command line and has permissions of 600. This avoids someone leaving this file writable and having someone point it at a different process.
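
    A check along these lines could look like the Python sketch below. The function name is hypothetical, and the use of lstat plus the regular-file test is an extra precaution assumed here, not something requested above.

```python
import os
import stat


def pid_file_is_trustworthy(path, expected_uid):
    """True only for a regular file owned by expected_uid with mode 0600."""
    # lstat: do not follow a symlink planted in the task directory.
    st = os.lstat(path)
    return (
        stat.S_ISREG(st.st_mode)
        and st.st_uid == expected_uid
        and stat.S_IMODE(st.st_mode) == 0o600
    )
```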

  • Hemanth Yamijala (JIRA) at Dec 12, 2008 at 4:45 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655883#action_12655883 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    Thanks, Owen. I will take care of that.

    Some discussion is required for handling distributed cache (HADOOP-4493). Firstly, localized files from distributed cache are not localized per job. Since anything can be passed through distributed cache, I think it should support the same level of access control as the rest of the files. That is, they should be changed to be localized per job and subject to the same access control mechanisms we are using for the rest of the files - like output directories etc.

    I don't think this has a big impact on users, since they can't assume the cache already contains the files they want on the nodes where the task is running. From the system's perspective, however, if a lot of users (say, working on the same project that requires the same data files) share these files across multiple jobs, we copy them only once per node, saving both space and time. If we modify this to localize per job, we could lose that advantage, no ?

    Any thoughts on this trade off ?


  • Hemanth Yamijala (JIRA) at Dec 12, 2008 at 8:27 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655922#action_12655922 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I had an offline discussion with Sameer about how to get this patch in. To make it easier to review, maybe it makes sense to split the task up into multiple sub tasks. At least three have been identified:
    - Launch and kill tasks (this would involve RUN_TASK and KILL_TASK commands)
    - Handle local data securely (this would involve SETUP_TASK and MOVE_TASK_OUTPUT and CLEANUP_TASK commands)
    - Handle distributed cache.

    In order to get a working launch and kill tasks patch though, the file and directory permissions will need to be opened up to allow access to all users. Each of the other patches will make it more secure.

    Please note that we have discussed the approach of how we will address directory and file permissions (such as intermediate outputs) in this JIRA already. This proposal is only to make it simpler to get some incremental patches in. Would this work ? If yes, I will use this JIRA to handle the first of the three tasks, then use HADOOP-4491 and HADOOP-4493 for the others.
  • Hemanth Yamijala (JIRA) at Dec 22, 2008 at 8:21 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658463#action_12658463 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    bq. It would also be good to have the task tracker root directories in a separate config file that can be owned by root.

    We are taking care of this point in the setuid executable. One question is how the location of this secure config file will be made known to the executable. Following are our options:

    Option 1: Read from the environment variable HADOOP_CONF_DIR
    Option 2: Take a command line option to specify the location of the file.
    Option 3: Have it as a build time configuration parameter, and encode into the executable (like for instance, pass it as an autoconf option).

    Options 1 and 2 may allow users to launch the executable pointing to some custom path. Option 3 would completely avoid this, and make it more secure.

    For the sake of deployment, I think the setuid executable should be built using a separate ant target, as it would need to be set up as owned by root etc. So, maybe it is easy to do Option 3 in that case. If we decide to go with one of the other two options, we should mandate additional checks to make sure that the configuration file is owned by the root user, as Owen mentioned.
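
    For Options 1 and 2, the additional ownership check could be sketched like this in Python. The function name is hypothetical, the root_uid parameter exists only so the check can be tried without being root, and the rule that group and others must not have write access is an assumption added here on top of the ownership check.

```python
import os
import stat


def config_is_root_owned(path, root_uid=0):
    """True if path is a regular file owned by root_uid that neither
    group nor others may modify."""
    st = os.stat(path)
    writable_by_others = st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
    return (
        stat.S_ISREG(st.st_mode)
        and st.st_uid == root_uid
        and not writable_by_others
    )
```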

    Any comments ?
  • Hemanth Yamijala (JIRA) at Dec 23, 2008 at 4:34 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Dec 23, 2008 at 4:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658744#action_12658744 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    The attached patch implements changes in the tasktracker to launch tasks using the setuid executable defined in HADOOP-4930. By doing so, it runs tasks as job owners. The CLI for the setuid exe is:
    {code}
    task-controller <user-name> <command-enum-value> <job-id> <task-id> <tasktracker-root>
    {code}
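
    As a rough illustration of how a caller might assemble this command line, here is a Python sketch. The numeric values of the Command enum are placeholders (the real values are defined by the HADOOP-4930 executable), and build_argv is a hypothetical helper.

```python
from enum import IntEnum


class Command(IntEnum):
    # Placeholder numbering, for illustration only.
    RUN_TASK = 0
    KILL_TASK = 1


def build_argv(user, command, job_id, task_id, tt_root, exe="task-controller"):
    """Argument vector in the order given by the usage line above."""
    return [exe, user, str(int(command)), job_id, task_id, tt_root]
```

    Passing the result as a list to a process launcher, rather than as one shell string, keeps user-supplied values from being interpreted by a shell.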

    As mentioned in comments above, this patch only handles launching and killing of tasks, and does not handle file and directory permissions securely. In fact, it opens up the permissions so that both the tasktracker and task can share files and directories. However, this change is only made when the feature is enabled, and does not affect the default Hadoop behavior. When HADOOP-4491 and other issues are fixed, these open permissions will be replaced with secure ones.

    The changes in the patch include:
    - A TaskController class that defines abstract methods for launching and killing tasks
    - A DefaultTaskController where a little code from JvmManager has been moved
    - A LinuxTaskController which implements the methods by calling the setuid executable of HADOOP-4930.
    - A new configuration variable mapred.task.tracker.task-controller to define the specific type of TaskController to use. Defaults to DefaultTaskController.
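
    The shape of this change can be sketched as below. The patch itself is Java; this Python sketch only mirrors the structure, and the method name launch_argv, the "RUN_TASK" placeholder, the short class names used as config values and the idea that the executable finds the JVM command via the task directory are all assumptions.

```python
from abc import ABC, abstractmethod


class TaskController(ABC):
    """Decides how a task JVM is started."""

    @abstractmethod
    def launch_argv(self, user, job_id, task_id, jvm_argv):
        """Return the command line the tasktracker should execute."""


class DefaultTaskController(TaskController):
    def launch_argv(self, user, job_id, task_id, jvm_argv):
        # Default behaviour: run the JVM directly, as the tasktracker user.
        return list(jvm_argv)


class LinuxTaskController(TaskController):
    def __init__(self, exe="task-controller"):
        self.exe = exe

    def launch_argv(self, user, job_id, task_id, jvm_argv):
        # Delegate to the setuid executable, which switches to `user`.
        # jvm_argv is not passed here; the executable is assumed to find
        # the JVM command via the task directory.
        return [self.exe, user, "RUN_TASK", job_id, task_id]


CONTROLLERS = {
    "DefaultTaskController": DefaultTaskController,
    "LinuxTaskController": LinuxTaskController,
}


def controller_from_conf(conf):
    """Pick the controller named in the config, falling back to the default."""
    name = conf.get("mapred.task.tracker.task-controller",
                    "DefaultTaskController")
    return CONTROLLERS[name]()
```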

    Tested this on a single node cluster, along with the setuid executable of HADOOP-4930. Will follow-up with testing on larger clusters.

    I request a review for the same.
  • Hemanth Yamijala (JIRA) at Dec 23, 2008 at 4:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: hadoop-4490-design.pdf

    A lot of discussion has happened on various comments in this JIRA. The attached document collates all of them. I hope this will make it easier to follow the approach and review the changes.
  • Hemanth Yamijala (JIRA) at Jan 6, 2009 at 6:00 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Jan 6, 2009 at 8:00 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661068#action_12661068 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I attached a new patch that is more comprehensive. All changes from the previous patch still hold good. This one adds the correct permissions for all relevant files and directories, except distributed cache.

    The previous patch only set relevant permissions on the task and log cache directories for all users, with the intent that tasks running as any user should be able to create and use other files and directories under them. This requirement still applies. However, there are other files and directories whose access needs to be adjusted too. The new patch addresses these changes:

    - It sets permissions on the job related jar files and other directories allowing access to everyone.
    - It sets read and execute permissions on the directory paths leading up to the task / job cache and log directories. For example, if a task cache directory is created under ${mapred.local.dir}/taskTracker/jobcache, the patch attempts to give every component of this path read and execute (and no write) access for all users. This is required for looking up paths and locating / reading files created by the tasktracker.
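
    The second change can be illustrated with the following Python sketch (the patch itself is Java). The function name is hypothetical, and walking from a given root down to the target directory is an assumption about how the path components are enumerated.

```python
import os
import stat

READ_EXECUTE_ALL = 0o555  # r-x for owner, group and others; adds no write bit


def open_path_components(root, target):
    """Add r-x for all users on every directory from root down to target."""
    root = os.path.abspath(root)
    target = os.path.abspath(target)
    if os.path.commonpath([root, target]) != root:
        raise ValueError("target is not under root")
    current = root
    touched = [current]
    for part in os.path.relpath(target, root).split(os.sep):
        if part != ".":
            current = os.path.join(current, part)
            touched.append(current)
    for directory in touched:
        mode = stat.S_IMODE(os.stat(directory).st_mode)
        # Keep existing bits (e.g. the owner's write bit), add r-x for all.
        os.chmod(directory, mode | READ_EXECUTE_ALL)
    return touched
```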

    Both changes above will still be required in future, except that the permission string will then be more restrictive (disallowing access to group and others).

    The previous patch was working because of a subtle behavior in setuid. On the systems where we tested, the umask was set such that read and execute permissions were provided to group by default. So, any of the job files created by the tasktracker had read and execute permission for the group to which the tasktracker user belonged. When the setuid executable switched users, it did not clear the supplementary group information of the launcher. Hence, the new process running as the job owner still had access to the groups to which the tasktracker belonged, and hence worked. Again, in HADOOP-4491, we propose to remove all access for the group ownership also, and hence this will not be an issue.
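
    Separately from the fix proposed in HADOOP-4491, a common way to avoid inheriting the launcher's groups is to reset the supplementary group list before changing gid and uid. The Python sketch below shows that ordering; it is not what the patch does. The function name and the injectable setgroups/setgid/setuid parameters (there so the ordering can be checked without root) are assumptions.

```python
import os


def drop_privileges(uid, gid,
                    setgroups=os.setgroups,
                    setgid=os.setgid,
                    setuid=os.setuid):
    """Switch the current process to uid/gid, discarding inherited groups.

    Order matters: once setuid() has given up root, the two group calls
    would no longer be permitted.
    """
    setgroups([gid])  # replace the launcher's supplementary groups
    setgid(gid)
    setuid(uid)
```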
  • Hemanth Yamijala (JIRA) at Jan 13, 2009 at 7:49 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Jan 13, 2009 at 7:53 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663257#action_12663257 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    The latest patch also adds correct permissions to files localized as part of the distributed cache. In order to do this, I introduced a new API in distributed cache to indicate whether the files were freshly localized (i.e. as part of the current task's localization), or whether they already existed in the cache. I use this API to avoid setting permissions on the same cache files repeatedly for each task. If there's a better way to do this, I would be glad to know.

    All changes made in the previous patch still hold good, except for minor refactoring.

    This patch is now complete to the best of my knowledge. Please do offer your comments.
  • Arun C Murthy (JIRA) at Jan 22, 2009 at 5:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666220#action_12666220 ]

    Arun C Murthy commented on HADOOP-4490:
    ---------------------------------------

    Hemanth, this is looking good.

    Some comments:

    # We should use mapred.local.dir instead of hadoop.tmp.dir in LinuxTaskController.
    # Use Path's methods instead of String manipulation for all path-related manipulations.
    # Pass mode, user/group to DistributedCache rather than rely on the newly introduced DistributedCache.isFreshlyLoaded which is then unnecessary.
    # Move setting up of JVM-specific files e.g. task's log directory to TaskController.launchJVM.

  • Hemanth Yamijala (JIRA) at Feb 1, 2009 at 8:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Feb 1, 2009 at 8:28 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669354#action_12669354 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I've updated the patch to trunk, incorporating most of Arun's comments above. Arun, can you please take a look?

    bq. We should use mapred.local.dir instead of hadoop.tmp.dir in LinuxTaskController.
    Done.

    bq. Use Path's methods instead of String manipulation for all path-related manipulations.
    Done.

    bq. Pass mode, user/group to DistributedCache rather than rely on the newly introduced DistributedCache.isFreshlyLoaded which is then unnecessary.
    Done. I've added a new overloaded API that passes the information to DistributedCache. To keep options open, I've defined a new public class, DistributedCacheFileAccessInfo: a simple class that can be used to define permissions and ownership information for localized files in DistributedCache. Can you take a specific look at this and let me know if it looks OK?

    bq. Move setting up of JVM-specific files e.g. task's log directory to TaskController.launchJVM
    This is the only comment I've not addressed. It was not very clear what information is necessary at launch time. For example, if there are localized files under the task cache directory that need to be loaded at launch time, we'll need permissions for these as well. In general, it seemed a little risky to launch the JVM without giving full access to all jars etc., even if the Task only starts running later. So I've left this as is. I think the main concern here was the special check I had in JvmManager, where I avoided setting the permissions again when getting the task to launch. This seems a simple enough check, and I've documented the rationale in code. Can you verify this again and let me know your thoughts?
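
    As an illustration of the review comment about using Path methods rather than String manipulation (this is not code from the patch), here is a minimal sketch. It assumes java.nio.file.Path as a stand-in for Hadoop's own Path class, and the class name LocalDirPaths, the helper names, and the example directories and job id are all hypothetical. It splits a comma-separated directory list, such as a mapred.local.dir value, and resolves the same sub-path under each directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LocalDirPaths {
    /** Splits a comma-separated directory list (hypothetical helper). */
    static List<Path> parseLocalDirs(String commaSeparated) {
        List<Path> dirs = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(Paths.get(trimmed));
            }
        }
        return dirs;
    }

    /** Resolves the same relative sub-path under every local directory. */
    static List<Path> resolveUnderEach(List<Path> localDirs, String... parts) {
        List<Path> resolved = new ArrayList<>();
        for (Path dir : localDirs) {
            Path p = dir;
            for (String part : parts) {
                // resolve() inserts the separator, so no manual "/" handling is needed.
                p = p.resolve(part);
            }
            resolved.add(p.normalize());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Example values only; a trailing slash or stray space does not matter.
        List<Path> dirs = parseLocalDirs("/disk1/mapred/local, /disk2/mapred/local/");
        for (Path p : resolveUnderEach(dirs, "taskTracker", "jobcache", "job_0001")) {
            System.out.println(p);
        }
    }
}
```

    Treating the configured value as a list from the start also avoids assuming there is only one local directory.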
  • Hemanth Yamijala (JIRA) at Feb 5, 2009 at 4:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Feb 5, 2009 at 4:32 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670819#action_12670819 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    Attached what is hopefully the last version of the patch. This one is extensively tested and fixes a couple of bugs related to incorrect assumptions about multiple mapred local directories. Thanks to Sreekanth and Amar for help in testing this. We've run randomwriter and sort, with and without JVM reuse, as well as streaming and jobs using the distributed cache.

    The test-patch results show a -1 on release audit, which I've written to core-dev about. I am not sure why the -1 is coming up and will continue to debug it. There's also a -1 on tests. It is difficult to write unit tests for this patch since it requires support for multiple users.

    There are a lot of log statements in the patch which should be removed before commit. I am attaching this in the hope that someone can take a look at the changes. Arun?
  • Hemanth Yamijala (JIRA) at Feb 6, 2009 at 8:04 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Feb 6, 2009 at 8:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671042#action_12671042 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    Merged with trunk. The patch was broken by a commit made last night. Also removed extraneous logs. Running ant test.

    test-patch gives the following output:

    [exec] -1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
    [exec] Please justify why no tests are needed for this patch.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
    [exec]
    [exec] -1 release audit. The applied patch generated 823 release audit warnings (more than the trunk's current 820 warnings).

    The -1 on release audit is because jdiff has generated some changes to public classes and packages (DistributedCache, FileUtil and the filecache package). The release audit seems to be flagging these new jdiff-changed files as warnings. I've cross-checked that the ASF license header is included in all the new files I've added.

    The -1 on tests is as explained above.

    There's no change in functionality from the last patch I uploaded; this is only a merge.
  • Hemanth Yamijala (JIRA) at Feb 6, 2009 at 8:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671049#action_12671049 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    I should also mention that some additional work is required in the LinuxTaskController to sync up with the changes from HADOOP-4759, which was committed yesterday. It introduced a new task (the TaskCleanup task) which creates directories like task-attempt-id-cleanup.

    However, the default path (covered by the DefaultTaskController) is not affected, and hence there will be no regression. I propose that, if everything else is fine, we commit HADOOP-4490 and then address the changes with respect to HADOOP-4759 in a follow-up JIRA. Otherwise HADOOP-4490 is becoming a moving target (*smile*). Does this make sense?
  • Hemanth Yamijala (JIRA) at Feb 6, 2009 at 11:34 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Status: Patch Available (was: Open)
  • Hemanth Yamijala (JIRA) at Feb 6, 2009 at 11:34 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671098#action_12671098 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    ant test passes locally.
  • Hadoop QA (JIRA) at Feb 6, 2009 at 6:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671234#action_12671234 ]

    Hadoop QA commented on HADOOP-4490:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12399627/HADOOP-4490.patch
    against trunk revision 741330.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no tests are needed for this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 release audit. The applied patch generated 821 release audit warnings (more than the trunk's current 819 warnings).

    +1 core tests. The patch passed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/testReport/
    Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/artifact/trunk/current/releaseAuditDiffWarnings.txt
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/console

    This message is automatically generated.
  • Hemanth Yamijala (JIRA) at Feb 7, 2009 at 3:19 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Status: Open (was: Patch Available)
  • Hemanth Yamijala (JIRA) at Feb 7, 2009 at 3:19 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Attachment: HADOOP-4490.patch
  • Hemanth Yamijala (JIRA) at Feb 7, 2009 at 3:23 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-4490:
    -------------------------------------

    Status: Patch Available (was: Open)

    The attached file fixes the streaming test failures. I'd changed FileUtil.java to run the chmod command using the ShellCommandExecutor, where it previously used the Process class. There was no reason to change it, so I moved back to using Process, and the tests passed locally.

    I thought ant test ran the contrib tests as well, which is why I missed these failures on the first patch.
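
    For readers unfamiliar with the approach mentioned above, running chmod through a child process looks roughly like the sketch below. This is not the FileUtil code from the patch: the class name ChmodViaProcess and the chmod helper are hypothetical, the sketch uses ProcessBuilder (which returns a java.lang.Process), and it assumes a POSIX system with chmod on the PATH.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

public class ChmodViaProcess {
    /**
     * Runs "chmod <mode> <file>" as a child process and returns its exit code.
     * Hypothetical helper; assumes a POSIX system with chmod on the PATH.
     */
    static int chmod(String mode, File file) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("chmod", mode, file.getAbsolutePath())
                .redirectErrorStream(true)
                .start();
        // Drain the merged stdout/stderr so the child cannot block on a full pipe.
        try (InputStream out = process.getInputStream()) {
            byte[] buffer = new byte[1024];
            while (out.read(buffer) != -1) {
                // Output is discarded in this sketch.
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        File temp = File.createTempFile("chmod-demo", ".tmp");
        temp.deleteOnExit();
        int exit = chmod("600", temp);
        System.out.println("chmod exit code: " + exit);
    }
}
```

    A non-zero exit code signals that chmod failed, so a caller would normally check the return value rather than ignore it.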
  • Hadoop QA (JIRA) at Feb 7, 2009 at 7:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671509#action_12671509 ]

    Hadoop QA commented on HADOOP-4490:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12399725/HADOOP-4490.patch
    against trunk revision 741776.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no tests are needed for this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/console

    This message is automatically generated.
  • Hemanth Yamijala (JIRA) at Feb 8, 2009 at 6:15 am
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671561#action_12671561 ]

    Hemanth Yamijala commented on HADOOP-4490:
    ------------------------------------------

    The failed contrib test is in Chukwa and is the same as HADOOP-5172.
