Hi All,

I have CDH4 installed on my CentOS Linux cluster using Cloudera Manager
Free Edition. I am trying to understand the various ways I can set
user-level and group-level read/write permissions in HDFS. I have set
permission 770 on the "/user" HDFS directory, which has
owner:group = hdfs:supergroup.

My intention now is to add to "supergroup" all the users whom I want to be
able to access the "/user" directory. I thought that "hadoop" was the
supergroup and hence added another user, "gaurav", to "hadoop". But I can
see that only "hdfs" can now access the "/user" HDFS directory.

I also tried adding "gaurav" to the "hdfs" group, but it didn't work.
What should I do so that "gaurav" can access the "/user" directory with
permission level 770 and ownership hdfs:supergroup? In other words, what
is "supergroup"? How can I add users to "supergroup"? How can I manage it
using Cloudera Manager?

Note: I could not find "supergroup" in "/etc/group"

Thanks,
Gaurav


  • Philip Zeyliger at Feb 14, 2013 at 5:12 pm
    Hi Gaurav,

    Hadoop delegates the resolution of user-to-group mappings to something
    else. This something else is configurable, but it is almost always the
    Unix "groups" command. (See
    http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u0/api/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.html,
    for example, though that happens to be from cdh3u0, which is old.)

    Separately, the group name that acts as the "supergroup" is configurable.
    In CM you can change it to some other group.

    So, let's say we have alice, bob, and charlie, and you want alice and
    bob to be administrators.

    You can either put both alice and bob in a group called "fancypants" in
    Unix (most importantly on the namenode machine, but you should do it
    everywhere) and then change the "supergroup" configuration property to
    "fancypants", or you can create a Unix group called "supergroup" and put
    alice and bob in that one. Either of these approaches should work.
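    [Editor's note] The shell-based mapping Philip describes can be sketched
    in Python. This is a rough analogy, not Hadoop's actual Java code:
    `id -Gn` stands in for the `groups` command, and the `supergroup`
    default and function names are illustrative placeholders.

```python
import subprocess

def resolve_groups(user: str) -> list:
    """Mimic Hadoop's ShellBasedUnixGroupsMapping: shell out to the OS
    to list the Unix groups a username belongs to (here via `id -Gn`)."""
    result = subprocess.run(["id", "-Gn", user],
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Unknown user: Hadoop would see no groups for this principal.
        return []
    return result.stdout.split()

def is_superuser(user: str, supergroup: str = "supergroup") -> bool:
    """A user is an HDFS superuser if the configured supergroup name
    appears among the groups the OS reports for that user."""
    return supergroup in resolve_groups(user)
```

    This is why adding "gaurav" to a group only matters on the machines
    where the lookup runs (most importantly the namenode): the mapping asks
    the local OS, not the client.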

    Cheers,

    -- Philip


  • Gaurav Dasgupta at Feb 15, 2013 at 5:19 am
    Thanks a lot Philip. Got it now.
  • Gaurav Dasgupta at Feb 15, 2013 at 8:33 am
    I have one more doubt. So the user needs to be in the group on the
    namenode in order to access an HDFS directory whose permissions allow
    only that group.
    But if the namenode doesn't have that user, and instead only some slave
    node has that user, I can still submit jobs as that user from the slave
    node. I only have to create the user's directory under HDFS's "/user"
    directory and set the right owners and permissions.

    My question is: when the user is not present on the namenode, it can
    still submit a job from a slave node (where the user exists), which
    means the namenode can recognize that user. Then why, if the user is in
    the permitted group on some slave node but not on the namenode, can the
    namenode not recognize that membership (in other words, why can the
    user not access that HDFS directory)?

    I might sound a bit stupid here, but I am really confused about how
    users and groups are understood by the namenode in a cluster.

    Thanks,
    Gaurav
  • Harsh J at Feb 15, 2013 at 9:49 am
    Hi Gaurav,

    "Users" in the NN -> received via the RPC call, carried from the client
    (authenticated principal or simple username). A list is not stored
    anywhere; usernames are processed as they arrive.
    "Groups" in the NN -> resolved via a plugin (the shell "groups" command
    is the default) that is passed the received username. Results are
    stored in an expiring cache.

    If you think about it, it makes sense to receive usernames over RPC
    (since there can be an arbitrary number of users). Groups, however,
    need a singular view and wouldn't be reliable if we plainly accepted
    what a client sends us (since there's no way to "authenticate" groups
    that way). Without reliable groups, it's hard to cover permissions and
    ACL management.
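    [Editor's note] The two lookup paths Harsh describes - a pluggable
    resolver on the namenode side plus an expiring cache - can be sketched
    as a toy Python class. The class name, TTL default, and the use of
    `id -Gn` as the "plugin" are illustrative assumptions, not Hadoop code.

```python
import subprocess
import time

class ExpiringGroupCache:
    """Toy model of the NN behavior described above: group lookups go
    through a plugin (here, shelling out to `id -Gn`), and results are
    cached with a time-to-live, after which they are re-resolved."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._cache = {}  # user -> (expiry_timestamp, groups)

    def _lookup(self, user: str) -> list:
        # The pluggable resolver: ask the local OS for this user's groups.
        result = subprocess.run(["id", "-Gn", user],
                                capture_output=True, text=True)
        return result.stdout.split() if result.returncode == 0 else []

    def groups_for(self, user: str) -> list:
        now = time.monotonic()
        entry = self._cache.get(user)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: serve from cache
        groups = self._lookup(user)  # expired or missing: re-resolve
        self._cache[user] = (now + self.ttl, groups)
        return groups
```

    The cache explains why group changes on the namenode may take a little
    while to be noticed: until the cached entry expires, the old membership
    is served.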


    --
    Harsh J
  • Gaurav Dasgupta at Feb 15, 2013 at 10:11 am
    Thanks, Harsh, for the reply.

    So this means that if I want to create, control, and manage groups for
    Hadoop access from an edge/client node, it is not possible?

    I have only one HDFS directory to use, with owner:group, say,
    hdfs:hadoop and permissions set to 770. And at the same time I have
    access to only one edge node in the cluster, not the NN. So, if I
    create a new user, say "abc", and add it to the "hadoop" group on that
    edge node, "abc" still cannot use the desired HDFS directory?

    Isn't there a possible solution to this situation?

    Thanks,
    Gaurav
  • Harsh J at Feb 15, 2013 at 12:40 pm
    The groups-lookup plugin is configurable and accepts custom class
    implementations. We also support LDAP for groups lookup via a built-in
    plugin configuration.
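    [Editor's note] The LDAP-backed lookup Harsh mentions is selected with
    the `hadoop.security.group.mapping` property in core-site.xml (in
    Cloudera Manager this kind of setting typically goes into the HDFS
    service's core-site.xml safety valve). A minimal, hedged sketch - the
    LDAP server URL and base DN below are placeholders:

```xml
<!-- Choose the groups-lookup plugin (the default is a shell/JNI-based
     Unix mapping; this swaps in the built-in LDAP plugin). -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<!-- ldap.example.com and dc=example,dc=com are placeholder values. -->
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>
```

    With a setup like this, group membership is resolved centrally rather
    than from each node's /etc/group, which addresses the edge-node
    situation described above.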
