Hadoop Multi user - Cluster Setup
Dear All,

I am trying to set up Hadoop for multiple users in a class on our cluster. For some reason I can't seem to get it right; if only one user is running, it works great.
I would like all of the users to submit Hadoop jobs to the existing DataNodes on the cluster, though I am not sure if this is the right approach.
Do I need to start a DataNode for every user? If so, I was not able to, because I ran into issues with the port already being in use.
Please advise. Below are a few of the config files.

I have also tried searching for other documents, which tell us to create a user "hadoop" and a group "hadoop" and then start the daemons as the hadoop user. This didn't work for me either. I am sure I am doing something wrong. Could anyone please throw in some more ideas?

=> List of environment variables changed in hadoop-env.sh:
export HADOOP_LOG_DIR=/scratch/$USER/hadoop-logs
export HADOOP_PID_DIR=/scratch/$USER/.var/hadoop/pids

# cat core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://frontend:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/scratch/${user.name}/hadoop-FS</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

# cat hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/scratch/${user.name}/.hadoop/.transaction/.edits</value>
  </property>
</configuration>

# cat mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>frontend:9001</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>


Thank you,
Amit


  • Li Ping at Feb 10, 2011 at 2:00 am
    You can check this property in hdfs-site.xml:

    <property>
      <name>dfs.permissions</name>
      <value>true</value>
      <description>
        If "true", enable permission checking in HDFS.
        If "false", permission checking is turned off,
        but all other behavior is unchanged.
        Switching from one parameter value to the other does not change
        the mode, owner or group of files or directories.
      </description>
    </property>

    You can disable this option.
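    For example, a minimal hdfs-site.xml override that turns the checking
    off (the NameNode needs a restart afterwards to pick up the change):

    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>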

    The second way is to run this command in Hadoop:

    hadoop fs -chmod o+w /

    It has the same effect as the first one.
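    Note that this has to be run as the HDFS superuser, i.e. the user that
    started the NameNode. Afterwards each user can create and own a working
    area of their own, e.g. (the username below is hypothetical):

    # run as the individual user once / is world-writable:
    hadoop fs -mkdir /user/student1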
    --
    -----李平
  • Kumar, Amit H. at Feb 10, 2011 at 5:46 pm
    Li Ping: Disabling dfs.permissions did the trick!

    I have the following questions, if you can help me understand this better:
    1. I am not sure what the consequences are of disabling it, or even of doing chmod o+w on the entire filesystem (/).
    2. Is there any need to have the permissions in place, other than securing users from each other's work?
    3. Is it still possible to have HDFS permissions enabled and yet have multiple users submitting jobs to a common pool of resources?

    Thank you so much for your help!
    Amit

  • Harsh J at Feb 10, 2011 at 6:02 pm
    Please read the HDFS Permissions guide, which explains what is needed
    for a working permissions model on the DFS:
    http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html
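    With permissions left enabled, the usual approach is for the HDFS
    superuser to pre-create a home directory per user and hand over
    ownership. A minimal sketch (the usernames are hypothetical):

    # As the HDFS superuser (the user that started the NameNode):
    hadoop fs -mkdir /user/student1
    hadoop fs -chown student1 /user/student1
    # Repeat per student; each one then works under their own /user/<name>
    # directory, and no world-writable paths are needed.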


    --
    Harsh J
    www.harshj.com
  • Piyush Joshi at Feb 10, 2011 at 1:22 pm
    Hey Amit, please try HOD, the Hadoop on Demand tool. It should meet your
    need to support multiple users on your cluster.

    -Piyush
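
    For reference, a rough sketch of the HOD workflow (the directory, node
    count, and example jar name below are made up; see the HOD documentation
    for the exact options):

    # Allocate a private Hadoop cluster on the shared nodes:
    hod allocate -d ~/hod-clusters/class -n 3
    # Run jobs against the configuration generated in the cluster directory:
    hadoop --config ~/hod-clusters/class jar hadoop-examples.jar wordcount in out
    # Release the nodes when finished:
    hod deallocate -d ~/hod-clusters/class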
