Dear All,
I am trying to set up Hadoop for multiple users in a class on our cluster, but for some reason I can't get it right. With only one user running, it works great.
I would like every user to submit Hadoop jobs to the existing DataNode(s) already running on the cluster, but I am not sure whether this is the right approach.
Or do I need to start a DataNode for every user? If so, I was not able to get that working, because I ran into "port already in use" errors.
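For reference, when I tried a second DataNode on the same host, my understanding is that it would need its own ports, since the defaults (50010/50020/50075) can only be bound once. I tried overrides along these lines in that user's hdfs-site.xml (the port numbers below are just made-up examples):

<!-- data transfer port (default 50010) -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50110</value>
</property>
<!-- IPC port (default 50020) -->
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50120</value>
</property>
<!-- web UI port (default 50075) -->
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50175</value>
</property>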
Please advise. Below are a few of the config files.
I have also tried following other documents, which say to create a user "hadoop" and a group "hadoop" and then start the daemons as the hadoop user. That didn't work for me either (roughly what I ran is below). I am sure I am doing something wrong; could anyone please throw in some more ideas?
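This is approximately what I ran for the dedicated-user approach (the paths are examples; our install may live elsewhere):

# create the dedicated account and hand it the install and data directories
sudo groupadd hadoop
sudo useradd -g hadoop hadoop
sudo chown -R hadoop:hadoop /opt/hadoop /scratch/hadoop
# start all daemons (NameNode, DataNode, JobTracker, TaskTracker) as that user
sudo su - hadoop -c '/opt/hadoop/bin/start-all.sh'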
=> List of environment variables changed in hadoop-env.sh:
export HADOOP_LOG_DIR=/scratch/$USER/hadoop-logs
export HADOOP_PID_DIR=/scratch/$USER/.var/hadoop/pids
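Each user also creates these directories up front, in case that matters:

mkdir -p /scratch/$USER/hadoop-logs /scratch/$USER/.var/hadoop/pids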
# cat core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://frontend:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/scratch/${user.name}/hadoop-FS</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
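My understanding is that with fs.default.name set this way, any user should be able to reach the shared NameNode directly, e.g.:

hadoop fs -ls /
# or, spelling the URI out explicitly:
hadoop fs -ls hdfs://frontend:9000/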
# cat hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/scratch/${user.name}/.hadoop/.transaction/.edits</value>
  </property>
</configuration>
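For what it's worth, this is how I was checking whether the DataNode had actually registered with the NameNode:

hadoop dfsadmin -report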
# cat mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>frontend:9001</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
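And this is roughly how each student submits a job (the examples jar name here is a placeholder; it varies with the Hadoop version):

hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /user/$USER/input /user/$USER/output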
Thank you,
Amit