FAQ
After trying out Hadoop on a single machine, I decided to run a MapReduce job
across multiple machines. This is the approach I followed:
1 Master
1 Slave

(A doubt here: Can my Master also be used to execute the Map/Reduce
functions?)

To do this, I set up the masters and slaves files in the conf directory.
Following the instructions on this page -
http://hadoop.apache.org/core/docs/current/cluster_setup.html - I set up
sshd on both machines and was able to ssh from one to the other.

I tried to run bin/start-dfs.sh. Unfortunately, it asked for a password for
user1@slave, but on the slave there is only user2, while on the master user1
is the logged-on user. How do I resolve this? Should the same user account be
present on all the machines? Or can I specify this somewhere?


  • Harish Mallipeddi at Apr 23, 2008 at 7:14 am

    On Wed, Apr 23, 2008 at 3:03 PM, Sridhar Raman wrote:

    After trying out Hadoop in a single machine, I decided to run a MapReduce
    across multiple machines. This is the approach I followed:
    1 Master
    1 Slave

    (A doubt here: Can my Master also be used to execute the Map/Reduce
    functions?)
    If you add the master node to the list of slaves (conf/slaves), then the
    master node will also run a TaskTracker.
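
    A minimal sketch of what this could look like (the hostnames "master" and
    "slave" are placeholders for your own machines):

        # conf/slaves on the master -- one worker hostname per line.
        # start-dfs.sh starts a DataNode and start-mapred.sh a TaskTracker on
        # every host listed here, so listing the master makes it a worker too.
        master
        slave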


    --
    Harish Mallipeddi
    circos.com : poundbang.in/blog/
  • Sridhar Raman at Apr 23, 2008 at 8:40 am
    Ok, what about the issue regarding the users? Do all the machines need to
    be under the same user?
  • Norbert Burger at Apr 23, 2008 at 3:44 pm
    Yes, this is the suggested configuration. Hadoop relies on password-less
    SSH to be able to start its daemons on the slave machines. You can find
    instructions on creating/transferring the SSH keys here:

    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
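
    A minimal sketch of the key setup that guide walks through (run as the
    hadoop user on the master; "slave" is a placeholder for your slave's
    hostname, and the same account must exist there):

        # Generate a passphrase-less RSA key for this user
        ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
        # Authorize it locally -- the master also ssh-es into itself
        cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
        # Append the public key to the same user's authorized_keys on the slave
        cat ~/.ssh/id_rsa.pub | ssh slave 'cat >> ~/.ssh/authorized_keys'
        # Both of these should now log in without a password prompt
        ssh localhost
        ssh slave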
  • Sridhar Raman at Apr 24, 2008 at 9:37 am
    I tried following the instructions for a single-node cluster (as mentioned
    in the link). I am facing a strange roadblock.

    In the hadoop-site.xml, I have set the value of hadoop.tmp.dir to
    /WORK/temp/hadoop/workspace/hadoop-${user.name}.
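
    (For reference, a sketch of how that property sits inside the
    <configuration> element of conf/hadoop-site.xml:)

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/WORK/temp/hadoop/workspace/hadoop-${user.name}</value>
        </property>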

    After doing this, I run bin/hadoop namenode -format, and this creates a
    hadoop-sridhar folder under workspace. This is expected, as the user I've
    logged on as is "sridhar".

    Then I start my cluster by running bin/start-all.sh. When I do this, the
    output I get matches what is shown at
    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29#hadoop-site.xml,
    but a new folder called hadoop-SYSTEM is created under workspace.

    Then, when I run bin/stop-all.sh, all I get is "no tasktracker to
    stop, no datanode to stop, ...". Any idea why this might happen?

    Another point: after starting the cluster, I ran netstat and found
    multiple entries for localhost:9000, all in the LISTENING state. Is this
    also expected behaviour?
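
    (The check in question was something along these lines, run from a Cygwin
    or Linux shell, with 9000 being the fs.default.name port used here:)

        # Show sockets on the NameNode port
        netstat -an | grep 9000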
  • Sridhar Raman at May 1, 2008 at 12:51 pm
    Though I am able to run MapReduce tasks without errors, I am still not able
    to get stop-all to work. It still says, "no tasktracker to stop, no
    datanode to stop, ...".

    Also, there are a lot of Java processes running in my Task Manager that I
    need to shut down forcibly. Are these two problems related?
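
    One way to see what is actually still running (a diagnostic sketch, not a
    fix -- jps ships with the JDK and lists Java processes by main class):

        # Should show NameNode, DataNode, JobTracker, TaskTracker, etc.,
        # together with their process ids
        jps
        # A daemon that stop-all.sh no longer recognises can be stopped by pid
        kill <pid-from-jps>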

Discussion Overview
group: common-user
categories: hadoop
posted: Apr 23, '08 at 7:04a
active: May 1, '08 at 12:51p
posts: 6
users: 3
website: hadoop.apache.org...
irc: #hadoop
