Folks,

Does anyone know if this earlier post ever reached a resolution? I am trying to work through the same tutorial, and I have encountered the same issue. Of the candidate problems Jason suggested, none of them seem to pan out in my case (details below). I'm looking for suggestions as to how I can get this to work or other avenues to try to debug. I am using hadoop-0.20.2 on Red Hat Enterprise Linux Server release 5.4.

Thanks!

Sincerely,
David Kane

None of my logs are reporting any errors.

Candidate Issue: Either your master namenode/jobtrackers are not actually starting:
JPS Shows the following on the Master Node:
32559 DataNode
398 TaskTracker
32749 JobTracker <----- Master Job Tracker started
32414 NameNode <----- Master NameNode started
32668 SecondaryNameNode
439 Jps

BTW, the master does seem to be starting up the processes correctly on the slave. JPS there reports:
4048 DataNode
4179 Jps
4108 TaskTracker

Candidate Issue: they [master namenode/jobtrackers] are not listening on those particular ports
On my master, my namenode log shows:
2010-03-17 09:08:11,711 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2010-03-17 09:08:11,712 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting
....
2010-03-17 09:08:11,752 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting

On my master, my jobtracker log shows:
2010-03-17 09:09:31,036 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 54311
2010-03-17 09:09:31,036 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2010-03-17 09:09:31,308 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
2010-03-17 09:09:31,369 INFO org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store is inactive
2010-03-17 09:09:31,500 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2010-03-17 09:09:31,501 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54311: starting
2010-03-17 09:09:31,507 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54311: starting
2010-03-17 09:09:31,509 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311: starting
2010-03-17 09:09:31,511 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
.....
2010-03-17 09:09:31,523 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54311: starting

On my slave, my datanode log shows:
2010-03-17 09:25:56,217 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mdadqsgdac1.mdanderson.edu/10.111.85.15:54310. Already tried 0 time(s).
2010-03-17 09:25:57,231 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mdadqsgdac1.mdanderson.edu/10.111.85.15:54310. Already tried 1 time(s).
...
2010-03-17 09:26:05,364 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mdadqsgdac1.mdanderson.edu/10.111.85.15:54310. Already tried 9 time(s).
2010-03-17 09:26:05,365 INFO org.apache.hadoop.ipc.RPC: Server at mdadqsgdac1.mdanderson.edu/10.111.85.15:54310 not available yet, Zzzzz...

On my slave, my tasktracker log shows:
2010-03-17 09:26:00,850 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mdadqsgdac1.mdanderson.edu/10.111.85.15:54311. Already tried 0 time(s).
2010-03-17 09:26:01,869 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mdadqsgdac1.mdanderson.edu/10.111.85.15:54311. Already tried 1 time(s).
...
2010-03-17 09:26:10,002 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mdadqsgdac1.mdanderson.edu/10.111.85.15:54311. Already tried 9 time(s).
2010-03-17 09:26:10,003 INFO org.apache.hadoop.ipc.RPC: Server at mdadqsgdac1.mdanderson.edu/10.111.85.15:54311 not available yet, Zzzzz...

The domain names and IP addresses that the slave is using do appear to match the ones that the master's logs report it is using. While the jobtracker has an explicit Starting RUNNING message and the namenode does not, the messages on the slave side are the same.

Candidate Issue: There is a networking issue

What sort of issue would cause this problem? I don't seem to have any issues getting from one machine to the other. I can ssh in both directions. I can traceroute from slave to master:
-sh-3.2$ traceroute mdadqsgdac1.mdanderson.edu
traceroute to mdadqsgdac1.mdanderson.edu (10.111.85.15), 30 hops max, 40 byte packets
1 mdadqsgdac1.mdanderson.edu (10.111.85.15) 0.080 ms 0.089 ms 0.084 ms
and I can traceroute from master to slave:
-sh-3.2$ traceroute mdadqsgdac2.mdanderson.edu
traceroute to mdadqsgdac2.mdanderson.edu (10.111.85.16), 30 hops max, 40 byte packets
1 mdadqsgdac2.mdanderson.edu (10.111.85.16) 0.142 ms 0.144 ms 0.136 ms
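
Editor's sketch: ssh and traceroute show that the hosts can route to each other, but not that the specific Hadoop ports accept connections from the slave. A minimal check, assuming Python is available on the slave, is to attempt a plain TCP connection to each port. The function name can_connect and the 3-second timeout are illustrative choices, not part of Hadoop.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the slave, can_connect("mdadqsgdac1.mdanderson.edu", 54310) and the same call with 54311 would report whether the namenode and jobtracker ports from the logs above are reachable. A False result while ssh still works suggests the daemon is bound to a different address or a firewall is blocking the port.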




-------Jason Venner <jason.had...@gmail.com> wrote on Thu, 05 Nov 2009 11:15:54 GMT----------------------------------------------------------------

Either your master namenode/jobtrackers are not actually starting, or they
are not listening on those particular ports or there is a networking issue.
On Tue, Nov 3, 2009 at 4:23 AM, Neil Blue wrote:

Hello

I am trying to start up my first twin node hadoop cluster. I have followed
this guide:

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
, and got two machines running as single node instances and then moved on to
connect them into a multi-node cluster.

I have two ubuntu instances running in virtual box with a bridged network
adapter.

I have configured the xml files slaves and master to point to the correct
machines, along with the ssh key.

When I start up the services I get all these starting on the master:

JobTracker
DataNode
SecondaryNameNode
TaskTracker
NameNode

The web interface shows the system is up and running with one node.

On the slave these are running:
TaskTracker
DataNode

The output logs on the slave show:

hadoop-hadoop-datanode-slave.log
2009-11-03 11:15:52,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:4310. Already tried 9 time(s).
2009-11-03 11:15:52,057 INFO org.apache.hadoop.ipc.RPC: Server at master/172.18.11.95:4310 not available yet, Zzzzz...
2009-11-03 11:15:54,063 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:4310. Already tried 0 time(s).
2009-11-03 11:15:55,064 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:4310. Already tried 1 time(s).
2009-11-03 11:15:56,068 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:4310. Already tried 2 time(s).
2009-11-03 11:15:57,073 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:4310. Already tried 3 time(s).

hadoop-hadoop-tasktracker-slave.log
2009-11-03 11:18:01,002 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:9001. Already tried 9 time(s).
2009-11-03 11:18:01,004 INFO org.apache.hadoop.ipc.RPC: Server at master/172.18.11.95:9001 not available yet, Zzzzz...
2009-11-03 11:18:03,007 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:9001. Already tried 0 time(s).
2009-11-03 11:18:04,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:9001. Already tried 1 time(s).
2009-11-03 11:18:05,011 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.18.11.95:9001. Already tried 2 time(s).

Tcpdump shows that the packets are being sent between the machines, and ssh
works, so there do not seem to be any network problems. Also, on the slave,
the remote http://master:50070/dfshealth.jsp page is visible.

I have also tried changing the port numbers used by the master, but no
luck.

Any suggestions please.

Thanks
Neil

*********************************************


  • Kane, David at Mar 18, 2010 at 2:30 pm
    Folks,

    As it turns out, it was a networking problem. The solution was similar to what is described here:

    http://wiki.apache.org/hadoop/Hbase/Troubleshooting

    However, I needed to make a similar adjustment to the master's /etc/hosts file as well.
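
    Editor's sketch: the post does not say exactly which /etc/hosts entries were changed. One common cause of this symptom, and the assumption made here, is that a machine's own hostname appears on a loopback line (for example 127.0.0.1), so the daemon listens only on loopback and remote nodes cannot connect. The helper names below are hypothetical; the code only parses hosts-file text and flags that pattern.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return the set of names that hosts-file text maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with a valid IP address
        if addr.is_loopback:
            names.update(fields[1:])
    return names


def maps_to_loopback(hosts_text, hostname):
    """True if hostname is listed on a loopback line (the assumed misconfiguration)."""
    return hostname in loopback_names(hosts_text)
```

    To try it on a real machine, read the file with open("/etc/hosts").read() and pass the node's fully qualified name. Under the assumption above, a True result on either the master or the slave would be the entry to move onto the line for the machine's routable IP address.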

    Sincerely,
    David Kane

    -----Original Message-----
    From: Kane, David
    Sent: Wed 3/17/2010 10:46 AM
    To: common-user@hadoop.apache.org
    Subject: Re: Slave data node failing to connect?

