Hello,

I have been working on setting up a Hadoop on Demand cluster on
three machines and have run into a bit of a snag. I went through the
admin and user guides and have successfully installed torque and HOD.
When I run "hod allocate" it successfully starts hodring on 2 of the
machines but not the third. The result is that I have a working
Namenode and Jobtracker (though its UI does not seem to work
presently) but no slave nodes.

Even at debug level 4 in all sections, nothing indicates a failure:
the ringmaster has no problem communicating with the running hodring
jobs, and pbsdsh returns without error. I can find no logs on any of
the machines indicating a torque issue (though I admit I am not
terribly familiar with torque) and no HOD logs at all on the machine
that is not running hodring.

It would appear that the pbsdsh job simply isn't starting hodring
on the one node given the lack of any HOD log on that machine. Either
it is not recognizing the node (seems somewhat unlikely as it comes up
in pbsnodes as free) or there is a relatively silent failure
somewhere. If you have any suggestions I would much appreciate them.
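
One way to narrow this down, offered only as a minimal sketch: from
inside a torque job that spans all three machines, compare the hosts
torque allocated (listed in the file named by PBS_NODEFILE) against the
hosts that actually answer "pbsdsh hostname". The helper names below
(short, missing_nodes, read_nodefile, pbsdsh_hostnames) are
hypothetical, and the script assumes hostnames rather than IP addresses
in the node file.

```python
import os
import subprocess


def short(host):
    """Normalize a hostname to its lowercase short form (assumes names, not IPs)."""
    return host.strip().split(".")[0].lower()


def missing_nodes(allocated, responded):
    """Return allocated hosts that never reported back, in order, without repeats."""
    seen = {short(h) for h in responded}
    missing = []
    for host in allocated:
        name = short(host)
        if name not in seen and name not in missing:
            missing.append(name)
    return missing


def read_nodefile(path):
    """Read the node file: one hostname per line, repeated once per slot."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def pbsdsh_hostnames():
    """Run `pbsdsh hostname` and collect whatever each node printed."""
    result = subprocess.run(
        ["pbsdsh", "hostname"], capture_output=True, text=True, check=False
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Only meaningful inside a torque job, where PBS_NODEFILE is set.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; run this from inside a torque job")
    else:
        silent = missing_nodes(read_nodefile(nodefile), pbsdsh_hostnames())
        print("nodes that did not respond:", silent or "none")
```

If the third machine shows up as not responding here too, the problem
is probably in torque's task launch on that node rather than in HOD
itself; if it does respond, the hodring launch is the more likely
suspect.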

One quick side note: I have successfully run a standard
hadoop-0.20.0 cluster on these three machines with no difficulty,
which should rule out connection, ssh or firewall issues.

Thanks,

Seb

  • Seb Seith at Aug 27, 2009 at 8:04 pm
  • Seb Seith at Aug 27, 2009 at 8:12 pm
    I just realized that it finally posted my original message moments
    after I resent it. I had assumed after that period of time that it
    had not been successfully received by the system since I had not seen
    it come up on the archive or on the list itself. Sorry for the double
    posting there.


Discussion Overview
group: common-user @
categories: hadoop
posted: Aug 27, '09 at 8:01p
active: Aug 27, '09 at 8:12p
posts: 3
users: 1
website: hadoop.apache.org...
irc: #hadoop
