Unless or until we put authentication in place in the web API, I think
we need to be very careful about adding admin functions to it.

We have clusters with an increasing number of users who are not
cluster admins on them. We don't want to enable them to reconfigure the
system. We do want to allow them to use the web UIs to diagnose the
status of their jobs and files.

In our other large clustered applications, we've found that
segregating a web interrogation UI from a file-based (or other
control-port-based) admin UI avoids a lot of pain and allows
administrative flexibility.

On Aug 11, 2006, at 1:14 PM, Doug Cutting (JIRA) wrote:

[ http://issues.apache.org/jira/browse/HADOOP-442?page=comments#action_12427609 ]

Doug Cutting commented on HADOOP-442:

The slaves file is currently only used by the start/stop scripts,
so it won't help here.

Perhaps the jobtracker and namenode should have a public API that
permits particular hosts to be banned. The web UI could then use
this to let administrators ban hosts. We could initialize the
list from a config file, in the case of persistently bad hosts.
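
To make the proposal above concrete, here is a minimal sketch of a ban list that a master could consult when a datanode or tasktracker tries to register. It is written in Python purely for illustration and is not Hadoop code: the class `HostBanList`, its methods, and the one-host-per-line config format with `#` comments are all assumptions, not an existing API.

```python
import threading


class HostBanList:
    """Hypothetical ban list that a master daemon could consult."""

    def __init__(self, initial_hosts=()):
        self._lock = threading.Lock()
        self._banned = set()
        for host in initial_hosts:
            self.ban(host)

    @staticmethod
    def _normalize(host):
        # Compare host names case-insensitively and ignore stray whitespace.
        return host.strip().lower()

    @classmethod
    def from_config_lines(cls, lines):
        # Assumed config format: one host per line, '#' starts a comment.
        hosts = []
        for line in lines:
            entry = line.split("#", 1)[0].strip()
            if entry:
                hosts.append(entry)
        return cls(hosts)

    def ban(self, host):
        name = self._normalize(host)
        if name:
            with self._lock:
                self._banned.add(name)

    def unban(self, host):
        with self._lock:
            self._banned.discard(self._normalize(host))

    def is_banned(self, host):
        with self._lock:
            return self._normalize(host) in self._banned

    def accept_registration(self, host):
        # A master would call this before accepting a connecting worker.
        return not self.is_banned(host)


# Seed the list from (assumed) config contents, then ban one more host at runtime.
bans = HostBanList.from_config_lines(["# persistently bad hosts", "node17.example.com"])
bans.ban("Node42.example.com")
```

In this sketch the runtime `ban` call is what an admin-facing interface would invoke, while `accept_registration` is the check made on the connection path. Whether `ban` should be reachable from an unauthenticated web UI is exactly the concern raised at the top of this message.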

slaves file should include an 'exclude' section, to prevent "bad"
datanodes and tasktrackers from disrupting a cluster

Key: HADOOP-442
URL: http://issues.apache.org/jira/browse/HADOOP-442
Project: Hadoop
Issue Type: Bug
Reporter: Yoram Arnon

I recently had a few nodes go bad, such that they were
inaccessible to ssh, but were still running their java processes.
Tasks that executed on them were failing, causing jobs to fail.
I couldn't stop the java processes, because of the ssh issue, so I
was helpless until I could actually power down these nodes.
Restarting the cluster doesn't help, even when removing the bad
nodes from the slaves file - they just reconnect and are accepted.
While we plan to keep tasks from launching on the same nodes over
and over, what I'd like is to be able to prevent rogue processes
from connecting to the masters.
Ideally, the slaves file will contain an 'exclude' section, which
will list nodes that shouldn't be accessed, and should be ignored
if they try to connect. That would also help in configuring the
slaves file for a large cluster - I'd list the full range of
machines in the cluster, then list the ones that are down in the
'exclude' section.
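
One way to picture the requested 'exclude' section is the sketch below. The issue does not define a file syntax, so the `[exclude]` marker line, the `#` comments, and the function name `parse_slaves` are all assumptions made for illustration (in Python, not Hadoop code). Hosts listed before the marker are the full cluster range; hosts after it are dropped from the active list and returned separately so a master could refuse them.

```python
def parse_slaves(text):
    """Split slaves-file text into (active hosts, excluded hosts).

    Assumed format: one host per line, '#' starts a comment, and a line
    reading '[exclude]' switches to the list of hosts to ignore.
    """
    listed, excluded = [], []
    current = listed
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.lower() == "[exclude]":
            current = excluded
            continue
        current.append(line)
    excluded_set = set(excluded)
    # Keep the original ordering of the hosts that remain active.
    active = [host for host in listed if host not in excluded_set]
    return active, excluded_set
```

The start/stop scripts would iterate over the active list only, and the masters would use the excluded set to reject reconnecting nodes.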

group: common-dev @
posted: Aug 17, '06 at 6:14p
active: Aug 17, '06 at 6:14p

1 user in discussion

Eric Baldeschwieler: 1 post


