slaves file should include an 'exclude' section, to prevent "bad" datanodes and tasktrackers from disrupting a cluster
-----------------------------------------------------------------------------------------------------------------------

Key: HADOOP-442
URL: http://issues.apache.org/jira/browse/HADOOP-442
Project: Hadoop
Issue Type: Bug
Reporter: Yoram Arnon


I recently had a few nodes go bad, such that they were inaccessible to ssh but were still running their Java processes.
Tasks that executed on them were failing, causing jobs to fail.
I couldn't stop the Java processes, because of the ssh issue, so I was helpless until I could actually power down these nodes.
Restarting the cluster doesn't help, even after removing the bad nodes from the slaves file - they just reconnect and are accepted.
While we plan to prevent tasks from launching on the same bad nodes over and over, what I'd like is to be able to prevent rogue processes from connecting to the masters.
Ideally, the slaves file would contain an 'exclude' section, which would list nodes that shouldn't be accessed and which should be ignored if they try to connect. That would also help in configuring the slaves file for a large cluster - I'd list the full range of machines in the cluster, then list the ones that are down in the 'exclude' section.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

  • Bryan Pendleton (JIRA) at Aug 10, 2006 at 8:00 pm
    [ http://issues.apache.org/jira/browse/HADOOP-442?page=comments#action_12427334 ]

    Bryan Pendleton commented on HADOOP-442:
    ----------------------------------------

    I too have had problems like this over time.

    Another way to deal with this might be to hand out a new random ID to each instance on startup. The namenode/jobtracker would know the "correct" value, and treat as rogue any nodes which try to register but don't know the current "correct" value. This could be re-set at each full-cluster restart, to make sure that all nodes participating are "up to date" and correct.
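    A minimal sketch of that idea, purely for illustration (not an existing Hadoop API; all names here are invented): the master generates a token when it starts, hands it to the slaves it launches, and refuses registrations that do not present it.

    // Illustrative sketch of the shared startup-token idea; names are hypothetical.
    import java.security.SecureRandom;
    import java.util.HashSet;
    import java.util.Set;

    public class RegistrationGate {
      private final String clusterToken;                 // issued for this cluster instance
      private final Set<String> registered = new HashSet<String>();

      public RegistrationGate() {
        byte[] bytes = new byte[16];
        new SecureRandom().nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        this.clusterToken = sb.toString();
      }

      /** Token handed to slaves at cluster (re)start. */
      public String getClusterToken() { return clusterToken; }

      /** Accept a slave only if it presents the token issued for this cluster instance. */
      public synchronized boolean register(String host, String presentedToken) {
        if (!clusterToken.equals(presentedToken)) {
          return false;                                  // rogue or stale node: refuse registration
        }
        registered.add(host);
        return true;
      }
    }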


  • Doug Cutting (JIRA) at Aug 11, 2006 at 8:17 pm
    [ http://issues.apache.org/jira/browse/HADOOP-442?page=comments#action_12427609 ]

    Doug Cutting commented on HADOOP-442:
    -------------------------------------

    The slaves file is currently only used by the start/stop scripts, so it won't help here.

    Perhaps the jobtracker and namenode should have a public API that permits particular hosts to be banned. The web UI could then use this to let administrators ban hosts. We could initialize the list from a config file, in the case of persistently bad hosts.
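    A hypothetical shape for such an API, sketched only to make the suggestion concrete (not an actual Hadoop interface; all names invented):

    // Hypothetical admin interface for banning hosts; illustrative only.
    public interface HostAdmissionAdmin {
      /** Refuse registrations, heartbeats and reports from this host until it is unbanned. */
      void banHost(String hostname);

      /** Allow a previously banned host to rejoin the cluster. */
      void unbanHost(String hostname);

      /** Hosts currently banned, e.g. for display in the web UI. */
      java.util.Collection<String> getBannedHosts();
    }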
  • Marco Nicosia (JIRA) at Aug 11, 2006 at 9:37 pm
    [ http://issues.apache.org/jira/browse/HADOOP-442?page=comments#action_12427632 ]

    Marco Nicosia commented on HADOOP-442:
    --------------------------------------

    Would it be better to choose either one or the other to be authoritative for all operations?

    1] The namenode/jobtrackers maintain the slaves file. Membership and other administrative changes are made via API calls to the process, which modifies a file on disk. That file is used, but never modified, by slaves.sh, etc. If the file is still text, it can be modified between process restarts.

    2] The namenode/jobtrackers observe and respect the contents of a file on disk. Standard tools can modify it, but the processes would have to poll the file to see if it has been changed.

    I personally prefer #1, though I'd hope that any API is open (XML-RPC, REST, SOAP...) instead of RMI, so that any set of sysadmin automation can talk to it.

  • Yoram Arnon (JIRA) at Aug 31, 2006 at 8:10 am
    [ http://issues.apache.org/jira/browse/HADOOP-442?page=all ]

    Yoram Arnon updated HADOOP-442:
    -------------------------------

    Component/s: conf
  • eric baldeschwieler (JIRA) at Nov 16, 2006 at 7:36 pm
    [ http://issues.apache.org/jira/browse/HADOOP-442?page=comments#action_12450506 ]

    eric baldeschwieler commented on HADOOP-442:
    --------------------------------------------

    Current proposal:

    - Add a config variable that points to a file containing the list of nodes HDFS should expect (the slaves file) (optional config)
    - Add a config variable that points to a file containing a list of excluded nodes (from the previous list) (optional config)

    - The namenode reads these files on startup (if configured). It keeps a list of included nodes and another of excluded nodes. If the include list is configured, it will be tested when a node registers or heartbeats; if the node is not on the list, it will be told to shut down in the response. If the exclude list is configured, then a node will also be shut down if listed.

    - We will add an admin command to re-read the inclusion and exclusion files

    - The job tracker will also read these lists and have a new admin command to reread the files


  • Wendy Chien (JIRA) at Jan 25, 2007 at 3:53 am
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Wendy Chien reassigned HADOOP-442:
    ----------------------------------

    Assignee: Wendy Chien (was: Doug Cutting)
  • Wendy Chien (JIRA) at Jan 26, 2007 at 10:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467944 ]

    Wendy Chien commented on HADOOP-442:
    ------------------------------------

    Here is the latest design (pretty much the same as the previous proposal with a few more details)

    1. Adding two new config variables (see the illustrative snippet after this list). By default they will not be configured (commented out of hadoop-default).
    a. hadoop.nodes.include -- file which contains nodes to include. If this variable is configured, then only nodes on this list will be allowed to register with the namenode
    b. hadoop.nodes.exclude -- file which contains nodes to exclude. If this variable is configured, then any node appearing on this list will be denied communication (register, send heartbeats, send block reports, report received blocks, and send error reports) by the namenode, even if it appears in the include file.

    If neither is configured, then any node is allowed to connect. We currently have a slaves file that is used for slaves.sh. This file can be used as the file specified by hadoop.nodes.include, but there is no restriction that it needs to be.

    2. Adding a dfsadmin command (refreshNodes) to reread the inclusion/exclusion files.

    3. The files will be read when the NameNode starts up and whenever it gets a refreshNodes command.

    4. JobTracker will use the same config variables to determine TaskTrackers to include/exclude.
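    For illustration, the two variables from point 1 might be set roughly like this in hadoop-site.xml (names as proposed in this comment; they are renamed in later comments, and the file paths are invented), with point 2's command then issued as something like 'bin/hadoop dfsadmin -refreshNodes':

    <!-- Illustrative only: property names as proposed above, paths hypothetical. -->
    <property>
      <name>hadoop.nodes.include</name>
      <value>/path/to/conf/hosts.include</value>
      <description>File listing the nodes allowed to register with the namenode.</description>
    </property>

    <property>
      <name>hadoop.nodes.exclude</name>
      <value>/path/to/conf/hosts.exclude</value>
      <description>File listing nodes to be refused, even if they appear in the include file.</description>
    </property>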



  • Yoram Arnon (JIRA) at Jan 26, 2007 at 11:42 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467961 ]

    Yoram Arnon commented on HADOOP-442:
    ------------------------------------

    Yes, right on!

  • Wendy Chien (JIRA) at Jan 31, 2007 at 1:18 am
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468848 ]

    Wendy Chien commented on HADOOP-442:
    ------------------------------------

    Some changes:
    1. There are actually going to be four new config variables: two for dfs and two for mapred. The current names are:
    dfs.client.include
    dfs.client.exclude
    mapred.client.include
    mapred.client.exclude

    -These variables are going to be commented out in hadoop-default, so by default they will not be configured.

    2. I created a generic gatekeeper class in hadoop.util which is used by dfs and mapred (a rough sketch of the idea follows at the end of this comment).

    3. Currently, the include list will only be looked at for registration. It will not be used for heartbeats, block reports, etc. The exclude list will be used for everything. The reason is that if a node can't register, then it can't send heartbeats, so we don't need to check the include list again.

    4. The JobTracker will not have an admin command to refresh the nodes because such a mechanism does not yet exist.

    Comments welcome.
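    A rough sketch of what such a gatekeeper helper could look like, assuming whitespace-separated hostnames in the two files (an illustration of the idea only, not the actual class added to hadoop.util):

    // Illustrative gatekeeper sketch: reads include/exclude host files and answers
    // whether a node may talk to the master. Not the actual Hadoop implementation.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    public class HostGatekeeper {
      private final String includeFile;   // empty -> allow everyone
      private final String excludeFile;   // empty -> exclude no one
      private Set<String> includes = new HashSet<String>();
      private Set<String> excludes = new HashSet<String>();

      public HostGatekeeper(String includeFile, String excludeFile) throws IOException {
        this.includeFile = includeFile;
        this.excludeFile = excludeFile;
        refresh();
      }

      /** Re-read both files, e.g. in response to an admin refresh command. */
      public synchronized void refresh() throws IOException {
        includes = readHostFile(includeFile);
        excludes = readHostFile(excludeFile);
      }

      /** A host is allowed if it is not excluded and (no include list, or listed in it). */
      public synchronized boolean isAllowed(String host) {
        if (excludes.contains(host)) return false;
        return includes.isEmpty() || includes.contains(host);
      }

      private static Set<String> readHostFile(String name) throws IOException {
        Set<String> hosts = new HashSet<String>();
        if (name == null || name.length() == 0) return hosts;
        BufferedReader in = new BufferedReader(new FileReader(name));
        try {
          String line;
          while ((line = in.readLine()) != null) {
            for (String host : line.trim().split("\\s+")) {   // whitespace-separated hostnames
              if (host.length() > 0) hosts.add(host);
            }
          }
        } finally {
          in.close();
        }
        return hosts;
      }
    }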

  • dhruba borthakur (JIRA) at Jan 31, 2007 at 6:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469139 ]

    dhruba borthakur commented on HADOOP-442:
    -----------------------------------------

    Nice and simple design. I have one comment. I would like to avoid having every heartbeat check the exclude list. We already know that heartbeat processing should be very lightweight for the cluster to scale. Can we make the heartbeat check the exclude list only if it is the first heartbeat received after the last invocation of the "dfsadmin refreshNodes" command?


  • Sameer Paranjpye (JIRA) at Feb 1, 2007 at 12:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469272 ]

    Sameer Paranjpye commented on HADOOP-442:
    -----------------------------------------

    A couple of points/questions:

    - The [dfs|mapred].client prefixes are a little confusing. They could be construed as lists of machines from which HDFS clients are permitted to connect, which is not the intent here. I'd also get rid of the 'include' suffix since that list is used to constrain the set of nodes that can participate. I propose calling them:
    [dfs|mapred].hosts and [dfs|mapred].hosts.exclude

    - What is the format of the include and exclude files? It needs to be specified here.

    - The variables should be set to reasonable default values in hadoop-default.xml, not excluded. Setting them to empty strings by default is probably fine.

    - What happens when a node is taken off the include list and not placed on the exclude list? Do heartbeats and block reports from it continue to get processed? Perhaps the 'refreshNodes' command should examine the collection of DatanodeDescriptors and update it to only include those that are on the new include list.

    - What are the semantics of exclude? How does it differ from decommission? Can we unify the two and have exclude mean decommission? What happens if all replicas of some blocks are excluded?


  • Yoram Arnon (JIRA) at Feb 1, 2007 at 3:04 am
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469320 ]

    Yoram Arnon commented on HADOOP-442:
    ------------------------------------

    so perhaps a simpler approach, more in line with the original request:

    * single slaves file
    * no new config variables, since the slaves file is configurable anyway via the hadoop script(s) and env vars
    * lines in the slaves file that start with the word 'exclude' imply that the nodes listed on that line are to be excluded
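    For example, under this proposal a slaves file for a hypothetical cluster (hostnames invented for illustration) might read:

    node001.example.com
    node002.example.com
    node003.example.com
    node004.example.com
    exclude node002.example.com node004.example.com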

    pros of this approach:
    * completely backwards compatible
    * very simple for the administrator (and implementor) - no config, no confusion, single slaves file to edit

    The intent was to simplify the administrator's life. I'd list *all* of the cluster's machines in the slaves file, then list a handful of nodes under an 'exclude' section. The exclude statement is an override on whatever is listed elsewhere in the file.
    The behavior is:
    * datanode/tasktracker on nodes that appear in the slaves file and are not excluded are launched on startup
    * any node may connect to the master, unless it is excluded (for backwards compatibility. I'd actually prefer to only allow included nodes to connect but that will wait, perhaps indefinitely)

    nodes are excluded for a reason, basically because they're dead and
    * their presence clutters up the startup/shutdown script output and error logs
    * if they are included in the cluster they may degrade its behavior or performance so they should be actively ignored
    - I wouldn't worry about the replicas of data they contain - they're long dead and re-replicated. If they're alive, I won't exclude them.

    I love the admin command to reread the file.

  • Wendy Chien (JIRA) at Feb 1, 2007 at 10:17 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469597 ]

    Wendy Chien commented on HADOOP-442:
    ------------------------------------

    -I'll change the names to the ones Sameer proposed and make the default an empty string.

    -The exclude/hosts files expect full hostnames listed, separated by whitespace (similar to the slaves file). I can refine this if it's too cumbersome.

    -Some of us had a discussion about exclude and decommission. The conclusion is that they should be combined.

    Below is the new design:

    1. Namenode behavior in dealing with exclude/hosts lists.
    a. When the namenode starts up, it reads the hosts and exclude files. When a node on the exclude list registers with the namenode, we mark it to be decommissioned. When nodes are done being decommissioned, they are shut down. Nodes not on the include list will not be allowed to register.
    b. When the namenode gets a refreshNodes command, it will update the hosts and exclude lists. If a node is added to the hosts list, then it will be allowed to register. If a node is removed from the hosts list, then any further communication will be disallowed and it will be asked to shut down. If a node is added to the exclude list, then it will start to be decommissioned. If a node is removed from the exclude list, then the decommission process will be stopped.

    2. Decommissioning a node behaves slightly differently from before.
    a. When a node is being decommissioned, we do not want to use its copies to replicate unless no copies exist on non-decommissioned nodes.
    b. A new thread will be used to periodically check if a node is done being decommissioned. If it is, the next time the node heartbeats, it will be told to shut down.

    Comments welcome (and please let me know if I forgot anything).
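    The registration and refresh behavior in 1a/1b above, roughly in code (an illustrative sketch of the stated design, with invented names; not the actual namenode implementation):

    // Illustrative sketch of the admission/decommission decision described in 1a/1b.
    class AdmissionPolicy {
      enum Decision { ALLOW, DECOMMISSION, REJECT }

      private final java.util.Set<String> hosts;     // include list; empty = allow all
      private final java.util.Set<String> excludes;  // exclude list

      AdmissionPolicy(java.util.Set<String> hosts, java.util.Set<String> excludes) {
        this.hosts = hosts;
        this.excludes = excludes;
      }

      /** Decision taken when a datanode registers, or when refreshNodes re-evaluates it. */
      Decision decide(String host) {
        if (!hosts.isEmpty() && !hosts.contains(host)) {
          return Decision.REJECT;          // not on the include list: not allowed to register
        }
        if (excludes.contains(host)) {
          return Decision.DECOMMISSION;    // excluded: replicate its blocks away, then shut it down
        }
        return Decision.ALLOW;
      }
    }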
  • Wendy Chien (JIRA) at Feb 6, 2007 at 9:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470746 ]

    Wendy Chien commented on HADOOP-442:
    ------------------------------------

    To combine exclude and decommission, I've taken out the dfsadmin decommission command. Starting and stopping decommission is now done through the configured files. The last option in the original decommission command was one to see the state of decommissioning for a set of nodes. Should this option be combined with refreshNodes, or should it still have its own command (especially if the decommission state is expanded to show progress, or if we want to also show other states for the nodes)?
  • Wendy Chien (JIRA) at Feb 7, 2007 at 3:57 am
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Wendy Chien updated HADOOP-442:
    -------------------------------

    Attachment: hadoop-442-8.patch

    I still need to make changes to TestDecommission, but I wanted to put up a patch for reviews.
  • dhruba borthakur (JIRA) at Feb 12, 2007 at 7:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472390 ]

    dhruba borthakur commented on HADOOP-442:
    -----------------------------------------

    1. It would be nice if the description of dfs.hosts and dfs.hosts.exclude
    says "Full path name of file ..."

    2. The FSNamesystem.close() function should have a dnthread.join() call.

    3. Can we make FSNamesystem.refreshNodes() package private? i.e. remove the
    "public" keyword from its definition.

    4. The method FSNamesystem.refreshNodes might need to be synchronized because
    it traverses the datanodeMap. However, the first line in this method (the one
    that invokes "hostReader.refresh") should preferably be outside this synchronization.
    It is good to read in the contents of the hosts file outside the global
    FSNamesystem lock.

    5. The methods inExcludedHostsList() and inHostsList() could be unified if we
    pass in the specified list as a parameter to this unified method.

  • Wendy Chien (JIRA) at Feb 15, 2007 at 12:22 am
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Wendy Chien updated HADOOP-442:
    -------------------------------

    Attachment: hadoop-442-10.patch

    I've attached a new patch. This one also has the changes for TestDecommission and incorporates Dhruba's comments.
    1. It would be nice if the description of dfs.hosts and dfs.hosts.exclude says "Full path name of file ..." Done.
    2. The FSNamesystem.close() function should have a dnthread.join() call. Done.
    3. Can we make FSNamesystem.refreshNodes() package private? i.e. remove the "public" keyword from its definition. Done.
    4. The method FSNamesystem.refreshNodes might need to be synchronized. Done.
    5. The methods inExcludedHostsList() and inHostsList() could be unified.
    They are actually a little different because inHostsList returns true if the list is empty too (see the sketch below).

    6. Dhruba and I also discussed removing the STOPPED state that I had added to DatanodeInfo, since it can be merged with DECOMMISSIONED.
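    The asymmetry in item 5, roughly in code (a sketch of the described semantics, not the actual FSNamesystem methods):

    // An empty include list means "allow everyone"; the exclude list only matches
    // hosts that are explicitly listed.
    class HostListChecks {
      static boolean inHostsList(java.util.Set<String> hosts, String host) {
        return hosts.isEmpty() || hosts.contains(host);
      }
      static boolean inExcludedHostsList(java.util.Set<String> excludes, String host) {
        return excludes.contains(host);
      }
    }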


  • dhruba borthakur (JIRA) at Feb 15, 2007 at 8:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473491 ]

    dhruba borthakur commented on HADOOP-442:
    -----------------------------------------

    0. Nice work on removing the STOPPED state. Makes the system less complex.

    1. In TestDecommission.waitNodeState, we wait for half a second per iteration.
    If this setting causes many prints of the form "Waiting for node to change.."
    then maybe we can change the wait period to 1 second instead of half a second.

    2. In TestDecommission, we have the following logic:
    a. decommissionNode()
    b. waitNodeState(DECOMMISSION_INPROGRESS)
    c. commissionNode()
    d. waitNodeState(NORMAL)
    e. decommissionNode()
    f. waitNodeState(DECOMMISSIONED)
    g. checkFile()

    I guess b. should be waitNodeState(DECOMMISSIONED)
    Also, we can sneak in a call to checkFile() between steps b and c.

    3. Maybe ClientProtocol.refreshNodes can return a void instead of
    "boolean".

    4. The original way to shut down a datanode was to make the namenode send a
    DNA_SHUTDOWN command in response to a heartbeat. You enhanced this logic
    to make the datanode catch an UnregisteredDatanodeException and shut itself
    down. Thus, we will now have two ways to shut down a datanode. Do you think
    that it is preferable to have only one way to shut down a datanode?

    5. An earlier patch introduced an async thread called the ReplicationMonitor.
    The ReplicationMonitor thread invokes checkDecommissionState(). This
    probably means that the DecommissionedMonitor thread is not needed anymore.

    6. The FSNamesystem.verifyNodeRegistration needs to be synchronized since it
    traverses hosts lists and datanode lists. Since
    FSNamesystem.startDecommission and FSNamesystem.stopDecommission are private
    and are always called from already-synchronized methods, we can remove
    the "synchronized" keyword from their definitions.

    7. I am kind-of reluctant to put in added complexity to pendingTransfers
    to handle the case that a decommissioned node should not be the
    source of a replication request. Especially because the above rule is
    not enforced if there is only one replica in the cluster. Is there any
    way that we can avoid putting in this special purpose case? What bad
    thing can occur if we select the decommissioned node as the source of
    a replication request?

    8. Ideally, the FSNamesystem.verifyNodeShutdown needs to be synchronized
    because it looks up the datanode descriptor. But this method is called
    for every RPC request and a "synchronized" call just adds overhead.
    Maybe we can make FSNamesystem.gotHeartBeat, FSNamesystem.processReport
    and FSNamesystem.blockReceived call FSNamesystem.verifyNodeShutdown.

  • dhruba borthakur (JIRA) at Feb 15, 2007 at 10:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473544 ]

    dhruba borthakur commented on HADOOP-442:
    -----------------------------------------

    Regarding comment 5 above, it actually might make sense to have a separate thread to check whether a decommission has completed or not. It can run on its own schedule. The ReplicationMonitor thread runs every 3 seconds, and this periodicity is "too" frequent for checking decommissioned nodes.
  • Wendy Chien (JIRA) at Feb 16, 2007 at 9:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Wendy Chien updated HADOOP-442:
    -------------------------------

    Attachment: hadoop-442-11.patch

    Thanks for looking over the patch, Dhruba! I updated it to incorporate Dhruba's comments.

    1. TestDecommission.waitNodeState waits for 1 second now.
    2. I do mean to check for DECOMMISSION_INPROGRESS to make sure the decommission began, but I want to stop it before it finishes so I can test that commissioning a node works too.
    3. refreshNodes now returns void.
    4. UnregisteredDatanodeException was already there, but I also added DisallowedDatanodeException to that clause. I'm inclined to leave them together since they are similar.
    6. added synchronized to verifyNodeRegistration, and removed it from start/stopDecommission.
    7. removed the new code from pendingTransfers
    8. I moved verifyNodeShutdown to FSNamesystem.

  • Wendy Chien (JIRA) at Feb 16, 2007 at 11:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Wendy Chien updated HADOOP-442:
    -------------------------------

    Status: Patch Available (was: Open)
  • Doug Cutting (JIRA) at Feb 21, 2007 at 8:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting updated HADOOP-442:
    --------------------------------

    Resolution: Fixed
    Fix Version/s: 0.12.0
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Wendy!
