I would like to know the correct way of removing dead nodes from a Cloudera
Manager 4.1.3 managed cluster. We had an instance go down permanently, so
decommissioning it is out of the question. Previously, I used the includes
and excludes files along with the "refreshNodes" command; is there a
similar process using Cloudera Manager?

Thanks,
Ben
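
For reference, the pre-Cloudera-Manager workflow described above looks
roughly like this. A minimal sketch, assuming the NameNode's
dfs.hosts.exclude property points at /etc/hadoop/conf/dfs.exclude and the
dead host is deadnode.example.com (both placeholders; adjust for your own
setup):

  # Add the dead host to the HDFS exclude file (the path is an assumption;
  # use whatever dfs.hosts.exclude points at in your configuration).
  echo "deadnode.example.com" >> /etc/hadoop/conf/dfs.exclude

  # Tell the NameNode to re-read its include/exclude files.
  sudo -u hdfs hdfs dfsadmin -refreshNodes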


  • Philip Zeyliger at Feb 12, 2013 at 6:04 pm
    Hi Benjamin,

    You'll need to delete the host. First, go through all the roles on that
    host (you can find that via that host's page), and stop them. Then you'll
    need to remove those roles from their individual services (by visiting the
    service instances page, and deleting the roles). At that point, you'll be
    able to delete the host from the hosts page. I recognize this is a bit
    tedious.

    If the node wasn't particularly special, HDFS will have already picked up
    on the fact that it's dead and made extra copies of data, etc.

    Cheers,

    -- Philip
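
    For anyone scripting this rather than clicking through the UI, the same
    steps can be driven through the Cloudera Manager REST API. A rough
    sketch with curl, assuming CM at cm-host:7180, an admin/admin login, a
    cluster named "Cluster 1", an HDFS service named "hdfs1", and a role
    named "hdfs1-DATANODE-deadnode" (all of these are placeholders); check
    the endpoint paths against the API documentation for your CM version:

      # Find the hostId of the dead machine (usually its FQDN).
      curl -u admin:admin 'http://cm-host:7180/api/v1/hosts'

      # List the service's roles, then stop the ones on the dead host.
      curl -u admin:admin \
        'http://cm-host:7180/api/v1/clusters/Cluster%201/services/hdfs1/roles'
      curl -u admin:admin -X POST -H 'Content-Type: application/json' \
        -d '{"items": ["hdfs1-DATANODE-deadnode"]}' \
        'http://cm-host:7180/api/v1/clusters/Cluster%201/services/hdfs1/roleCommands/stop'

      # Delete each stopped role, then delete the host itself.
      curl -u admin:admin -X DELETE \
        'http://cm-host:7180/api/v1/clusters/Cluster%201/services/hdfs1/roles/hdfs1-DATANODE-deadnode'
      curl -u admin:admin -X DELETE \
        'http://cm-host:7180/api/v1/hosts/deadnode.example.com'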


  • Philip Zeyliger at Feb 13, 2013 at 4:07 pm
    (Including scm-users; hope that's ok)

    Hi Jim,

    It's a bit hard to tell which scenario you're in, but I'll give it a go.
    HDFS, when it notices an under-replicated block, takes its time to
    re-replicate it. That's because it's pretty common for a machine to be
    down for just a few minutes (e.g., when it's restarted), and it would be
    wasteful to duplicate all those blocks just to find out that the machine
    was in the process of coming back. So it may simply be that this will go
    away on its own as HDFS finishes the replications.
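
    If you want to watch that happen, the NameNode's view of the dead
    datanode and of the remaining re-replication work can be checked with
    the stock HDFS tools (a small sketch):

      # Prints live/dead datanodes plus the cluster-wide
      # "Under replicated blocks" counter.
      sudo -u hdfs hdfs dfsadmin -report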

    Another common scenario is that you've run something like a "teragen" job
    which generates files with replication factor=1. (Replication can be set
    per file.) In this case, there's no other place to copy them from.

    You can use "sudo -u hdfs hdfs fsck / -blocks" to figure out which blocks
    are under-replicated.
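
    A sketch of how to narrow that output down, and of one common fix; the
    path /user/jim/teragen-out below is a made-up example:

      # Show only the files and blocks that fsck flags as under-replicated.
      sudo -u hdfs hdfs fsck / -files -blocks | grep -i 'under replicated'

      # If the files were simply written with replication factor 1, raise it
      # (here to 3); -w waits until the new factor is actually reached.
      sudo -u hdfs hdfs dfs -setrep -w 3 /user/jim/teragen-out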

    Cheers,

    -- Philip

    On Wed, Feb 13, 2013 at 7:51 AM, Jim Hendricks wrote:

    Philip,
    I tried this. I then completely rebuilt the datanode to simulate a loss.
    When I brought the new node back online and balanced the cluster, I
    noticed 4 under-replicated blocks. I cleared the /tmp directory on HDFS,
    but I do not seem to be able to clear this.

    I'm not sure what I need to do to fix this, or what I need to run in the
    manager to clear it.

    Jim

  • Jim Hendricks at Feb 13, 2013 at 5:28 pm
    That did it. I found the blocks in a user directory under a .staging dir.

    All I can assume is that it existed before I started working and I didn't
    notice it.

    Your dead-node replacement process works great.

    Thanks,
    Jim
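
    For anyone hitting the same thing, the leftover job staging files can be
    tracked down and, once confirmed stale, removed by hand. A sketch; the
    username jim and the job directory name are made up:

      # Look for per-user MapReduce staging directories.
      sudo -u hdfs hdfs dfs -ls /user/jim/.staging

      # After confirming no running job still needs it, remove the stale one.
      sudo -u hdfs hdfs dfs -rm -r /user/jim/.staging/job_201302111234_0001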
