Glad it worked flawlessly this time :)

Arun Ramakrishnan wrote:
I don't know where the problem was. J-D said somewhere that the decommissioning process is well tested and less likely to have bugs.

Anyway, I just resorted to killing 2 nodes at a time: kill 2 nodes, wait until fsck reports 100% of blocks replicated to factor 3, kill 2 more nodes, and so on.
Worked fine.
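(For what it's worth, the wait step above can be automated with a small check on the fsck summary. This is only a sketch: the exact summary line format is an assumption based on typical `hadoop fsck /` output, and may vary between Hadoop versions.)

```python
import re

def under_replicated(fsck_output: str) -> int:
    """Extract the under-replicated block count from `hadoop fsck /` output.

    Assumes a summary line of the form seen in typical fsck reports, e.g.
    'Under-replicated blocks:  12 (0.5 %)'.
    """
    m = re.search(r"Under-replicated blocks:\s*(\d+)", fsck_output)
    if m is None:
        raise ValueError("no under-replicated block count found in fsck output")
    return int(m.group(1))

# Hypothetical captured summary; in practice this would come from
# running `hadoop fsck /` via subprocess and reading its stdout.
sample = """Status: HEALTHY
 Total blocks (validated):      1024 (avg. block size 6710886 B)
 Minimally replicated blocks:   1024 (100.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Default replication factor:    3
"""
print(under_replicated(sample))  # 0 -> safe to take down the next pair of nodes
```

Once the count drops back to 0, replication is back at the configured factor and the next pair of nodes can be killed.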


-----Original Message-----
From: Varene Olivier
Sent: Tuesday, July 13, 2010 1:32 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: decommissioning nodes help

Are your datanodes double-attached to the network?
If that is the case, you can indeed see your datanodes as duplicate entries.
You should also check that your DNS resolution matches the
hostnames of your datanodes.

To solve your issue, you can switch off one datanode at a time (by
killing the process).
The master should notice that and take action to maintain the
replication level.
Do it slowly :) (or you might lose some data)
You can tell the process is over when the I/O from block
re-replication has finished.


Arun Ramakrishnan wrote:
That's what I thought.

But this is what I see in -report for the excluded nodes:

Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Dec 31 16:00:00 PST 1969

In the UI, the excluded nodes show up in both the live and dead node lists. It's been several hours now, and the block counts across the nodes are exactly the same.
The cluster is not accessed by any clients; it's not busy at all.

And I have set dfs.balance.bandwidthPerSec = 2000000 in hdfs-site.xml
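(For reference, that property as set in hdfs-site.xml would look like the fragment below; the value is in bytes per second, so 2000000 is roughly 2 MB/s of re-replication/balancing traffic per datanode.)

```xml
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>2000000</value>
  <!-- ~2 MB/s allowed for block-moving traffic per datanode -->
</property>
```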

Anyway, I think I am lost here. I'm just resorting to the somewhat backwards strategy of killing 2 nodes at a time. At least I know it works.


-----Original Message-----
From: Varene Olivier
Sent: Friday, July 09, 2010 7:44 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: decommissioning nodes help


In the web interface, you should see
the status of your node change to Decommissioning;
when done, it is removed from the list of active nodes.

With plenty of bandwidth available to perform the sync, the process is very fast,
so, to answer your other mail, the process might already be done.

You can also check the status of your node via the CLI:

# hadoop dfsadmin -report

Name : ...
Decommission Status : <StatusOfYourNode>
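(If you want to poll that report automatically, here is a minimal sketch. It assumes the report layout shown above, with `Name:` and `Decommission Status :` lines per datanode; field spacing can differ between Hadoop versions, and in practice the text would come from running `hadoop dfsadmin -report` via a subprocess.)

```python
def decommission_status(report: str) -> dict:
    """Map datanode name -> Decommission Status from `hadoop dfsadmin -report` text."""
    statuses = {}
    current = None
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Name"):
            # e.g. "Name: 10.0.0.5:50010" -> keep everything after the first colon
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Decommission Status") and current:
            statuses[current] = line.split(":", 1)[1].strip()
    return statuses

# Hypothetical two-node report excerpt:
sample = """Name: 10.0.0.5:50010
Decommission Status : Decommission in progress
DFS Used: 12345

Name: 10.0.0.6:50010
Decommission Status : Normal
DFS Used: 67890
"""
print(decommission_status(sample))
```

Decommissioning is finished for a node once its status is no longer "Decommission in progress".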

Hope it helps

Arun Ramakrishnan wrote:
Hi guys

I am stuck in my attempt to remove nodes from HDFS.

I followed the steps in https://issues.apache.org/jira/browse/HDFS-1125

a) add node to dfs.hosts.exclude

b) dfsadmin -refreshNodes

c) wait for decom to finish

d) remove node from both dfs.hosts and dfs.hosts.exclude

But after steps a) and b), how do I know the decommission is complete?

I am in the process of decommissioning 6 nodes and don't want to lose
any blocks (replication factor is 3) with a restart.

I also opened https://issues.apache.org/jira/browse/HDFS-1290 if anyone
is interested.


