DataNode not started up and "org.apache.hadoop.ipc.RemoteException" is thrown out
Hi all,

I am running hadoop-0.19.1 and have run into a strange problem
over the last few days. Until a few days ago, Hadoop ran smoothly,
with three nodes running the TaskTracker and DataNode
daemons. However, one of the nodes can no longer start its DataNode
after I moved the machines to another place.

I have checked the network and the firewall. The network is fine
because I can ssh from the master to all the slaves,
and the firewall is not enabled on any of the machines.

On the failing node, running "jps" shows only the
TaskTracker, not the DataNode. I checked the .log and
.out files in logs/ and found the following messages:

---------------- logs/hadoop-datanode-hdt1.mycluster.com.log---------
...
2009-05-30 14:58:54,830 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting
down: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data
node 10.61.0.143:50010 is attempting to report storage ID
DS-983240698-127.0.0.1-50010-1236515374222. Node 10.61.0.5:50010 is
expected to serve this storage.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:3800)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:2801)
at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:636)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
...
2009-05-30 14:58:54,993 INFO org.apache.hadoop.ipc.Server: Stopping
IPC Server Responder
2009-05-30 14:58:54,994 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for
threadgroup to exit, active threads is 1
2009-05-30 14:58:54,994 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.61.0.143:50010,
storageID=DS-983240698-127.0.0.1-50010-1236515374222, infoPort=50075,
ipcPort=50020):DataXceiveServer:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
at java.lang.Thread.run(Thread.java:619)
...
infoPort=50075, ipcPort=50020):Finishing DataNode in:
FSDataset{dirpath='/home/hadoop/myhadoop2/hadoop-hdfs/data/current'}
2009-05-30 14:58:56,096 INFO org.apache.hadoop.ipc.Server: Stopping
server on 50020
2009-05-30 14:58:56,096 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for
threadgroup to exit, active threads is 0
2009-05-30 14:58:56,097 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hdt1.mycluster.com/10.61.0.143
************************************************************/
------------------------------------------------------------------------------------------------

I am wondering what is wrong with my configuration. I just shut down the
node machines, moved them to another place, and changed nothing in the
configuration, OS, or software.
Also, is Hadoop affected by network topology (for example, do all the
nodes need to be in the same network segment, behind the same hub)?

Any help?


Thanks again,

Ian

  • HRoger at May 31, 2009 at 4:35 pm
    You should do that the right way, with the following steps:
    1. Create a new file named "excludes" under $HADOOP_HOME, with the datanode
    hostname (or IP) in it, one name per line.
    2. Edit hadoop-site.xml by adding
    <property>
    <name>dfs.hosts.exclude</name>
    <value>excludes</value>
    </property>
    and save it.
    3. Execute the command "bin/hadoop dfsadmin -refreshNodes" on the namenode
    host.
    4. When step 3 has finished, run "bin/hadoop dfsadmin -report" and
    check the result.
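
    As a minimal sketch of the above, run on the namenode (the exclude file
    name "excludes" and the address 10.61.0.143 are just the values used in
    this thread; adjust them for your cluster):

        # run from $HADOOP_HOME on the namenode (hadoop-0.19.1)
        echo "10.61.0.143" >> excludes      # one hostname or IP per line
        # hadoop-site.xml must already point dfs.hosts.exclude at this file
        bin/hadoop dfsadmin -refreshNodes   # make the namenode re-read the exclude list
        bin/hadoop dfsadmin -report         # the node should now show "Decommission in progress"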

    jonhson.ian wrote:
    [original message quoted in full; snipped]
  • Ian jonhson at Jun 1, 2009 at 9:29 am

    On Mon, Jun 1, 2009 at 12:35 AM, HRoger wrote:
    You should do that the right way, with the following steps:
    1. Create a new file named "excludes" under $HADOOP_HOME, with the datanode
    hostname (or IP) in it, one name per line.
    Should I do these on the master? What I mean is that I executed
    the following command just on the Namenode (JobTracker):

    $ echo <ip_of_fail_node> > excludes

    2. Edit hadoop-site.xml by adding
    <property>
    <name>dfs.hosts.exclude</name>
    <value>excludes</value>
    </property>
    and save it.
    Which hadoop-site.xml should I edit: the one on the Namenode, on the Datanode, or on all nodes?


    3. Execute the command "bin/hadoop dfsadmin -refreshNodes" on the namenode
    host.
    4. When step 3 has finished, run "bin/hadoop dfsadmin -report" and
    check the result.
  • Ian jonhson at Jun 1, 2009 at 9:36 am

    On Mon, Jun 1, 2009 at 12:35 AM, HRoger wrote:
    [HRoger's steps quoted in full; snipped]

    I executed all of the above steps on the Namenode and got the following
    output (without restarting Hadoop):

    ----------------- dump of screen -----------------------

    $ bin/hadoop dfsadmin -refreshNodes
    [hadoop@hdt0 hadoop-0.19.1]$ bin/hadoop dfsadmin -report
    Safe mode is ON
    Configured Capacity: 152863682560 (142.37 GB)
    Present Capacity: 84421242880 (78.62 GB)
    DFS Remaining: 84370862080 (78.58 GB)
    DFS Used: 50380800 (48.05 MB)
    DFS Used%: 0.06%

    -------------------------------------------------
    Datanodes available: 1 (3 total, 2 dead)

    Name: 10.61.0.5:50010
    Decommission Status : Decommission in progress
    Configured Capacity: 152863682560 (142.37 GB)
    DFS Used: 50380800 (48.05 MB)
    Non DFS Used: 68442439680 (63.74 GB)
    DFS Remaining: 84370862080(78.58 GB)
    DFS Used%: 0.03%
    DFS Remaining%: 55.19%
    Last contact: Mon Jun 01 17:32:59 CST 2009


    Name: 10.61.0.7
    Decommission Status : Normal
    Configured Capacity: 0 (0 KB)
    DFS Used: 0 (0 KB)
    Non DFS Used: 0 (0 KB)
    DFS Remaining: 0(0 KB)
    DFS Used%: 100%
    DFS Remaining%: 0%
    Last contact: Thu Jan 01 08:00:00 CST 1970


    Name: 10.61.0.143
    Decommission Status : Normal
    Configured Capacity: 0 (0 KB)
    DFS Used: 0 (0 KB)
    Non DFS Used: 0 (0 KB)
    DFS Remaining: 0(0 KB)
    DFS Used%: 100%
    DFS Remaining%: 0%
    Last contact: Thu Jan 01 08:00:00 CST 1970

    -----------------------------------------------------------------

    Two nodes are reported dead... hmm... what happened?
    Any help?


    Thanks again,

    Ian
  • HRoger at Jun 1, 2009 at 10:20 am
    Hi! All the steps should be done on the namenode.
    You can execute "-report" twice, once before the "-refreshNodes" and once
    after, then compare the results!
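
    For example (a small sketch, run on the namenode; the file names
    before.txt and after.txt are just for illustration):

        bin/hadoop dfsadmin -report > before.txt   # state before the refresh
        bin/hadoop dfsadmin -refreshNodes
        bin/hadoop dfsadmin -report > after.txt    # state after the refresh
        diff before.txt after.txt                  # the excluded node's status should change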

    jonhson.ian wrote:
    [previous message quoted in full; snipped]
  • Ian jonhson at Jun 1, 2009 at 2:40 pm
    Thanks~

    but I still do not know how to deal with this issue.

    I can see that the DataNode daemon is dead, but I cannot restart it because
    the log file also shows that something is wrong in the RPC layer (right?).

    On Mon, Jun 1, 2009 at 6:20 PM, HRoger wrote:

    Hi! All the steps should be done on the namenode.
    You can execute "-report" twice, once before the "-refreshNodes" and once
    after, then compare the results!

    jonhson.ian wrote:
    [earlier message quoted in full; snipped]
