Shuffle is configured, and I could run MR jobs on this 5-node cluster before
I moved the ResourceManager node from dn4 to dn5.


After analyzing the logs under $HADOOP_LOG_DIR and reviewing the terminal
output, I think the most likely reason the map tasks hang at 0% is this:
the NodeManager does not start correctly because its connection to the
ResourceManager is refused, which seems to be caused by Google Protocol
Buffers. As a result, the slaves in the cluster cannot communicate with the
master (ResourceManager), and the job does not run.
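
One way to separate a plain network problem from a protobuf problem is to test
whether anything is listening on the ResourceManager port at all. The sketch
below is only an illustration: check_port is a hypothetical helper, not part of
Hadoop, and it uses nothing but a raw TCP connect.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Try a plain TCP connect and report 'open', 'refused', or 'unreachable: ...'.

    Hypothetical helper for illustration; it does not speak Hadoop RPC.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on that port.
        return "refused"
    except OSError as exc:
        # Covers DNS failures, timeouts, and missing routes.
        return f"unreachable: {exc}"
```

For example, check_port("dn4", 50030) and check_port("dn5", 50030) would show
which of the two hosts actually accepts connections on the port that appears in
the log.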


By the way, I compiled Protocol Buffers and ran make / make install on all
5 nodes at the same time using a parallel ssh tool. Maybe something is wrong
with the dn5 environment.
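
To check whether one node ended up with a different Protocol Buffers install,
the output of `protoc --version` can be compared across the nodes. This is a
rough sketch under assumptions: the host names dn1 to dn5 are guessed from the
names in this thread, passwordless ssh is assumed, and both function names are
made up for illustration.

```python
import subprocess
from collections import Counter


def protoc_version(host):
    """Run `protoc --version` on a host over ssh (assumes passwordless ssh)."""
    result = subprocess.run(
        ["ssh", host, "protoc", "--version"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip() or f"error: {result.stderr.strip()}"


def odd_nodes_out(versions):
    """Given {host: version string}, return the hosts that differ from the majority."""
    if not versions:
        return {}
    majority, _ = Counter(versions.values()).most_common(1)[0]
    return {host: v for host, v in versions.items() if v != majority}


# Hypothetical usage; the host names are an assumption:
# hosts = ["dn1", "dn2", "dn3", "dn4", "dn5"]
# print(odd_nodes_out({h: protoc_version(h) for h in hosts}))
```

An empty result would suggest the installs match, at least at the version
level, so the problem probably lies elsewhere.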

Thanks a lot!



The most important part of the NodeManager node's output is here:
2011-12-21 21:23:21,142 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
    ... 3 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From dn3/192.168.3.227 to dn4:50030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
    at $Proxy14.registerNodeManager(Unknown Source)
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
    ... 5 more
Caused by: java.net.ConnectException: Call From dn3/192.168.3.227 to dn4:50030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:617)
    at org.apache.hadoop.ipc.Client.call(Client.java:1089)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
    ... 7 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:419)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:460)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:557)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
    at org.apache.hadoop.ipc.Client.call(Client.java:1065)
    ... 8 more
2011-12-21 21:23:21,143 INFO event.AsyncDispatcher (AsyncDispatcher.java:run(71)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:69)
    at java.lang.Thread.run(Thread.java:636)
2011-12-21 21:23:21,144 INFO service.AbstractService (AbstractService.java:stop(75)) - Service:Dispatcher is stopped.
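
The "Call From ... to ..." lines in the trace name the endpoint the NodeManager
is actually dialing, and that can be compared with the host the ResourceManager
now runs on. As a small sketch (the regex and the function name are my own,
written against the message format shown in this log only), the refused
endpoints can be pulled out of a NodeManager log like this:

```python
import re

# Matches messages such as:
#   Call From dn3/192.168.3.227 to dn4:50030 failed on connection exception
CALL_RE = re.compile(
    r"Call From\s+(?P<src_host>[^/\s]+)/(?P<src_ip>[\d.]+)\s+to\s+"
    r"(?P<dst_host>[^:\s]+):(?P<dst_port>\d+)\s+failed on connection exception"
)


def refused_endpoints(log_text):
    """Return the set of (source host, destination host, port) triples in the log."""
    # Collapse line breaks first, because mail clients wrap the long messages.
    flat = " ".join(log_text.split())
    return {
        (m["src_host"], m["dst_host"], int(m["dst_port"]))
        for m in CALL_RE.finditer(flat)
    }
```

For the output above this would report dn3 calling dn4 on port 50030. If the
ResourceManager has moved to another host, it may be worth checking which
ResourceManager address the NodeManager's configuration still points at.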

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Dec 20, '11 at 1:15p
active: Dec 21, '11 at 1:38p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
