Hi there,

I'm seeing some unusual behaviour on our Hadoop 0.20.2 cluster.
Sporadically, we get "Call to namenode" failures on the
tasktrackers, which cause tasks to fail:

2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201105090819_059_m_0038_0 Child Error
java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy5.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

The namenode log (logging level = INFO) shows the following within a few
seconds either side of the above timestamp. It could be relevant, or it
could be a coincidence:

2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)

The jobtracker, however, does have an entry that correlates with the
tasktracker error:

2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:169)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.mapred.Child.main(Child.java:157)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Can anyone give me any pointers on how to start troubleshooting this issue?
It's very sporadic and we haven't yet been able to reproduce it in our lab.
Looking through the mailing list archives, I found suggestions that revolve
around the following settings:

dfs.namenode.handler.count: 128 (existing: 64)
dfs.datanode.handler.count: 10 (existing: 3)
dfs.datanode.max.xcievers: 4096 (existing: 256)
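
Written out as a configuration fragment, the suggested values would look
roughly like the sketch below. The property names and values are the ones
listed above; putting them in conf/hdfs-site.xml and restarting the HDFS
daemons is my assumption about how they would be applied, and none of this
has been tested on our cluster yet.

```xml
<!-- conf/hdfs-site.xml: sketch only, suggested values not yet applied or tested -->
<configuration>
  <!-- Number of RPC handler threads on the namenode (existing value: 64) -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>128</value>
  </property>

  <!-- Number of server threads on each datanode (existing value: 3) -->
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

  <!-- Upper bound on concurrent transfer threads per datanode (existing value: 256) -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>
```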

Any pointers?

Thanks in advance

Sid Simmons
Infrastructure Support Specialist

Group: common-user
Category: hadoop
Posted: May 12, '11 at 7:11p