Gurus,
I'm setting up a secure (Kerberos-enabled) Hadoop 0.23 cluster, but communication is failing both between the DataNode and the NameNode and between the NodeManager and the ResourceManager.
When I start the NodeManager, it reports the following error and then shuts itself down. Have you ever seen this issue? Do you have any idea how to triage it?
2012-01-20 12:03:08,258 INFO ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.server.api.ResourceTracker
2012-01-20 12:03:08,291 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(155)) - Connected to ResourceManager at hadoopRM.example.aurora:9003
2012-01-20 12:03:20,399 WARN ipc.Client (Client.java:run(526)) - Couldn't setup connection for nm/hadoopNM.example.au[email protected] to rm/[email protected]
2012-01-20 12:03:20,405 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
... 3 more
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/[email protected] to rm/[email protected]; Host Details : local host is: "hadoopNM/10.112.127.102"; destination host is: ""hadoopRM.example.aurora":9003;
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
at $Proxy14.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
... 5 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/[email protected] to rm/[email protected]; Host Details : local host is: "hadoopNM/10.112.127.102"; destination host is: ""hadoopRM.example.aurora":9003;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
at org.apache.hadoop.ipc.Client.call(Client.java:1089)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
... 7 more
Caused by: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.au[email protected] to rm/[email protected]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:527)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:499)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
at org.apache.hadoop.ipc.Client.call(Client.java:1065)
... 8 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:137)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:407)
at org.apache.hadoop.ipc.Client$Connection.access$1200(Client.java:205)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:576)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:572)
... 11 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:230)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
... 20 more
Caused by: KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
at sun.security.krb5.KrbTgsRep.(KrbTgsReq.java:185)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
... 23 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
at sun.security.krb5.internal.TGSRep.(KrbTgsRep.java:46)
... 28 more
The error says that no valid server credentials were provided, but I have added those credentials on the ResourceManager node. The keytab listing is as follows:
[email protected]:~$ klist -k -e -t /etc/krb5.keytab
Keytab name: WRFILE:/etc/krb5.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
2 01/20/12 10:55:02 rm/[email protected] (aes256-cts-hmac-sha1-96)
2 01/20/12 10:55:02 rm/[email protected] (arcfour-hmac)
2 01/20/12 10:55:02 rm/[email protected] (des3-cbc-sha1)
2 01/20/12 10:55:02 rm/[email protected] (des-cbc-crc)
2 01/19/12 11:19:11 host/[email protected] (aes256-cts-hmac-sha1-96)
2 01/19/12 11:19:11 host/[email protected] (arcfour-hmac)
2 01/19/12 11:19:11 host/[email protected] (des3-cbc-sha1)
2 01/19/12 11:19:11 host/[email protected] (des-cbc-crc)
2 01/19/12 11:20:15 jhs/[email protected] (aes256-cts-hmac-sha1-96)
2 01/19/12 11:20:15 jhs/[email protected] (arcfour-hmac)
2 01/19/12 11:20:15 jhs/[email protected] (des3-cbc-sha1)
2 01/19/12 11:20:15 jhs/[email protected] (des-cbc-crc)
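One check that may help with triage is comparing the principal a client would ask the KDC for against the principals actually present in the keytab listing. The following is only a minimal sketch under stated assumptions: it assumes the principal is configured with the `_HOST` placeholder, which, as far as I understand, Hadoop replaces with the lower-cased fully qualified host name. The realm `EXAMPLE.REALM`, the pattern `rm/_HOST@EXAMPLE.REALM`, and both function names are hypothetical and not taken from my configuration.

```python
def expand_principal(pattern: str, fqdn: str) -> str:
    """Substitute _HOST in a principal pattern with the lower-cased host name.

    Assumption: this mirrors the _HOST expansion described above; it is not
    Hadoop's own code.
    """
    parts = pattern.split("@", 1)
    name = parts[0].replace("_HOST", fqdn.lower())
    return name if len(parts) == 1 else f"{name}@{parts[1]}"


def principals_in_klist(klist_output: str) -> set:
    """Collect principal names from `klist -k -e -t` style output."""
    found = set()
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry rows start with a numeric KVNO; header and separator rows do not.
        if fields and fields[0].isdigit():
            for field in fields[1:]:
                # The date field contains '/', so look for '@' to find the principal.
                if "@" in field:
                    found.add(field)
                    break
    return found


if __name__ == "__main__":
    # Hypothetical listing; the realm is a placeholder.
    listing = (
        "KVNO Timestamp Principal\n"
        "---- ----------------- ----------------\n"
        "2 01/20/12 10:55:02 rm/hadooprm.example.aurora@EXAMPLE.REALM (arcfour-hmac)\n"
    )
    wanted = expand_principal("rm/_HOST@EXAMPLE.REALM", "hadoopRM.example.aurora")
    print(wanted, "present in keytab:", wanted in principals_in_klist(listing))
```

If the expanded name is missing from the keytab listing, or is missing from the KDC database itself, that mismatch would be one possible explanation for an UNKNOWN_SERVER reply, though I have not confirmed that this is the cause here.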
The full NodeManager log is attached.
Any ideas are appreciated.
Thanks,
Emma