Gurus,
I'm setting up a secure (Kerberos-enabled) Hadoop 0.23 cluster, but communication between the DataNode and NameNode, and between the NodeManager and ResourceManager, is failing.
When I start the NodeManager, it reports the following error and then shuts itself down. Have you seen this issue before? Do you have any idea how to triage it?

2012-01-20 12:03:08,258 INFO ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.server.api.ResourceTracker
2012-01-20 12:03:08,291 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(155)) - Connected to ResourceManager at hadoopRM.example.aurora:9003
2012-01-20 12:03:20,399 WARN ipc.Client (Client.java:run(526)) - Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA
2012-01-20 12:03:20,405 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
... 3 more
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA; Host Details : local host is: "hadoopNM/10.112.127.102"; destination host is: ""hadoopRM.example.aurora":9003;
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
at $Proxy14.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
... 5 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA; Host Details : local host is: "hadoopNM/10.112.127.102"; destination host is: ""hadoopRM.example.aurora":9003;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
at org.apache.hadoop.ipc.Client.call(Client.java:1089)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
... 7 more
Caused by: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:527)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:499)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
at org.apache.hadoop.ipc.Client.call(Client.java:1065)
... 8 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:137)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:407)
at org.apache.hadoop.ipc.Client$Connection.access$1200(Client.java:205)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:576)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:572)
... 11 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:230)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
... 20 more
Caused by: KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
... 23 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
... 28 more

The error says that no valid server credentials were provided, but I have added those credentials on the ResourceManager node. The keytab contents are as follows:
line@hadoopRM:~$ klist -k -e -t /etc/krb5.keytab
Keytab name: WRFILE:/etc/krb5.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (des3-cbc-sha1)
2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (des-cbc-crc)
2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (des3-cbc-sha1)
2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (des-cbc-crc)
2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (des3-cbc-sha1)
2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (des-cbc-crc)

The whole node manager log is attached.

Any ideas are appreciated.
Thanks
Emma

  • Vinod Kumar Vavilapalli at Jan 20, 2012 at 5:24 am
    Hi,

    Just this evening, I ran into someone who had the same issue. After
    some debugging, I traced it to the hostnames containing upper-case
    characters. Somehow, when the DataNode or NodeManager tries to get a
    service ticket for its corresponding service (NameNode and
    ResourceManager respectively), the hostname gets converted to all
    lowercase. You can check whether you are in the same situation by
    looking at the krb5kdc logs.

    If that is the case, changing the hostnames everywhere to all
    lowercase may help. Please try that and let me know.

    HTH,
    +Vinod
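
Editor's sketch: the mismatch described above can be illustrated with a few lines of Python. This is only an illustration under stated assumptions, not Hadoop or Kerberos code. The helper names (`split_principal`, `lowercase_host`, `principals_with_uppercase_host`) are hypothetical, and the principals are the ones from the original post. The idea is that if the client lowercases the host part before asking the KDC for a service ticket, any principal registered with upper-case characters in its host part would no longer match.

```python
def split_principal(principal):
    """Split 'service/host@REALM' into (service, host, realm)."""
    name, _, realm = principal.partition("@")
    service, _, host = name.partition("/")
    return service, host, realm


def lowercase_host(principal):
    """Return the principal with only its host component lowercased."""
    service, host, realm = split_principal(principal)
    return f"{service}/{host.lower()}@{realm}"


def principals_with_uppercase_host(principals):
    """Return the principals whose host component contains upper-case characters."""
    flagged = []
    for principal in principals:
        host = split_principal(principal)[1]
        if host != host.lower():
            flagged.append(principal)
    return flagged


# Principals taken from the log in the original post.
principals = [
    "rm/hadoopRM.example.aurora@EXAMPLE.AURORA",
    "nm/hadoopNM.example.aurora@EXAMPLE.AURORA",
]

for p in principals_with_uppercase_host(principals):
    # Assumption: the client requests the lowercased form, as described above.
    print(f"{p} may be requested as {lowercase_host(p)}")
```

Any principal this flags is a candidate for the UNKNOWN_SERVER error; the krb5kdc log remains the authoritative place to confirm which name was actually requested.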

  • Emma Lin at Jan 20, 2012 at 5:53 am
    Vinod,
    Thanks for the pointer. I'm trying it now and will let you know the result soon.
    Thanks
    Emma

    -----Original Message-----
    From: Vinod Kumar Vavilapalli
    Sent: January 20, 2012 13:23
    To: common-user@hadoop.apache.org
    Subject: Re: Issues during setting up hadoop security cluster

    Hi,

    Just this evening, I ran into someone who had the same issue. After
    some debugging, I traced it to the hostnames containing upper-case
    characters. Somehow, when the DataNode or NodeManager tries to get a
    service ticket for its corresponding service (NameNode and
    ResourceManager respectively), the hostname gets converted to all
    lowercase. You can check whether you are in the same situation by
    looking at the krb5kdc logs.

    If that is the case, changing the hostnames everywhere to all
    lowercase may help. Please try that and let me know.

    HTH,
    +Vinod

  • Emma Lin at Jan 20, 2012 at 10:33 am
    After removing the upper-case characters from the hostnames, the problem disappeared. The NodeManager now connects to the ResourceManager successfully.
    Thank you, Vinod.

    But now I have another issue connecting to the NameNode from the DataNode. The NameNode log is as follows:
    2012-01-20 18:17:02,127 WARN ipc.Server (Server.java:saslReadAndProcess(1070)) - Auth failed for 10.112.127.14:60456:null
    2012-01-20 18:17:02,128 INFO ipc.Server (Server.java:doRead(572)) - IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)] from client 10.112.127.14. Count of bytes read: 0
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:159)
    at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1054)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1232)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:567)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:366)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:341)
    Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:741)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:137)
    ... 5 more
    Caused by: KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled
    at sun.security.krb5.EncryptionKey.findKey(EncryptionKey.java:481)
    at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:260)
    at sun.security.krb5.KrbApReq.(InitSecContextToken.java:79)
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:724)
    ... 8 more

    From what I found online, this happens because Java supports only AES-128 by default, and supporting AES-256 requires installing the unlimited-strength JCE policy files. But after installing them, the NodeManager can connect to the ResourceManager, while the DataNode still cannot connect to the NameNode.
    Since the DataNode is started through jsvc, I wonder whether the Java setting fails to take effect when it is launched that way. In any case, it still complains that AES-256 is not supported.

    Any ideas?
    Thanks
    Emma
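
Editor's sketch: one way to narrow this down is to list which principals in a keytab carry AES-256 keys, since those are the ones that depend on the JVM accepting that encryption type. The Python below is a rough sketch under stated assumptions. It parses text in the `klist -k -e -t` layout shown earlier in this thread, the function names (`enctypes_by_principal`, `principals_needing_aes256`) are hypothetical, and the sample lines are copied from the ResourceManager keytab listing. Whether the NameNode keytab contains the same encryption types is not stated in the thread.

```python
import re
from collections import defaultdict

# Matches keytab entry lines in the layout shown in this thread, e.g.:
#   2 01/20/12 10:55:02 rm/host.example@REALM (aes256-cts-hmac-sha1-96)
ENTRY = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+(\S+)\s+\(([^)]+)\)\s*$")


def enctypes_by_principal(klist_output):
    """Map each principal to the encryption types listed for it."""
    result = defaultdict(list)
    for line in klist_output.splitlines():
        match = ENTRY.match(line)
        if match:
            _kvno, principal, enctype = match.groups()
            result[principal].append(enctype)
    return dict(result)


def principals_needing_aes256(klist_output):
    """Return the principals that have at least one aes256 key."""
    return sorted(
        principal
        for principal, enctypes in enctypes_by_principal(klist_output).items()
        if any(enctype.startswith("aes256") for enctype in enctypes)
    )


# Sample input copied from the keytab listing in the original post.
sample = """\
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
"""

print(principals_needing_aes256(sample))
```

Running the same check against the keytab used by the NameNode would show whether AES-256 keys are in play there. It does not show whether the JVM launched by jsvc picked up the policy files, which is the open question in this post.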


    -----Original Message-----
    From: Vinod Kumar Vavilapalli
    Sent: 2012年1月20日 13:23
    To: common-user@hadoop.apache.org
    Subject: Re: Issues during setting up hadoop security cluster

    Hi,

    Just today evening, I happened to run into someone who had the same
    issue. After some debugging, I cornered that to the hostnames having
    upper-case characters. Somehow, when DataNode or NodeManager try to
    get a service ticket for their corresponding services (NameNode and
    ResourceManager respectively), the hostname were getting converted
    into all lowercase. You can see if it is the same situation with you
    by looking at krb5kdc logs.

    If that is the case, changing the hostnames everywhere to be all
    small-case may help. Please try that and let me know.

    HTH,
    +Vinod

    On Thu, Jan 19, 2012 at 8:52 PM, Emma Lin wrote:
    Gurus,

    I’m setting up a security cluster of hadoop .23. But now, the communication
    between Data Node and Name Node, Node Manager and Resource Manager have
    problem.

    When I start the Node Manager, it will report following error, and then
    shutdown itself. Did you ever see such issue? Do you have any idea on how to
    triage this issue?



    2012-01-20 12:03:08,258 INFO  ipc.HadoopYarnRPC
    (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy
    for protocol interface org.apache.hadoop.yarn.server.api.ResourceTracker

    2012-01-20 12:03:08,291 INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(155)) - Connected to ResourceManager at hadoopRM.example.aurora:9003
    2012-01-20 12:03:20,399 WARN  ipc.Client (Client.java:run(526)) - Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA
    2012-01-20 12:03:20,405 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
    org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
        at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
    Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
        ... 3 more
    Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA; Host Details : local host is: "hadoopNM/10.112.127.102"; destination host is: ""hadoopRM.example.aurora":9003;
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
        at $Proxy14.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
        ... 5 more
    Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA; Host Details : local host is: "hadoopNM/10.112.127.102"; destination host is: ""hadoopRM.example.aurora":9003;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
        at org.apache.hadoop.ipc.Client.call(Client.java:1089)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
        ... 7 more
    Caused by: java.io.IOException: Couldn't setup connection for nm/hadoopNM.example.aurora@EXAMPLE.AURORA to rm/hadoopRM.example.aurora@EXAMPLE.AURORA
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:527)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:499)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
        at org.apache.hadoop.ipc.Client.call(Client.java:1065)
        ... 8 more
    Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:137)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:407)
        at org.apache.hadoop.ipc.Client$Connection.access$1200(Client.java:205)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:576)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:573)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:572)
        ... 11 more
    Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)
        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:230)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
        ... 20 more
    Caused by: KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
        at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
        at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
        at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
        at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
        at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
        ... 23 more
    Caused by: KrbException: Identifier doesn't match expected value (906)
        at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
        at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
        at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
        at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
        ... 28 more



    The error says there are no valid server credentials, but I've added those
    credentials on the Resource Manager node. The keytab listing is as follows:

    line@hadoopRM:~$ klist -k -e -t /etc/krb5.keytab
    Keytab name: WRFILE:/etc/krb5.keytab
    KVNO Timestamp         Principal
    ---- ----------------- --------------------------------------------------------
       2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
       2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
       2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (des3-cbc-sha1)
       2 01/20/12 10:55:02 rm/hadoopRM.example.aurora@EXAMPLE.AURORA (des-cbc-crc)
       2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
       2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
       2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (des3-cbc-sha1)
       2 01/19/12 11:19:11 host/hadoopRM.example.aurora@EXAMPLE.AURORA (des-cbc-crc)
       2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (aes256-cts-hmac-sha1-96)
       2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (arcfour-hmac)
       2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (des3-cbc-sha1)
       2 01/19/12 11:20:15 jhs/hadoopRM.example.aurora@EXAMPLE.AURORA (des-cbc-crc)



    The whole node manager log is attached.

    Any ideas are appreciated.

    Thanks

    Emma
  • Vinod Kumar Vavilapalli at Jan 20, 2012 at 7:29 pm
    You are on the right path for sure.

    Where are you updating the JCE policy jar? (I know the RM-NM case is
    working after this, so just checking)

    Maybe the datanodes are not using the same JRE that you updated with
    the new policy jar? Can you check that? jsvc shouldn't cause any further
    issues; in the datanode's case it is more likely related to your JAVA_HOME.
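
    One way to check this is to run a small Java program with the same
    JAVA_HOME the datanode uses. The sketch below prints which JRE the
    process runs on and the maximum AES key length that JRE allows. The
    class name JcePolicyCheck and the helper supportsAes256 are only
    illustrative; Cipher.getMaxAllowedKeyLength and the java.home system
    property are standard JDK APIs.

    ```java
    import javax.crypto.Cipher;
    import java.security.NoSuchAlgorithmException;

    class JcePolicyCheck {
        // True when the reported maximum AES key length is enough for AES-256.
        static boolean supportsAes256(int maxAllowedKeyLength) {
            return maxAllowedKeyLength >= 256;
        }

        public static void main(String[] args) throws NoSuchAlgorithmException {
            // Which JRE is this process actually running on?
            System.out.println("java.home = " + System.getProperty("java.home"));
            // Maximum AES key length permitted by the installed policy files.
            int max = Cipher.getMaxAllowedKeyLength("AES");
            System.out.println("max AES key length = " + max);
            System.out.println("AES-256 usable = " + supportsAes256(max));
        }
    }
    ```

    If the value printed for the datanode's JRE is below 256, that JRE
    probably did not pick up the unlimited policy jar.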

    Thanks,
    +Vinod
    On Fri, Jan 20, 2012 at 2:33 AM, Emma Lin wrote:
    After removing the upper-case characters, the problem disappeared. The node manager now connects to the resource manager successfully.
    Thank you, Vinod.

    But now I have another issue connecting the Data Node to the Name Node. The Name Node log is as follows:
    2012-01-20 18:17:02,127 WARN  ipc.Server (Server.java:saslReadAndProcess(1070)) - Auth failed for 10.112.127.14:60456:null
    2012-01-20 18:17:02,128 INFO  ipc.Server (Server.java:doRead(572)) - IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)] from client 10.112.127.14. Count of bytes read: 0
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:159)
    at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1054)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1232)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:567)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:366)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:341)
    Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:741)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:137)
    ... 5 more
    Caused by: KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled
    at sun.security.krb5.EncryptionKey.findKey(EncryptionKey.java:481)
    at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:260)
    at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:134)
    at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:79)
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:724)
    ... 8 more

    From what I found online, this happens because Java supports only AES 128 by default, and supporting AES 256 requires installing the unlimited JCE policy. But after installing the JCE policy, the node manager can connect to the resource manager, while the data node still cannot connect to the name node.
    Since the datanode is started through jsvc, I don't know whether the Java setting stops taking effect when it runs through jsvc. Either way, it still complains that AES 256 is not supported.

    Any ideas?
    Thanks
    Emma


    -----Original Message-----
    From: Vinod Kumar Vavilapalli
    Sent: January 20, 2012 13:23
    To: common-user@hadoop.apache.org
    Subject: Re: Issues during setting up hadoop security cluster

    Hi,

    Just this evening, I happened to run into someone who had the same
    issue. After some debugging, I narrowed it down to the hostnames having
    upper-case characters. Somehow, when the DataNode or NodeManager tries to
    get a service ticket for its corresponding service (NameNode and
    ResourceManager respectively), the hostname gets converted
    to all lower-case. You can see whether you are in the same situation
    by looking at the krb5kdc logs.

    If that is the case, changing the hostnames everywhere to be all
    lower-case may help. Please try that and let me know.
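
    To illustrate the mismatch, here is a small sketch that lower-cases
    only the host part of a service/host@REALM principal, which is roughly
    what appeared to happen to the requested service name. The class and
    method names are made up for this example and are not Hadoop or JDK
    APIs.

    ```java
    import java.util.Locale;

    class PrincipalCase {
        // Lower-cases only the host component of service/host@REALM.
        static String withLowerCaseHost(String principal) {
            int slash = principal.indexOf('/');
            int at = principal.lastIndexOf('@');
            if (slash < 0 || at < slash) {
                return principal; // no host component to normalize
            }
            String host = principal.substring(slash + 1, at);
            return principal.substring(0, slash + 1)
                    + host.toLowerCase(Locale.ROOT)
                    + principal.substring(at);
        }

        // True when lower-casing the host changes the principal name.
        static boolean hostHasUpperCase(String principal) {
            return !principal.equals(withLowerCaseHost(principal));
        }

        public static void main(String[] args) {
            String registered = "rm/hadoopRM.example.aurora@EXAMPLE.AURORA";
            System.out.println(withLowerCaseHost(registered));
            System.out.println(hostHasUpperCase(registered));
        }
    }
    ```

    A principal registered with upper-case characters in the host part
    would no longer match the lower-cased name being requested, which is
    consistent with the UNKNOWN_SERVER error.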

    HTH,
    +Vinod

    On Thu, Jan 19, 2012 at 8:52 PM, Emma Lin wrote:
    Gurus,

    I'm setting up a secure Hadoop 0.23 cluster, but communication
    between the Data Node and Name Node, and between the Node Manager and
    Resource Manager, is failing.

    When I start the Node Manager, it reports the following error and then
    shuts itself down. Have you ever seen this issue? Do you have any idea how to
    triage it?




Discussion Overview
group: common-user @
categories: hadoop
posted: Jan 20, '12 at 4:53a
active: Jan 20, '12 at 7:29p
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop
