Hello all,

When submitting an HBase export job to YARN, I see it appearing on the web
UI, but for some reason the job never starts; it stays at 0% complete
indefinitely. I am using Hadoop 0.23 and HBase 0.92 (CDH4 beta 1).

I see the NodeManagers connecting to the ResourceManager:

2012-03-15 19:36:10,585 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
torb1pnb001.dataraker.net:46696 Node Transitioned from NEW to RUNNING
2012-03-15 19:36:16,633 INFO org.apache.hadoop.yarn.util.RackResolver:
Resolved torb1pnb002.dataraker.net to /default-rack
2012-03-15 19:36:16,633 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
NodeManager from node torb1pnb002.dataraker.net(cmPort: 35665 httpPort:
9999) registered with capability: 1000, assigned nodeId
torb1pnb002.dataraker.net:35665
2012-03-15 19:36:16,634 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
torb1pnb002.dataraker.net:35665 Node Transitioned from NEW to RUNNING
[ etc... ]

and the job being submitted to the ResourceManager:

2012-03-15 19:40:29,248 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
new applicationId: 1
2012-03-15 19:40:31,323 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1331840162147_0001 State change from NEW to SUBMITTED
2012-03-15 19:40:31,323 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering appattempt_1331840162147_0001_000001
2012-03-15 19:40:31,323 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1331840162147_0001_000001 State change from NEW to SUBMITTED
2012-03-15 19:40:31,327 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
with id 1 submitted by user hdfs with application_id [..snip..]
2012-03-15 19:40:31,329 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs
IP=10.192.16.64 OPERATION=Submit Application Request
TARGET=ClientRMService RESULT=SUCCESS
APPID=application_1331840162147_0001
2012-03-15 19:40:31,333 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
Application Submission: application_1331840162147_0001 from hdfs,
currently active: 1
2012-03-15 19:40:31,336 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1331840162147_0001_000001 State change from SUBMITTED to
SCHEDULED
2012-03-15 19:40:31,336 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1331840162147_0001 State change from SUBMITTED to ACCEPTED

but after the NodeManager starts, its log never shows any requests coming
from the ResourceManager:

2012-03-15 19:36:16,604 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connected
to ResourceManager at torb1pna001:8025
2012-03-15 19:36:16,645 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Registered with ResourceManager as torb1pnb002.dataraker.net:35665 with
total resource of memory: 1000
2012-03-15 19:36:16,645 INFO
org.apache.hadoop.yarn.service.AbstractService:
Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is
started.
2012-03-15 19:36:16,646 INFO
org.apache.hadoop.yarn.service.AbstractService:
Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
[ end of log ]

I am also seeing strange end-of-stream warnings in ZooKeeper when the job is
submitted:

2012-03-15 16:58:00,216 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:33262
2012-03-15 16:58:00,219 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:33262
2012-03-15 16:58:00,229 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0071 with negotiated timeout 40000 for
client /127.0.0.1:33262
2012-03-15 16:58:48,884 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x35d53d539f0071, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
2012-03-15 16:58:48,885 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
socket connection for client /127.0.0.1:33262 which had sessionid
0x35d53d539f0071
2012-03-15 17:02:59,968 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:59652
2012-03-15 17:02:59,971 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:59652
2012-03-15 17:02:59,978 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0072 with negotiated timeout 40000 for
client /127.0.0.1:59652
2012-03-15 17:18:39,480 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x35d53d539f0072, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
2012-03-15 17:18:39,481 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
socket connection for client /127.0.0.1:59652 which had sessionid
0x35d53d539f0072
2012-03-15 17:48:32,406 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:40908
2012-03-15 17:48:32,409 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:40908
2012-03-15 17:48:32,420 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0073 with negotiated timeout 40000 for
client /127.0.0.1:40908
2012-03-15 18:40:19,836 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x35d53d539f0073, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
2012-03-15 18:40:19,837 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
socket connection for client /127.0.0.1:40908 which had sessionid
0x35d53d539f0073
2012-03-15 19:40:30,503 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:45743
2012-03-15 19:40:30,506 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:45743
2012-03-15 19:40:30,518 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0074 with negotiated timeout 40000 for
client /127.0.0.1:45743

Any ideas how to debug this?

[--- yarn-site.xml ---]

<?xml version="1.0"?>
<configuration>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/nm-local-dir</value>
<description>List of directories to store localized files
in.</description>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/logs</value>
<description>Where to store container logs.</description>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1000</value>
<description>Amount of physical memory, in MB, that can be
allocated for containers.</description>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>torb1pna001:8040</value>
<description>The address of the applications manager interface in
the RM.</description>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>torb1pna001:8030</value>
<description>The address of the scheduler interface.</description>
</property>

<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>torb1pna001:8088</value>
<description>The address of the RM web application.</description>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>torb1pna001:8025</value>
<description></description>
</property>

<property>
<name>yarn.resourcemanager.admin.address</name>
<value>torb1pna001:8141</value>
<description>The address of the RM admin interface.</description>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

</configuration>

[--- mapred-site.xml ---]

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>


<property>
<name>mapreduce.jobhistory.address</name>
<value>torb1pna001:10020</value>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>torb1pna001:19888</value>
</property>


<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/tmp/mr-history/tmp</value>
</property>

<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/tmp/mr-history/done</value>
</property>

</configuration>

[--- output from HBase export ---]

bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export dev /backups/dev
12/03/15 17:48:30 INFO mapreduce.Export: verisons=1, starttime=0,
endtime=9223372036854775807
12/03/15 17:48:30 WARN conf.Configuration:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
12/03/15 17:48:30 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:host.name=torb1pna001.dataraker.net
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_27
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_27/jre
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/etc/hadoop/conf:/usr/<...snip...>
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/lib/hadoop/lib/native
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:os.name=Linux
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:os.arch=amd64
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-274.el5
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:user.name=hdfs
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/usr/lib/hbase
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Opening socket connection to
server /127.0.0.1:2181
12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Socket connection established
to localhost.localdomain/127.0.0.1:2181, initiating session
12/03/15 17:48:32 INFO zookeeper.RecoverableZooKeeper: The identifier of
this process is 18411@torb1pna001.dataraker.net
12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Session establishment
complete on server localhost.localdomain/127.0.0.1:2181, sessionid =
0x35d53d539f0073, negotiated timeout = 40000
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 INFO mapreduce.JobSubmitter: number of splits:73
12/03/15 17:48:32 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 17:48:33 INFO mapred.ResourceMgrDelegate: Submitted application
application_1331830579833_0004 to ResourceManager at
torb1pna001/10.192.16.64:8040
12/03/15 17:48:33 INFO mapreduce.Job: The url to track the job:
http://torb1pna001:8088/proxy/application_1331830579833_0004/
12/03/15 17:48:33 INFO mapreduce.Job: Running job: job_1331830579833_0004
bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export bch_dev
/backups/bch_dev
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
files.
12/03/15 19:40:28 INFO mapreduce.Export: verisons=1, starttime=0,
endtime=9223372036854775807
12/03/15 19:40:28 WARN conf.Configuration:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
12/03/15 19:40:28 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:host.name=torb1pna001.dataraker.net
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_27
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_27/jre
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/etc/hadoop/conf:<...snip...>
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/lib/hadoop/lib/native
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:os.name=Linux
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:os.arch=amd64
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-274.el5
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:user.name=hdfs
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/usr/lib/hbase
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Opening socket connection to
server /127.0.0.1:2181
12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Socket connection established
to localhost.localdomain/127.0.0.1:2181, initiating session
12/03/15 19:40:30 INFO zookeeper.RecoverableZooKeeper: The identifier of
this process is 19732@torb1pna001.dataraker.net
12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Session establishment
complete on server localhost.localdomain/127.0.0.1:2181, sessionid =
0x35d53d539f0074, negotiated timeout = 40000
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 INFO mapreduce.JobSubmitter: number of splits:73
12/03/15 19:40:30 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 19:40:31 INFO mapred.ResourceMgrDelegate: Submitted application
application_1331840162147_0001 to ResourceManager at
torb1pna001/10.192.16.64:8040
12/03/15 19:40:31 INFO mapreduce.Job: The url to track the job:
http://torb1pna001:8088/proxy/application_1331840162147_0001/
12/03/15 19:40:31 INFO mapreduce.Job: Running job: job_1331840162147_0001

  • Harsh J at Mar 17, 2012 at 8:54 pm
    Hey Peter,

    The following log line points to the issue:

    2012-03-15 19:36:16,645 INFO
    org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
    Registered with ResourceManager as torb1pnb002.dataraker.net:35665
    with total resource of memory: 1000

    Your NM has only 1000 MB to offer as resources, because of your
    yarn.nodemanager.resource.memory-mb configuration. Your job's AM and
    task containers, however, demand more than 1000 MB (1024 MB by
    default; see MAPREDUCE-4026). Hence the resource requests can't be
    granted by this NM, and the RM can't find another NM that can accept
    such large requests.

    A quick fix is to simply raise yarn.nodemanager.resource.memory-mb back
    to its default of 8192 (8 GB) or higher.
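
    For illustration, the change in the yarn-site.xml shown above would look
    something like this (a sketch only; the right value depends on how much
    physical memory each node actually has to spare):

    <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <description>Amount of physical memory, in MB, that can be
    allocated for containers.</description>
    </property>

    Restart the NodeManagers after the change so they re-register with the
    ResourceManager advertising the new capacity. In principle the job's
    per-container memory requests could be lowered below 1000 MB instead,
    but raising the NM capacity is the simpler route.
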
    On Fri, Mar 16, 2012 at 1:22 AM, Peter Naudus wrote:
    [..snip: quoted original message..]


    --
    Harsh J

Discussion Overview
group: common-user
categories: hadoop
posted: Mar 15, '12 at 7:53p
active: Mar 17, '12 at 8:54p
posts: 2
users: 2 (Harsh J: 1 post, Peter Naudus: 1 post)
website: hadoop.apache.org...
irc: #hadoop
