Hi,

I had a single-node installation of CDH 4.2 with CM. I added another node
and made it a 2-node cluster.

I added a few service instances on the new node, such as DataNode and
TaskTracker.

However, when I tried to start the cluster, ZooKeeper failed to start.

The error shown is "Command timed out after 150 seconds".

The last few lines of the log (/var/log/cloudera-scm-agent) are as follows:

[02/Apr/2013 18:25:18 +0000] 2113 MainThread agent ERROR Could not contact supervisor.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 517, in get_supervisor_data
    supervisor_info = sup.getAllProcessInfo()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0a12-py2.6.egg/supervisor/xmlrpc.py", line 461, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib64/python2.6/httplib.py", line 914, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.6/httplib.py", line 951, in _send_request
    self.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
[02/Apr/2013 18:25:18 +0000] 2113 MainThread agent ERROR Failed to contact supervisor after 390 attempts. Agent will exit.
[02/Apr/2013 18:25:18 +0000] 2113 MainThread agent ERROR Caught unexpected exception in main loop.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 405, in start
    heartbeat = self.prepare_heartbeat()
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 563, in prepare_heartbeat
    supervisor_data = self.get_supervisor_data(True)
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 540, in get_supervisor_data
    sys.exit(-1)
SystemExit: -1

I'm unable to figure out anything. Any help will be greatly appreciated!

Thank you,
Sachin


  • Philip Zeyliger at Apr 2, 2013 at 5:16 pm
    It looks like supervisord exited. Could you check whether
    /var/log/cloudera-scm-agent/supervisord.log has any hints? Restarting the
    agent (service cloudera-scm-agent hard_restart) should also do the trick.

    -- Philip
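
For anyone following the suggestion above, here is a minimal Python sketch of the two checks, not an official Cloudera tool. The helper names `tail_lines` and `port_accepts_connections` are hypothetical and written only for illustration. The log path is the one quoted in the reply. The thread does not say which host and port the agent uses to reach supervisord, so the probe takes both as arguments and you would have to supply them from your own setup. A refused connection there corresponds to the `[Errno 111] Connection refused` in the traceback.

```python
import socket
from collections import deque


def tail_lines(path, count=50):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent `count` lines.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    A refused connection (errno 111 on Linux) or a timeout returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Path taken from the reply in this thread.
    log_path = "/var/log/cloudera-scm-agent/supervisord.log"
    try:
        for line in tail_lines(log_path, 50):
            print(line)
    except OSError as exc:
        print("could not read %s: %s" % (log_path, exc))
```

If the log shows that supervisord exited, restarting the agent as described in the reply is the next step; the probe is only useful for confirming whether anything is listening again afterwards.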


Discussion Overview
group: scm-users @
categories: hadoop
posted: Apr 2, '13 at 1:10p
active: Apr 2, '13 at 5:16p
posts: 2
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Philip Zeyliger: 1 post Sachin Hadoop: 1 post
