Hi.
We have a 30+ node cluster. I've performed these upgrades: 4.2 -> 4.5 -> ... -> 4.6
Problems:
1.
After an upgrade is performed, HostInspector always complains about a Cloudera
Manager Agent version mismatch. The Cloudera daemons version is OK(!)
I use Puppet to restart cloudera-scm-agent on all hosts, then re-run
HostInspector, and it says that the cloudera-scm-agent version is OK.
2.
The last upgrade was CM 4.5.3 -> 4.6.
On two hosts the agent couldn't be restarted: a python process was occupying
port 9000 and didn't allow supervisord to restart.
I killed that process and manually executed
sudo /etc/init.d/cloudera-scm-agent hard_restart
It helped.
Now there are no problems. Just FYI.
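For anyone hitting the same thing, here is a minimal sketch of checking the port before restarting the agent. It only assumes the port number from this post (9000) and that the check runs on the affected host; the function name port_in_use is made up for illustration. It tells you whether something is listening, not which process, so you still need ps/netstat to find the PID.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


if __name__ == "__main__":
    # 9000 is the port mentioned in this post; adjust if your agent uses another one.
    if port_in_use(9000):
        print("port 9000 is occupied; stop the stale process before restarting the agent")
    else:
        print("port 9000 is free")
```

If the port is still reported as occupied after the agent has been stopped, that matches the situation described in problem 2.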
*Here is cloudera-scm-agent.log*
[06/Jun/2013 12:22:33 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: IMPALA-1.0-1.p0.371
[06/Jun/2013 12:22:33 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.3.0-1.cdh4.3.0.p0.22
[06/Jun/2013 12:22:33 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.2.1-1.cdh4.2.1.p0.5
[06/Jun/2013 12:22:33 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: IMPALA-1.0-1.p0.371
[06/Jun/2013 12:22:33 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.3.0-1.cdh4.3.0.p0.22
[06/Jun/2013 12:22:33 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.2.1-1.cdh4.2.1.p0.5
[06/Jun/2013 12:22:55 +0000] 57919 MonitorDaemon-Scheduler __init__
INFO Monitor ready to report:
(<cmf.monitor.tasktracker.TaskTrackerMonitor object at 0x36107d0>,)
[06/Jun/2013 12:22:55 +0000] 57919 MonitorDaemon-Scheduler __init__
INFO Monitor ready to report:
(<cmf.monitor.failovercontroller.FailoverControllerMonitor object at
0x3966d50>,)
[06/Jun/2013 12:22:55 +0000] 57919 MonitorDaemon-Scheduler __init__
INFO Monitor ready to report: (<cmf.monitor.namenode.NameNodeMonitor
object at 0x373b190>,)
[06/Jun/2013 12:22:55 +0000] 57919 MonitorDaemon-Scheduler __init__
INFO Monitor ready to report: (<cmf.monitor.datanode.DataNodeMonitor
object at 0x398e090>,)
[06/Jun/2013 12:22:56 +0000] 57919 MonitorDaemon-Scheduler __init__
INFO Monitor ready to report:
(<cmf.monitor.journalnode.JournalNodeMonitor object at 0x3610c50>,)
[06/Jun/2013 12:22:56 +0000] 57919 MonitorDaemon-Reporter firehoses INFO
Creating a connection to the ACTIVITYMONITOR.
[06/Jun/2013 12:22:56 +0000] 57919 MonitorDaemon-Reporter firehoses INFO
Creating a connection to the SERVICEMONITOR.
[06/Jun/2013 12:22:56 +0000] 57919 MonitorDaemon-Reporter throttling_logger
ERROR Error sending messages to firehose:
mgmt1-SERVICEMONITOR-107a27c5a1fc39a5cb9ce9ea1947e296
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 70, in _send
self._port)
File
"/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py",
line 471, in __init__
self.conn.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
raise error, msg
error: [Errno 111] Connection refused
[06/Jun/2013 12:22:56 +0000] 57919 MonitorDaemon-Reporter firehoses INFO
Creating a connection to the HOSTMONITOR.
[06/Jun/2013 12:22:57 +0000] 57919 TaskTrackerAttemptMonitor tasktracker
ERROR TaskTracker at http://127.0.0.1:4867 is not responding: <urlopen
error [Errno 111] Connection refused>.
[06/Jun/2013 12:22:57 +0000] 57919 TaskTrackerAttemptMonitor tasktracker
INFO Further attempts to connect will be made, but no repeat error
messages logged until Thu Jun 6 12:52:57 2013.
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth2' reported unknown speed '65535'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth2' reported unknown duplex mode '255'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth3' reported unknown speed '65535'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth3' reported unknown duplex mode '255'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth6' reported unknown speed '65535'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth6' reported unknown duplex mode '255'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth7' reported unknown speed '65535'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
WARNING Interface 'eth7' reported unknown duplex mode '255'
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
INFO NIC iface bond0 doesn't support ETHTOOL (95)
[06/Jun/2013 12:23:29 +0000] 57919 Monitor-HostMonitor network_interfaces
INFO NIC iface bond1 doesn't support ETHTOOL (95)
[06/Jun/2013 12:23:29 +0000] 57919 MonitorDaemon-Reporter throttling_logger
ERROR Error sending messages to firehose:
mgmt1-HOSTMONITOR-107a27c5a1fc39a5cb9ce9ea1947e296
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 70, in _send
self._port)
File
"/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py",
line 471, in __init__
self.conn.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
raise error, msg
error: [Errno 111] Connection refused
[06/Jun/2013 12:23:33 +0000] 57919 CP Server Thread-4 _cplogging INFO
10.66.49.132 - - [06/Jun/2013:12:23:33] "GET /heartbeat HTTP/1.1" 200 2 ""
"NING/1.0"
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Deleting process 2335-host-inspector
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Retiring process 2335-host-inspector
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Activating Process 2619-host-inspector
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO Created
/var/run/cloudera-scm-agent/process/2619-host-inspector
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Chowning /var/run/cloudera-scm-agent/process/2619-host-inspector to root
(0) root (0)
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Chmod'ing /var/run/cloudera-scm-agent/process/2619-host-inspector to 0751
[06/Jun/2013 12:23:34 +0000] 57919 MainThread util INFO
Extracted 1 files and 0 dirs to
/var/run/cloudera-scm-agent/process/2619-host-inspector.
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO Created
/var/run/cloudera-scm-agent/process/2619-host-inspector/logs
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Chowning /var/run/cloudera-scm-agent/process/2619-host-inspector/logs to
root (0) root (0)
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Chmod'ing /var/run/cloudera-scm-agent/process/2619-host-inspector/logs to
0751
[06/Jun/2013 12:23:34 +0000] 57919 MainThread agent INFO
Triggering supervisord update.
[06/Jun/2013 12:23:34 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: IMPALA-1.0-1.p0.371
[06/Jun/2013 12:23:34 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.3.0-1.cdh4.3.0.p0.22
[06/Jun/2013 12:23:34 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.2.1-1.cdh4.2.1.p0.5
[06/Jun/2013 12:23:34 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: IMPALA-1.0-1.p0.371
[06/Jun/2013 12:23:34 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.3.0-1.cdh4.3.0.p0.22
[06/Jun/2013 12:23:34 +0000] 57919 MainThread parcel INFO Loading
parcel manifest for: CDH-4.2.1-1.cdh4.2.1.p0.5
[06/Jun/2013 12:23:37 +0000] 57919 CP Server Thread-5 _cplogging INFO
10.66.49.132 - - [06/Jun/2013:12:23:37] "GET
/process/2619-host-inspector/files/inspector HTTP/1.1" 200 1422 ""
"Java/1.6.0_37"
[06/Jun/2013 12:23:52 +0000] 57919 MainThread agent INFO Process
with same id has changed: 2619-host-inspector.
[06/Jun/2013 12:23:52 +0000] 57919 MainThread agent INFO
Deactivating process 2619-host-inspector
[06/Jun/2013 12:24:29 +0000] 57919 Monitor-HostMonitor throttling_logger
ERROR Child process still around for /dfs/sharedNN. Skipping new
collection.
[06/Jun/2013 12:24:29 +0000] 57919 Monitor-HostMonitor throttling_logger
ERROR Child process still around for /dev/shm. Skipping new collection.
[devops@prod-node015 kill_me_scm_agent]$ cat cloudera-scm-agent.out
/usr/lib64/cmf/agent/src/cmf/agent.py:31: DeprecationWarning: the sha
module is deprecated; use the hashlib module instead
import sha
[06/Jun/2013 12:22:22 +0000] 57919 MainThread agent INFO SCM
Agent Version: 4.6.0
[06/Jun/2013 12:22:22 +0000] 57919 MainThread agent INFO Using
directory: /var/run/cloudera-scm-agent
[06/Jun/2013 12:22:22 +0000] 57919 MainThread agent INFO Using
supervisor binary path:
/usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord
[06/Jun/2013 12:22:22 +0000] 57919 MainThread agent INFO Adding
env vars that start with CMF_AGENT_
[06/Jun/2013 12:22:22 +0000] 57919 MainThread agent INFO Logging
to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
/usr/lib64/cmf/agent/src/cmf/agent.py:565: DeprecationWarning:
psutil.used_phymem is deprecated; use psutil.phymem_usage instead
really_used = psutil.used_phymem() - cached - buffers
+ source_parcel_environment
+ '[' '!' -z
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/meta/cdh_env.sh ']'
+ OLD_IFS='
'
+ IFS=:
+ SCRIPT_ARRAY=($SCM_DEFINES_SCRIPTS)
+ DIRNAME_ARRAY=($PARCEL_DIRNAMES)
+ IFS='
'
+ COUNT=1
++ seq 1 1
+ for i in '`seq 1 $COUNT`'
+ SCRIPT=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/meta/cdh_env.sh
+ PARCEL_DIRNAME=CDH-4.3.0-1.cdh4.3.0.p0.22
+ . /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/meta/cdh_env.sh
++ CDH_DIRNAME=CDH-4.3.0-1.cdh4.3.0.p0.22
++ export
CDH_HADOOP_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop
++
CDH_HADOOP_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop
++ export
CDH_MR1_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce
++
CDH_MR1_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce
++ export
CDH_HDFS_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs
++
CDH_HDFS_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs
++ export
CDH_HTTPFS_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-httpfs
++
CDH_HTTPFS_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-httpfs
++ export
CDH_MR2_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-mapreduce
++
CDH_MR2_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-mapreduce
++ export
CDH_YARN_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-yarn
++
CDH_YARN_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-yarn
++ export
CDH_HBASE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase
++ CDH_HBASE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase
++ export
CDH_ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/zookeeper
++
CDH_ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/zookeeper
++ export
CDH_HIVE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hive
++ CDH_HIVE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hive
++ export
CDH_HUE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/share/hue
++ CDH_HUE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/share/hue
++ export
CDH_OOZIE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/oozie
++ CDH_OOZIE_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/oozie
++ export
CDH_HUE_PLUGINS_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop
++
CDH_HUE_PLUGINS_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop
++ export
CDH_FLUME_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng
++
CDH_FLUME_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng
++ export
CDH_PIG_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/pig
++ CDH_PIG_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/pig
++ export
CDH_HCAT_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hcatalog
++
CDH_HCAT_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hcatalog
++ export
CDH_SQOOP2_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop2
++
CDH_SQOOP2_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop2
++ export
TOMCAT_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/bigtop-tomcat
++
TOMCAT_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/bigtop-tomcat
++ export
JSVC_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/bigtop-utils
++
JSVC_HOME=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/bigtop-utils
++ export
CDH_HADOOP_BIN=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/bin/hadoop
++
CDH_HADOOP_BIN=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/bin/hadoop
++ export
HIVE_DEFAULT_XML=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hive/conf/hive-default.xml
++
HIVE_DEFAULT_XML=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hive/conf/hive-default.xml
+ env
/usr/lib64/cmf/agent/src/cmf/monitor/host/__init__.py:170:
DeprecationWarning: psutil.used_phymem is deprecated; use
psutil.phymem_usage instead
really_used = psutil.used_phymem() - cached - buffers
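Related to problem 1: a rough sketch of pulling the agent version out of cloudera-scm-agent.out, so it can be compared across hosts without waiting for HostInspector. It relies only on the "SCM Agent Version:" line shown above; the function names are made up for illustration, and how you collect the file from each host is up to you.

```python
import re

# Log lines in this post are wrapped, so allow any whitespace between the words.
VERSION_RE = re.compile(r"SCM\s+Agent\s+Version:\s*(\S+)")


def agent_versions(text):
    """Return every agent version string found in the log text, in order."""
    return VERSION_RE.findall(text)


def last_agent_version(text):
    """Return the most recently logged version, or None if none was logged."""
    versions = agent_versions(text)
    return versions[-1] if versions else None
```

The last match is the one that matters, because the file can hold startup banners from several agent restarts.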
*monitord.log*
[06/Jun/2013 12:22:33 +0000] 57919 MainThread util INFO
Extracted 8 files and 0 dirs to
/var/run/cloudera-scm-agent/process/2268-hdfs-DATANODE.
[06/Jun/2013 12:22:33 +0000] 57919 MainThread agent INFO
Re-using pre-existing directory:
/var/run/cloudera-scm-agent/process/2268-hdfs-DATANODE/logs
[06/Jun/2013 12:22:33 +0000] 57919 MainThread agent INFO
Triggering supervisord update.
[06/Jun/2013 12:22:33 +0000] 57919 MainThread abstract_monitor INFO
Refreshing DataNodeMonitor for None
[06/Jun/2013 12:22:33 +0000] 57919 MainThread __init__ INFO New
monitor: (<cmf.monitor.datanode.DataNodeMonitor object at 0x398e090>,)
*cmflistener*
2013-05-30 13:03:35,739 INFO spawned: '2118-hdfs-FAILOVERCONTROLLER' with
pid 33417
2013-05-30 13:03:55,572 INFO success: 2088-hdfs-JOURNALNODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-05-30 13:03:55,573 INFO success: 2120-hdfs-NAMENODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-05-30 13:03:56,575 INFO success: 2118-hdfs-FAILOVERCONTROLLER entered
RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2013-05-30 13:03:56,575 INFO success: 2110-hdfs-DATANODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-05-30 13:04:12,633 INFO spawned: '2146-mapreduce-TASKTRACKER' with pid
34156
2013-05-30 13:04:33,240 INFO success: 2146-mapreduce-TASKTRACKER entered
RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2013-05-31 16:37:08,407 INFO spawned: '2175-host-inspector' with pid 55156
2013-05-31 16:37:08,408 INFO success: 2175-host-inspector entered RUNNING
state, process has stayed up for > than 0 seconds (startsecs)
2013-05-31 16:37:09,482 INFO exited: 2175-host-inspector (exit status 0;
expected)
2013-05-31 16:41:40,736 INFO spawned: '2208-host-inspector' with pid 55772
2013-05-31 16:41:40,738 INFO success: 2208-host-inspector entered RUNNING
state, process has stayed up for > than 0 seconds (startsecs)
2013-05-31 16:41:40,771 INFO exited: cmflistener (exit status 1; not
expected)
2013-05-31 16:41:40,779 INFO spawned: 'cmflistener' with pid 55781
2013-05-31 16:41:41,898 INFO success: cmflistener entered RUNNING state,
process has stayed up for > than 1 seconds (startsecs)
2013-05-31 16:41:42,147 INFO exited: 2208-host-inspector (exit status 0;
expected)
2013-05-31 16:42:36,107 INFO stopped: 2146-mapreduce-TASKTRACKER (exit
status 143)
2013-05-31 16:42:38,692 INFO stopped: 2118-hdfs-FAILOVERCONTROLLER (exit
status 143)
2013-05-31 16:42:39,157 INFO stopped: 2088-hdfs-JOURNALNODE (exit status
143)
2013-05-31 16:42:40,642 INFO stopped: 2120-hdfs-NAMENODE (exit status 143)
2013-05-31 16:42:42,112 INFO stopped: 2110-hdfs-DATANODE (exit status 143)
2013-05-31 16:43:33,899 INFO spawned: '2276-hdfs-FAILOVERCONTROLLER' with
pid 56048
2013-05-31 16:43:34,066 INFO spawned: '2278-hdfs-NAMENODE' with pid 56112
2013-05-31 16:43:34,241 INFO spawned: '2246-hdfs-JOURNALNODE' with pid 56183
2013-05-31 16:43:34,424 INFO spawned: '2268-hdfs-DATANODE' with pid 56253
2013-05-31 16:43:54,722 INFO success: 2276-hdfs-FAILOVERCONTROLLER entered
RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2013-05-31 16:43:54,722 INFO success: 2268-hdfs-DATANODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-05-31 16:43:54,722 INFO success: 2278-hdfs-NAMENODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-05-31 16:43:54,722 INFO success: 2246-hdfs-JOURNALNODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-05-31 16:44:11,677 INFO spawned: '2304-mapreduce-TASKTRACKER' with pid
56990
2013-05-31 16:44:32,412 INFO success: 2304-mapreduce-TASKTRACKER entered
RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2013-06-06 11:54:37,576 INFO spawned: '2335-host-inspector' with pid 55062
2013-06-06 11:54:37,579 INFO success: 2335-host-inspector entered RUNNING
state, process has stayed up for > than 0 seconds (startsecs)
2013-06-06 11:54:39,110 INFO exited: 2335-host-inspector (exit status 0;
expected)
2013-06-06 12:19:41,305 WARN received SIGTERM indicating exit request
2013-06-06 12:19:41,305 INFO waiting for 2276-hdfs-FAILOVERCONTROLLER,
2268-hdfs-DATANODE, cmflistener, 2304-mapreduce-TASKTRACKER,
2278-hdfs-NAMENODE, 2246-hdfs-JOURNALNODE to die
2013-06-06 12:19:42,632 INFO stopped: 2246-hdfs-JOURNALNODE (exit status
143)
2013-06-06 12:19:42,982 INFO stopped: 2278-hdfs-NAMENODE (exit status 143)
2013-06-06 12:19:44,987 INFO waiting for 2276-hdfs-FAILOVERCONTROLLER,
2268-hdfs-DATANODE, cmflistener, 2304-mapreduce-TASKTRACKER to die
2013-06-06 12:19:47,993 INFO waiting for 2276-hdfs-FAILOVERCONTROLLER,
2268-hdfs-DATANODE, cmflistener, 2304-mapreduce-TASKTRACKER to die
2013-06-06 12:19:48,323 INFO stopped: 2304-mapreduce-TASKTRACKER (exit
status 143)
2013-06-06 12:19:51,330 INFO waiting for 2276-hdfs-FAILOVERCONTROLLER,
2268-hdfs-DATANODE, cmflistener to die
2013-06-06 12:19:52,658 INFO stopped: 2268-hdfs-DATANODE (exit status 143)
2013-06-06 12:19:54,029 INFO stopped: 2276-hdfs-FAILOVERCONTROLLER (exit
status 143)
2013-06-06 12:19:54,031 INFO stopped: cmflistener (terminated by SIGTERM)
2013-06-06 12:20:28,854 CRIT Supervisor running as root (no user in config
file)
2013-06-06 12:20:28,930 INFO RPC interface 'supervisor' initialized
2013-06-06 12:20:28,930 INFO RPC interface 'supervisor' initialized
2013-06-06 12:20:28,932 INFO daemonizing the supervisord process
2013-06-06 12:20:28,933 INFO supervisord started with pid 57838
2013-06-06 12:20:29,937 INFO spawned: 'cmflistener' with pid 57839
2013-06-06 12:20:31,322 INFO success: cmflistener entered RUNNING state,
process has stayed up for > than 1 seconds (startsecs)
2013-06-06 12:20:31,339 INFO exited: cmflistener (exit status 1; not
expected)
2013-06-06 12:20:32,343 INFO spawned: 'cmflistener' with pid 57847
2013-06-06 12:20:33,345 INFO success: cmflistener entered RUNNING state,
process has stayed up for > than 1 seconds (startsecs)
2013-06-06 12:22:20,868 WARN received SIGTERM indicating exit request
2013-06-06 12:22:20,869 INFO waiting for cmflistener to die
2013-06-06 12:22:20,871 INFO stopped: cmflistener (terminated by SIGTERM)
2013-06-06 12:22:27,221 CRIT Supervisor running as root (no user in config
file)
2013-06-06 12:22:27,286 INFO RPC interface 'supervisor' initialized
2013-06-06 12:22:27,286 INFO RPC interface 'supervisor' initialized
2013-06-06 12:22:27,288 INFO daemonizing the supervisord process
2013-06-06 12:22:27,289 INFO supervisord started with pid 57945
2013-06-06 12:22:28,293 INFO spawned: 'cmflistener' with pid 57946
2013-06-06 12:22:29,580 INFO success: cmflistener entered RUNNING state,
process has stayed up for > than 1 seconds (startsecs)
2013-06-06 12:22:32,656 INFO spawned: '2304-mapreduce-TASKTRACKER' with pid
58261
2013-06-06 12:22:32,775 INFO spawned: '2276-hdfs-FAILOVERCONTROLLER' with
pid 58311
2013-06-06 12:22:32,889 INFO spawned: '2278-hdfs-NAMENODE' with pid 58356
2013-06-06 12:22:33,022 INFO spawned: '2246-hdfs-JOURNALNODE' with pid 58454
2013-06-06 12:22:33,156 INFO spawned: '2268-hdfs-DATANODE' with pid 58559
2013-06-06 12:22:53,283 INFO success: 2276-hdfs-FAILOVERCONTROLLER entered
RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2013-06-06 12:22:53,284 INFO success: 2268-hdfs-DATANODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-06-06 12:22:53,284 INFO success: 2278-hdfs-NAMENODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-06-06 12:22:53,284 INFO success: 2246-hdfs-JOURNALNODE entered RUNNING
state, process has stayed up for > than 20 seconds (startsecs)
2013-06-06 12:22:53,284 INFO success: 2304-mapreduce-TASKTRACKER entered
RUNNING state, process has stayed up for > than 20 seconds (startsecs)
2013-06-06 12:23:34,078 INFO spawned: '2619-host-inspector' with pid 59866
2013-06-06 12:23:34,080 INFO success: 2619-host-inspector entered RUNNING
state, process has stayed up for > than 0 seconds (startsecs)
2013-06-06 12:23:36,519 INFO exited: 2619-host-inspector (exit status 0;
expected)
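A small sketch for picking the unexpected exits out of a supervisord log like the one above. It assumes only the "exited: NAME (exit status N; expected / not expected)" line format visible in this excerpt, and it tolerates the line wrapping; the function name unexpected_exits is made up for illustration.

```python
import re

# Matches supervisord "exited" lines, allowing wrapped whitespace inside them.
EXIT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+INFO\s+exited:\s+(\S+)\s+"
    r"\(exit\s+status\s+(\d+);\s+(not\s+expected|expected)\)"
)


def unexpected_exits(text):
    """Return (timestamp, program, exit_status) for exits marked 'not expected'."""
    results = []
    for ts, program, status, verdict in EXIT_RE.findall(text):
        if verdict.startswith("not"):
            results.append((ts, program, int(status)))
    return results
```

On the excerpt above this should flag the two cmflistener exits with status 1 (2013-05-31 16:41:40 and 2013-06-06 12:20:31) and skip the host-inspector exits, which supervisord marks as expected.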
[de