Hi,
I have a cluster of 4 Debian Squeeze machines; on all of them I
installed the same version (hadoop-0.20.2.tar.gz).

I have: n-0 as namenode, n-1 as jobtracker, and n-{0,1,2,3} as slaves.
You can see all my configs in more detail at
http://pastie.org/1754875
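
(For readers without access to the pastie: in 0.20 the two settings that tie the cluster together are fs.default.name in conf/core-site.xml and mapred.job.tracker in conf/mapred-site.xml. The values below are an illustrative sketch matching the layout described above, not a copy of the actual configs.)

```xml
<!-- conf/core-site.xml (illustrative): clients and datanodes find the
     namenode here; the namenode in this thread listens on port 9000. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://n-0:9000</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml (illustrative): tasktrackers find the jobtracker
     here; the tasktracker logs in this thread retry n-1:9001. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>n-1:9001</value>
  </property>
</configuration>
```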

The machines have 3 GiB RAM each.
I don't think disk space is an issue anywhere, but FWIW:
/var/bomvl (which houses the HDFS data) is a 140 GB filesystem on
all nodes, and they all have 1.8 GB free on /.


So, I bring up the cluster on n-0:

$ bin/hadoop namenode -format
11/04/04 18:27:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = n-0.map-reduce.search.wall2.test/10.2.1.90
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/04/04 18:27:33 INFO namenode.FSNamesystem: fsOwner=dplaetin,Search,root
11/04/04 18:27:33 INFO namenode.FSNamesystem: supergroup=supergroup
11/04/04 18:27:33 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/04/04 18:27:33 INFO common.Storage: Image file of size 98 saved in 0 seconds.
11/04/04 18:27:34 INFO common.Storage: Storage directory /var/bomvl/name has been successfully formatted.
11/04/04 18:27:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at n-0.map-reduce.search.wall2.test/10.2.1.90
************************************************************/

$ bin/start-all.sh
starting namenode, logging to /var/log/hadoop/hadoop-dplaetin-namenode-n-0.map-reduce.search.wall2.test.out
n-0: starting datanode, logging to /var/log/hadoop/hadoop-dplaetin-datanode-n-0.map-reduce.search.wall2.test.out
n-2: starting datanode, logging to /var/log/hadoop/hadoop-dplaetin-datanode-n-2.map-reduce.search.wall2.test.out
n-3: starting datanode, logging to /var/log/hadoop/hadoop-dplaetin-datanode-n-3.map-reduce.search.wall2.test.out
n-1: starting datanode, logging to /var/log/hadoop/hadoop-dplaetin-datanode-n-1.map-reduce.search.wall2.test.out
localhost: starting secondarynamenode, logging to /var/log/hadoop/hadoop-dplaetin-secondarynamenode-n-0.map-reduce.search.wall2.test.out
starting jobtracker, logging to /var/log/hadoop/hadoop-dplaetin-jobtracker-n-0.map-reduce.search.wall2.test.out
n-2: starting tasktracker, logging to /var/log/hadoop/hadoop-dplaetin-tasktracker-n-2.map-reduce.search.wall2.test.out
n-3: starting tasktracker, logging to /var/log/hadoop/hadoop-dplaetin-tasktracker-n-3.map-reduce.search.wall2.test.out
n-0: starting tasktracker, logging to /var/log/hadoop/hadoop-dplaetin-tasktracker-n-0.map-reduce.search.wall2.test.out
n-1: starting tasktracker, logging to /var/log/hadoop/hadoop-dplaetin-tasktracker-n-1.map-reduce.search.wall2.test.out
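
As a first sanity check after start-all.sh, it can help to confirm on each node which daemons actually came up; `jps` lists the running Java processes. The helper below is a hypothetical sketch (not a Hadoop tool) comparing `jps` output against the daemons expected on a 0.20 cluster:

```shell
# Hypothetical helper: report which expected Hadoop 0.20 daemons are absent
# from a given `jps` listing. The daemon set below assumes a node that runs
# everything; trim it per node (slaves only run DataNode/TaskTracker).
check_daemons() {
  jps_out="$1"
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    echo "$jps_out" | grep -qw "$d" || echo "missing: $d"
  done
}

# On a node you would run: check_daemons "$(jps)"
```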


However, in n-0:/var/log/hadoop/hadoop-dplaetin-namenode-n-0.map-reduce.search.wall2.test.log,
I see this:
2011-04-04 18:27:34,760 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = n-0.map-reduce.search.wall2.test/10.2.1.90
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-04-04 18:27:34,846 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2011-04-04 18:27:34,851 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: n-0-lan0/10.1.1.2:9000
2011-04-04 18:27:34,853 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-04-04 18:27:34,854 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-04-04 18:27:34,893 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=dplaetin,Search,root
2011-04-04 18:27:34,893 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-04-04 18:27:34,893 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-04-04 18:27:34,899 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-04-04 18:27:34,900 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-04-04 18:27:34,927 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-04-04 18:27:34,931 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-04-04 18:27:34,931 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 98 loaded in 0 seconds.
2011-04-04 18:27:34,932 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /var/bomvl/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-04-04 18:27:34,971 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 98 saved in 0 seconds.
2011-04-04 18:27:35,146 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 263 msecs
2011-04-04 18:27:35,147 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-04-04 18:27:35,147 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-04-04 18:27:35,147 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-04-04 18:27:35,147 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2011-04-04 18:27:35,148 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs.
2011-04-04 18:27:35,148 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-04-04 18:27:35,148 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-04-04 18:27:40,247 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-04-04 18:27:40,299 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-04-04 18:27:40,300 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2011-04-04 18:27:40,300 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-04-04 18:27:40,300 INFO org.mortbay.log: jetty-6.1.14
2011-04-04 18:27:40,518 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50070
2011-04-04 18:27:40,518 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: 0.0.0.0:50070
2011-04-04 18:27:40,521 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2011-04-04 18:27:40,521 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2011-04-04 18:27:40,522 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2011-04-04 18:27:40,522 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2011-04-04 18:27:40,522 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2011-04-04 18:27:40,526 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting
2011-04-04 18:27:40,526 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting
2011-04-04 18:27:40,526 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting
2011-04-04 18:27:40,526 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting
2011-04-04 18:27:40,527 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2011-04-04 18:27:40,528 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting
2011-04-04 18:27:40,527 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2011-04-04 18:27:40,550 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.&lt;init&gt;(MetricsTimeVaryingRate.java:89)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.&lt;init&gt;(MetricsTimeVaryingRate.java:99)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2011-04-04 18:27:40,550 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.&lt;init&gt;(MetricsTimeVaryingRate.java:89)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.&lt;init&gt;(MetricsTimeVaryingRate.java:99)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2011-04-04 18:27:40,550 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.&lt;init&gt;(MetricsTimeVaryingRate.java:89)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.&lt;init&gt;(MetricsTimeVaryingRate.java:99)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2011-04-04 18:29:20,803 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.1.1.5:50010 storage DS-1679058086-10.2.0.80-50010-1301934560735
2011-04-04 18:29:20,806 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.1.1.5:50010
2011-04-04 18:29:32,077 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.1.1.3:50010 storage DS-195369122-10.2.1.91-50010-1301934572032
2011-04-04 18:29:32,077 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.1.1.3:50010
2011-04-04 18:29:39,321 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.1.1.4:50010 storage DS-564465257-10.2.1.79-50010-1301934579273
2011-04-04 18:29:39,321 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.1.1.4:50010
2011-04-04 18:30:58,007 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.1.1.2:50010 storage DS-166622494-10.2.1.90-50010-1301934658005
2011-04-04 18:30:58,007 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.1.1.2:50010
2011-04-04 18:33:13,788 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.1.1.2
2011-04-04 18:33:13,788 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2011-04-04 18:33:14,091 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 10.1.1.2
2011-04-04 18:33:14,092 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 33


Does anyone know what the issue could be? The suggestions I found in other threads are "check disk space" and "make sure the versions are the same",
but I'm good on the former, and the latter is not an issue here.

When I try to put a file on HDFS, I get:
2011-04-04 17:10:28,306 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call addBlock(/tmp/hadoop-dplaetin/mapred/system/jobtracker.info, DFSClient_371874060) from 10.1.1.3:47491: error: java.io.IOException: File /tmp/hadoop-dplaetin/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-dplaetin/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
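
For what it's worth, "could only be replicated to 0 nodes" generally means the namenode had no datanode it considered usable at the moment of the write; `bin/hadoop dfsadmin -report` shows the namenode's live view (node count and free space). A small sketch (hypothetical helper, using the log path from this cluster) that counts how many datanode registrations the namenode has seen:

```shell
# Hypothetical helper: count datanode registration events in a namenode log.
# Four registrations appear in the log excerpt above; if the live count from
# "bin/hadoop dfsadmin -report" is lower, datanodes have since dropped out.
count_datanodes() {
  grep -c 'NameSystem.registerDatanode' "$1"
}

# e.g.: count_datanodes /var/log/hadoop/hadoop-dplaetin-namenode-n-0.map-reduce.search.wall2.test.log
```

One detail visible in the logs above that may or may not be relevant: the same hosts show up with both 10.1.1.x and 10.2.x addresses, i.e. the machines are multi-homed.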


On n-1, hadoop-dplaetin-datanode-n-1.map-reduce.search.wall2.test.log looks good:
$ cat hadoop-dplaetin-datanode-n-1.map-reduce.search.wall2.test.log
2011-04-04 18:27:35,886 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = n-1.map-reduce.search.wall2.test/10.2.1.91
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-04-04 18:27:40,535 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /var/bomvl/data is not formatted.
2011-04-04 18:27:40,535 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2011-04-04 18:27:40,828 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2011-04-04 18:27:40,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2011-04-04 18:27:40,831 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2011-04-04 18:27:45,876 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-04-04 18:27:45,940 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2011-04-04 18:27:45,940 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2011-04-04 18:27:45,940 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2011-04-04 18:27:45,940 INFO org.mortbay.log: jetty-6.1.14
2011-04-04 18:29:27,006 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2011-04-04 18:29:27,011 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2011-04-04 18:29:32,026 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
2011-04-04 18:29:32,029 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2011-04-04 18:29:32,029 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2011-04-04 18:29:32,031 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2011-04-04 18:29:32,031 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2011-04-04 18:29:32,031 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2011-04-04 18:29:32,031 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(n144-17a.wall2.test:50010, storageID=, infoPort=50075, ipcPort=50020)
2011-04-04 18:29:32,061 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id DS-195369122-10.2.1.91-50010-1301934572032 is assigned to data-node 10.1.1.3:50010
2011-04-04 18:29:32,061 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.1.3:50010, storageID=DS-195369122-10.2.1.91-50010-1301934572032, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/var/bomvl/data/current'}
2011-04-04 18:29:32,062 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2011-04-04 18:29:32,070 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 2 msecs
2011-04-04 18:29:32,071 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.

However, this doesn't look right:
$ head -n 40 hadoop-dplaetin-tasktracker-n-1.map-reduce.search.wall2.test.log
2011-04-04 18:27:39,041 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG: host = n-1.map-reduce.search.wall2.test/10.2.1.91
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-04-04 18:27:44,176 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-04-04 18:27:44,263 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
2011-04-04 18:27:44,269 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
2011-04-04 18:27:44,269 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
2011-04-04 18:27:44,269 INFO org.mortbay.log: jetty-6.1.14
2011-04-04 18:28:05,858 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
2011-04-04 18:28:05,863 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
2011-04-04 18:28:05,872 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=TaskTracker, port=42914
2011-04-04 18:28:05,908 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2011-04-04 18:28:05,908 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 42914: starting
2011-04-04 18:28:05,909 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 42914: starting
2011-04-04 18:28:05,909 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 42914: starting
2011-04-04 18:28:05,909 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 42914: starting
2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 42914: starting
2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 42914: starting
2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 42914: starting
2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 42914: starting
2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 42914: starting
2011-04-04 18:28:05,910 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:42914
2011-04-04 18:28:05,910 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_n144-17a.wall2.test:localhost/127.0.0.1:42914
2011-04-04 18:28:06,933 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 0 time(s).
2011-04-04 18:28:07,934 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 1 time(s).
2011-04-04 18:28:08,934 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 2 time(s).
2011-04-04 18:28:09,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 3 time(s).
2011-04-04 18:28:10,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 4 time(s).
2011-04-04 18:28:11,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 5 time(s).
2011-04-04 18:28:12,936 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 6 time(s).
2011-04-04 18:28:13,936 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 7 time(s).
2011-04-04 18:28:14,937 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 8 time(s).
2011-04-04 18:28:15,937 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 9 time(s).
2011-04-04 18:28:15,938 INFO org.apache.hadoop.ipc.RPC: Server at n-1/10.1.1.3:9001 not available yet, Zzzzz...
2011-04-04 18:28:17,939 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 0 time(s).

Indeed, the jobtracker does not seem to be running (although that's probably another problem):
$ sudo netstat -n -l -p | grep 9001
$
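
One thing worth noting: in 0.20, start-all.sh always starts the jobtracker on the machine it is run from (here n-0, per the "starting jobtracker" line above), while tasktrackers connect to whatever mapred.job.tracker names (here n-1:9001, per the retry messages). A hypothetical helper to print the configured address for comparison:

```shell
# Hypothetical helper: print the jobtracker address configured in
# mapred-site.xml (the <value> that follows the mapred.job.tracker <name>).
configured_jobtracker() {
  grep -A1 '<name>mapred.job.tracker</name>' "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# e.g.: configured_jobtracker conf/mapred-site.xml
```

If that prints n-1:9001 but start-all.sh was run on n-0, the jobtracker is simply running on the wrong host for that config.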

Thanks for any help,
Dieter


  • Dieter Plaetinck at Apr 11, 2011 at 10:08 am
    Anyone? Anyone at all?
    I figured out the issue with the jobtracker, but I still have the
    errors:

    * Error register getProtocolVersion
    * File (..) could only be replicated to 0 nodes, instead of 1

    as explained in my first mail.

    The 2nd error can appear without _any_ errors in _any_ of the datanode
    or tasktracker logs, and the NameNode web interface even tells me all
    nodes are live, none dead.

    This is effectively holding me back from using the cluster;
    I'm completely in the dark, and I find it very frustrating. :(

    Thank you,

    Dieter

    On Mon, 4 Apr 2011 18:45:49 +0200
    Dieter Plaetinck wrote:
    java.security.AccessController.doPrivileged(Native Method) at
    javax.security.auth.Subject.doAs(Subject.java:396) at
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) 2011-04-04
    18:29:20,803 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
    NameSystem.registerDatanode: node registration from 10.1.1.5:50010
    storage DS-1679058086-10.2.0.80-50010-1301934560735 2011-04-04
    18:29:20,806 INFO org.apache.hadoop.net.NetworkTopology: Adding a new
    node: /default-rack/10.1.1.5:50010 2011-04-04 18:29:32,077 INFO
    org.apache.hadoop.hdfs.StateChange: BLOCK*
    NameSystem.registerDatanode: node registration from 10.1.1.3:50010
    storage DS-195369122-10.2.1.91-50010-1301934572032 2011-04-04
    18:29:32,077 INFO org.apache.hadoop.net.NetworkTopology: Adding a new
    node: /default-rack/10.1.1.3:50010 2011-04-04 18:29:39,321 INFO
    org.apache.hadoop.hdfs.StateChange: BLOCK*
    NameSystem.registerDatanode: node registration from 10.1.1.4:50010
    storage DS-564465257-10.2.1.79-50010-1301934579273 2011-04-04
    18:29:39,321 INFO org.apache.hadoop.net.NetworkTopology: Adding a new
    node: /default-rack/10.1.1.4:50010 2011-04-04 18:30:58,007 INFO
    org.apache.hadoop.hdfs.StateChange: BLOCK*
    NameSystem.registerDatanode: node registration from 10.1.1.2:50010
    storage DS-166622494-10.2.1.90-50010-1301934658005 2011-04-04
    18:30:58,007 INFO org.apache.hadoop.net.NetworkTopology: Adding a new
    node: /default-rack/10.1.1.2:50010 2011-04-04 18:33:13,788 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
    from 10.1.1.2 2011-04-04 18:33:13,788 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
    transactions: 0 Total time for transactions(ms): 0Number of
    transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
    2011-04-04 18:33:14,091 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage
    from 10.1.1.2 2011-04-04 18:33:14,092 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
    transactions: 0 Total time for transactions(ms): 0Number of
    transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 33


    Does anyone know what the issue could be? The suggestions I found
    in other threads are "check disk space" and "make sure the versions
    are the same"; I'm good on the former, and the latter is not an
    issue here.
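For reference, the disk-space check those threads suggest boils down to a `df` on the filesystems involved; a minimal sketch, using / as a stand-in (on these nodes the mount backing dfs.data.dir, /var/bomvl, is the one that matters):

```shell
# Report free space on the root filesystem; on the cluster nodes you would
# point this at the filesystem holding dfs.data.dir (/var/bomvl) instead.
# -P forces the portable one-line-per-filesystem output so awk parsing is safe.
df -P / | awk 'NR == 2 { print $4 " blocks free on " $6 }'
```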

    When I try to put a file on HDFS, I get:
    2011-04-04 17:10:28,306 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call addBlock(/tmp/hadoop-dplaetin/mapred/system/jobtracker.info, DFSClient_371874060) from 10.1.1.3:47491: error: java.io.IOException: File /tmp/hadoop-dplaetin/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    java.io.IOException: File /tmp/hadoop-dplaetin/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
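A "could only be replicated to 0 nodes" error normally means the namenode sees zero live datanodes at that moment. One way to confirm is `bin/hadoop dfsadmin -report`; since that needs the live cluster, the sketch below just greps a hypothetical sample of that report for the line to watch:

```shell
# Hypothetical excerpt of `bin/hadoop dfsadmin -report` output; on a healthy
# 4-slave cluster the key line should read "Datanodes available: 4".
report='Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
Datanodes available: 0 (0 total, 0 dead)'

# Zero available datanodes matches the replication failure above.
echo "$report" | grep 'Datanodes available'
# → Datanodes available: 0 (0 total, 0 dead)
```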


    On n-1, hadoop-dplaetin-datanode-n-1.map-reduce.search.wall2.test.log
    looks good:
    $ cat hadoop-dplaetin-datanode-n-1.map-reduce.search.wall2.test.log
    2011-04-04 18:27:35,886 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting DataNode
    STARTUP_MSG: host = n-1.map-reduce.search.wall2.test/10.2.1.91
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.20.2
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
    ************************************************************/
    2011-04-04 18:27:40,535 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /var/bomvl/data is not formatted.
    2011-04-04 18:27:40,535 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
    2011-04-04 18:27:40,828 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
    2011-04-04 18:27:40,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
    2011-04-04 18:27:40,831 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
    2011-04-04 18:27:45,876 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    2011-04-04 18:27:45,940 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
    2011-04-04 18:27:45,940 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
    2011-04-04 18:27:45,940 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
    2011-04-04 18:27:45,940 INFO org.mortbay.log: jetty-6.1.14
    2011-04-04 18:29:27,006 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
    2011-04-04 18:29:27,011 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
    2011-04-04 18:29:32,026 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
    2011-04-04 18:29:32,029 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
    2011-04-04 18:29:32,029 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
    2011-04-04 18:29:32,031 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
    2011-04-04 18:29:32,031 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
    2011-04-04 18:29:32,031 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
    2011-04-04 18:29:32,031 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(n144-17a.wall2.test:50010, storageID=, infoPort=50075, ipcPort=50020)
    2011-04-04 18:29:32,061 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id DS-195369122-10.2.1.91-50010-1301934572032 is assigned to data-node 10.1.1.3:50010
    2011-04-04 18:29:32,061 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.1.3:50010, storageID=DS-195369122-10.2.1.91-50010-1301934572032, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/var/bomvl/data/current'}
    2011-04-04 18:29:32,062 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
    2011-04-04 18:29:32,070 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 2 msecs
    2011-04-04 18:29:32,071 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.

    However, this doesn't seem very right:
    $ head -n 40
    hadoop-dplaetin-tasktracker-n-1.map-reduce.search.wall2.test.log
    2011-04-04 18:27:39,041 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting TaskTracker
    STARTUP_MSG: host = n-1.map-reduce.search.wall2.test/10.2.1.91
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.20.2
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
    ************************************************************/
    2011-04-04 18:27:44,176 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    2011-04-04 18:27:44,263 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
    2011-04-04 18:27:44,269 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
    2011-04-04 18:27:44,269 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
    2011-04-04 18:27:44,269 INFO org.mortbay.log: jetty-6.1.14
    2011-04-04 18:28:05,858 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
    2011-04-04 18:28:05,863 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
    2011-04-04 18:28:05,872 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=TaskTracker, port=42914
    2011-04-04 18:28:05,908 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
    2011-04-04 18:28:05,908 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 42914: starting
    2011-04-04 18:28:05,909 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 42914: starting
    2011-04-04 18:28:05,909 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 42914: starting
    2011-04-04 18:28:05,909 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 42914: starting
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 42914: starting
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 42914: starting
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 42914: starting
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 42914: starting
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 42914: starting
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:42914
    2011-04-04 18:28:05,910 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_n144-17a.wall2.test:localhost/127.0.0.1:42914
    2011-04-04 18:28:06,933 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 0 time(s).
    2011-04-04 18:28:07,934 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 1 time(s).
    2011-04-04 18:28:08,934 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 2 time(s).
    2011-04-04 18:28:09,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 3 time(s).
    2011-04-04 18:28:10,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 4 time(s).
    2011-04-04 18:28:11,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 5 time(s).
    2011-04-04 18:28:12,936 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 6 time(s).
    2011-04-04 18:28:13,936 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 7 time(s).
    2011-04-04 18:28:14,937 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 8 time(s).
    2011-04-04 18:28:15,937 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 9 time(s).
    2011-04-04 18:28:15,938 INFO org.apache.hadoop.ipc.RPC: Server at n-1/10.1.1.3:9001 not available yet, Zzzzz...
    2011-04-04 18:28:17,939 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: n-1/10.1.1.3:9001. Already tried 0 time(s).

    Indeed, the jobtracker does not seem to be running (although that's
    probably another problem):
    $ sudo netstat -n -l -p | grep 9001
    $
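To separate the two problems, one hedged check is whether the JobTracker JVM is up at all (`jps` ships with the JDK; if the process is missing, its .log/.out under /var/log/hadoop should contain the startup failure):

```shell
# An empty netstat on 9001 can mean the JobTracker died during startup.
# jps lists local JVMs by main class name; guard for systems without it.
if command -v jps >/dev/null 2>&1 && jps 2>/dev/null | grep -q JobTracker; then
    echo "JobTracker JVM is running"
else
    echo "JobTracker JVM not found - check its log for the startup error"
fi
```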

    Thanks for any help,
    Dieter

Discussion Overview
group: common-user
categories: hadoop
posted: Apr 4, '11 at 4:49p
active: Apr 11, '11 at 10:08a
posts: 2
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion
Dieter Plaetinck: 2 posts
