Apparently the fix to my original error is that Hadoop is set up for a single local machine out of the box, and I had to change these directories:
<property>
  <name>mapred.local.dir</name>
  <value>/hadoop/mapred/local</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/hadoop/mapred/temp</value>
</property>
so that they live in HDFS instead of defaulting to locations under "hadoop.tmp.dir".
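For context, if the 0.17 hadoop-default.xml is as I remember it, all three of those properties default to subdirectories of hadoop.tmp.dir, which itself embeds the submitting user's name; that would explain why everything worked as the hadoop user but broke for anyone else:

<!-- defaults from hadoop-default.xml (from memory; check your copy) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>
<!-- mapred.local.dir, mapred.system.dir and mapred.temp.dir default to
     ${hadoop.tmp.dir}/mapred/local, .../system and .../temp -->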
So now distcp works as a non-hadoop user, and mapred works as a non-hadoop user from the namenode. However, from a workstation I now get this:
blue:hadoop-0.17.1 mdidomenico$ bin/hadoop distcp "file:///Users/mdidomenico/hadoop/1gTestfile" "1gTestfile-1"
08/09/09 13:44:19 INFO util.CopyFiles: srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
08/09/09 13:44:19 INFO util.CopyFiles: destPath=1gTestfile-1
08/09/09 13:44:20 INFO util.CopyFiles: srcCount=1
08/09/09 13:44:22 INFO mapred.JobClient: Running job: job_200809091332_0004
08/09/09 13:44:23 INFO mapred.JobClient: map 0% reduce 0%
08/09/09 13:44:31 INFO mapred.JobClient: Task Id : task_200809091332_0004_m_000000_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
08/09/09 13:44:50 INFO mapred.JobClient: Task Id : task_200809091332_0004_m_000000_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
08/09/09 13:45:07 INFO mapred.JobClient: Task Id : task_200809091332_0004_m_000000_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
08/09/09 13:45:26 INFO mapred.JobClient: map 100% reduce 100%
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)
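As the output itself suggests, rerunning with -i should at least ignore the failures and keep the counters accurate; presumably something like:

bin/hadoop distcp -i "file:///Users/mdidomenico/hadoop/1gTestfile" "1gTestfile-1"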
On Tue, Sep 9, 2008 at 1:14 PM, Michael Di Domenico <[email protected]> wrote:
Manually creating the "system" directory gets me past the first error, but now I get the output below. I'm not necessarily sure it's a step forward, though, because the map task never shows up in the JobTracker.
[[email protected] hadoop-0.17.1]$ bin/hadoop distcp "file:///home/mdidomenico/1gTestfile" "1gTestfile"
08/09/09 13:12:06 INFO util.CopyFiles: srcPaths=[file:/home/mdidomenico/1gTestfile]
08/09/09 13:12:06 INFO util.CopyFiles: destPath=1gTestfile
08/09/09 13:12:07 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:07 INFO dfs.DFSClient: Abandoning block blk_5758513071638050362
08/09/09 13:12:13 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:13 INFO dfs.DFSClient: Abandoning block blk_1691495306775808049
08/09/09 13:12:17 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:17 INFO dfs.DFSClient: Abandoning block blk_1027634596973755899
08/09/09 13:12:19 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:19 INFO dfs.DFSClient: Abandoning block blk_4535302510016050282
08/09/09 13:12:23 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:23 INFO dfs.DFSClient: Abandoning block blk_7022658012001626339
08/09/09 13:12:25 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:25 INFO dfs.DFSClient: Abandoning block blk_-4509681241839967328
08/09/09 13:12:29 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:29 INFO dfs.DFSClient: Abandoning block blk_8318033979013580420
08/09/09 13:12:31 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
08/09/09 13:12:31 WARN dfs.DFSClient: Error Recovery for block blk_-4509681241839967328 bad datanode[0]
08/09/09 13:12:35 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:35 INFO dfs.DFSClient: Abandoning block blk_2848354798649979411
08/09/09 13:12:41 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
08/09/09 13:12:41 WARN dfs.DFSClient: Error Recovery for block blk_2848354798649979411 bad datanode[0]
Exception in thread "Thread-0" java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(Unknown Source)
at java.util.TreeMap$KeyIterator.next(Unknown Source)
at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/09/09 13:12:41 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:41 INFO dfs.DFSClient: Abandoning block blk_9189111926428577428
On Tue, Sep 9, 2008 at 1:03 PM, Michael Di Domenico <[email protected]> wrote:
A little more digging, and it appears I cannot run distcp as someone other than hadoop on the namenode:
/tmp/hadoop-hadoop/mapred/system/job_200809091231_0005/job.xml
Looking at this directory from the error, the "system" directory does not exist on the namenode; I only have a "local" directory.
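For what it's worth, the manual workaround mentioned earlier in the thread was presumably something along these lines, run on the namenode (assuming, as the error path suggests, a directory on its local filesystem rather than in HDFS):

# hypothetical reconstruction -- the exact command wasn't shown in the thread
mkdir -p /tmp/hadoop-hadoop/mapred/system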
On Tue, Sep 9, 2008 at 12:41 PM, Michael Di Domenico <[email protected]> wrote:
I'm not sure that's the issue; I basically tarred up the Hadoop directory from the cluster and copied it over to the non-datanode machine. But I do agree I've likely got a setting wrong, since distcp works fine from the namenode. The question is which one.
On Mon, Sep 8, 2008 at 7:04 PM, Aaron Kimball wrote:
It is likely that your mapred.system.dir and/or fs.default.name settings are incorrect on the non-datanode machine that you are launching the task from.
These two settings (in your conf/hadoop-site.xml file) must match the
settings on the cluster itself.
- Aaron
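As an illustration only, a minimal conf/hadoop-site.xml on the submitting machine might look like the sketch below; the hostname and ports are placeholders that must match whatever the cluster's own config uses, and mapred.system.dir here reuses the value quoted at the top of this thread:

<!-- placeholder values; copy the real ones from the cluster's hadoop-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000/</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>namenode.example.com:9001</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>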
On Sun, Sep 7, 2008 at 8:58 PM, Michael Di Domenico wrote:
I'm attempting to load data into Hadoop (version 0.17.1) from a non-datanode machine in the cluster. I can run jobs, and copyFromLocal works fine, but when I try to use distcp I get the output below. I don't understand the error; can anyone help?
Thanks
blue:hadoop-0.17.1 mdidomenico$ time bin/hadoop distcp -overwrite file:///Users/mdidomenico/hadoop/1gTestfile /user/mdidomenico/1gTestfile
08/09/07 23:56:06 INFO util.CopyFiles: srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
08/09/07 23:56:06 INFO util.CopyFiles: destPath=/user/mdidomenico/1gTestfile1
08/09/07 23:56:07 INFO util.CopyFiles: srcCount=1
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RemoteException: java.io.IOException: /tmp/hadoop-hadoop/mapred/system/job_200809072254_0005/job.xml: No such file or directory
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
at org.apache.hadoop.ipc.Client.call(Client.java:557)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at $Proxy1.submitJob(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)