As an extension to the problem statement: is it possible to fuse steps 1 and 2 into one step?
I.e., can we have the map task pick its input from an external filesystem instead of HDFS?
Can FTPFileSystem/RawLocalFileSystem be of any help here?

On 15-Nov-2010, at 3:10 PM, Sebastian Schoenherr wrote:

Hi Matthew,
Of course, you can copy it directly to HDFS and vice versa. Use IOUtils (org.apache.hadoop.io.IOUtils) like this:
FileSystem fileSystem = FileSystem.get(conf); // org.apache.hadoop.fs.FileSystem

"in" and "out" are the streams (in this example, "out" is the HDFS output stream):
IOUtils.copyBytes(in, out, fileSystem.getConf());
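
For illustration, here is a minimal sketch of the buffered stream-to-stream copy that a call like IOUtils.copyBytes performs, written with plain java.io so it runs without a cluster. The class name StreamCopy, the method copy, and the in-memory streams are hypothetical stand-ins: in a real job, "in" would presumably be the socket's input stream and "out" the stream returned by fileSystem.create(new Path(...)).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopy {

    /**
     * Copies everything from in to out through a fixed-size buffer and
     * returns the number of bytes copied. Neither stream is closed here.
     */
    public static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the sender has closed its side of the stream
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins (assumption): "in" replaces socket.getInputStream(),
        // "out" replaces the HDFS output stream.
        byte[] payload = "records received over the socket"
                .getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(payload);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out, 4096);
            System.out.println("copied " + copied + " bytes");
        }
    }
}
```

Because the data goes from the incoming stream straight to the destination stream, no intermediate file on the local file system is needed.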

Hope this helps,

Quoting Matthew John <tmatthewjohn1988@gmail.com>:
Hi all,

I have been working with MapReduce and HDFS for some time. The procedure
I normally follow is:

1) copy in the input file from Local File System to HDFS

2) run the map reduce module

3) copy the output file back to the Local File System from the HDFS

But I feel that steps 1 and 3 add a lot of overhead to the entire process.

My queries are:

1) I get the files into the Local File System through a socket
connection with another node. Can I ensure that the data arriving at
the Hadoop node is written directly to HDFS, instead of going
through the Local File System followed by a CopyFromLocal?

2) Can I write the reduce output (which creates the final output file)
directly to the Local File System instead of injecting it into HDFS
(effectively into different nodes in HDFS), so that I can minimize the
overhead? I expect this to take much less time than copying to
HDFS and then performing a CopyToLocal. Finally, I should be able to
send this file back to another node using socket communication.

Looking forward to your suggestions!


Matthew John

Discussion Overview
group: common-user @
posted: Nov 15, '10 at 5:37a
active: Nov 17, '10 at 1:51p


