They write directly to HDFS; there's no additional buffering on the
local file system of the client.
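
To make that concrete, here is a minimal sketch in plain Java of what "no
local buffering" looks like: bytes accumulate in a small in-memory buffer and
are pushed downstream as soon as the buffer fills, without ever touching a
local file. This is a toy model, not Hadoop's client code. The class names
InMemoryPacketStream and PacketStreamDemo, the 4-byte packet size, and the
ByteArrayOutputStream standing in for the connection to a datanode are all
assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Toy model (hypothetical, not Hadoop's DFSOutputStream): collects bytes in
 * an in-memory packet and forwards each full packet to a downstream sink.
 * No local file is created at any point.
 */
class InMemoryPacketStream extends OutputStream {
    private final OutputStream sink; // stands in for the link to a datanode
    private final byte[] packet;     // in-memory buffer of an assumed size
    private int used = 0;            // bytes currently held in the packet
    private int packetsSent = 0;     // packets forwarded to the sink so far

    InMemoryPacketStream(OutputStream sink, int packetSize) {
        if (packetSize <= 0) {
            throw new IllegalArgumentException("packetSize must be positive");
        }
        this.sink = sink;
        this.packet = new byte[packetSize];
    }

    @Override
    public void write(int b) throws IOException {
        packet[used++] = (byte) b;
        if (used == packet.length) {
            sendPacket(); // packet is full: forward it immediately
        }
    }

    @Override
    public void flush() throws IOException {
        if (used > 0) {
            sendPacket(); // forward the partially filled last packet
        }
        sink.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        sink.close();
    }

    private void sendPacket() throws IOException {
        sink.write(packet, 0, used);
        used = 0;
        packetsSent++;
    }

    int packetsSent() {
        return packetsSent;
    }
}

class PacketStreamDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream fakeDatanode = new ByteArrayOutputStream();
        InMemoryPacketStream out = new InMemoryPacketStream(fakeDatanode, 4);
        out.write(new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
        out.close();
        // prints "10 bytes in 3 packets"
        System.out.println(fakeDatanode.size() + " bytes in "
                + out.packetsSent() + " packets");
    }
}
```

In this sketch the only client-side state is the single in-memory packet, so
running many writers at once costs memory and network bandwidth rather than
local disk I/O.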

On Tue, May 31, 2011 at 7:56 PM, Mapred Learn wrote:
Hi guys,
I asked this question earlier but did not get any response. So, posting
again. Hope somebody can point to the right description:

When you do hadoop fs -copyFromLocal or use the API to call fs.write() (when
Filesystem fs is HDFS), does it write to the local filesystem first before
writing to HDFS?

I read that it writes to the local filesystem until the block size is
reached and then writes to HDFS.
Wouldn't the HDFS client choke on local filesystem writes if multiple such
fs -copyFromLocal commands are running? I thought that at least with
fs.write(), if you provide a byte array, it should not write to the local
filesystem?

In some places I read that the HDFS client and datanode communicate through
RPC/sockets. Do they also write to the local filesystem in this case, or is it
just a buffer in memory that they write directly to HDFS?
Could somebody point me to some doc/code where I could find out how fs
-copyFromLocal and fs.write() work? Do they write to the local filesystem
until the block size is reached and then write to HDFS, or write directly to

Thanks in advance,

Joseph Echeverria
Cloudera, Inc.

Discussion Overview
group: mapreduce-user @
posted: Jun 1, '11 at 12:06a
active: Jun 1, '11 at 12:06a
