May 27, 2011 at 5:07 am
A related question: when you run hadoop fs -copyFromLocal, or call
fs.write() through the API, does the client write to the local filesystem
first before writing to HDFS? I read that it writes to the local filesystem
until the block size is reached and only then writes to HDFS.
Wouldn't the HDFS client choke on local-filesystem writes if several such
fs -copyFromLocal commands were running at once? I thought that at least
with fs.write(), if you pass a byte array, it should not write to the local
filesystem.
Could somebody explain how fs -copyFromLocal and fs.write() work? Do they
write to the local filesystem until the block size is reached and then write
to HDFS, or do they write directly to HDFS?
Thanks in advance,
On Wed, May 18, 2011 at 9:39 AM, Patrick Angeles wrote:
kinda clunky, but you could do this via the shell:
for FILE in $LIST_OF_FILES ; do
  hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
done
wait
If doing this via the Java API, then yes, you will have to use multiple
threads.
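
The background-job loop above starts one upload per file with no upper limit, which can overload the client when the list is long. A minimal sketch of the same idea with bounded parallelism follows. The names parallel_put and UPLOAD_CMD are hypothetical, introduced only for this example; the sketch also assumes an xargs that supports -0 and -P (GNU and BSD xargs do, but neither option is POSIX).

```shell
#!/usr/bin/env bash
# Sketch only: parallel_put and UPLOAD_CMD are hypothetical names for this
# example, not part of Hadoop.
# UPLOAD_CMD defaults to the command used in the thread; set it to something
# else (e.g. cp) to dry-run against a local directory.

# Usage: parallel_put MAX_JOBS DEST FILE...
parallel_put() {
  local jobs="$1" dest="$2"
  shift 2
  # Nothing to upload.
  [ "$#" -gt 0 ] || return 0
  # Run one upload process per file, at most $jobs at a time.
  # NUL-delimited names keep paths containing spaces intact.
  # UPLOAD_CMD is deliberately unquoted so it splits into command + arguments.
  printf '%s\0' "$@" |
    xargs -0 -P "$jobs" -I{} ${UPLOAD_CMD:-hadoop fs -copyFromLocal} {} "$dest"
}
```

For example, parallel_put 4 "$DEST_PATH" $LIST_OF_FILES would keep at most four uploads running at once. Each upload is still a separate hadoop process with its own JVM, and each individual file is still written serially; only different files go out in parallel. xargs returns a non-zero status if any upload fails, so the caller can check the result.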
On Wed, May 18, 2011 at 1:04 AM, Mapred Learn <email@example.com
Thanks Harsh!
That means basically both APIs as well as hadoop client commands allow only
I was wondering what could be other ways to write data in parallel to HDFS
other than using multiple parallel threads.
On May 17, 2011, at 10:59 PM, Harsh J wrote:
Adding to Joey's response, copyFromLocal's current implementation is serial
given a list of files.
On Wed, May 18, 2011 at 9:57 AM, Mapred Learn <firstname.lastname@example.org>
Thanks Joey!
I will try to find out about copyFromLocal. It looks like the Hadoop APIs
write serially, as you pointed out.
On May 17, 2011, at 8:32 PM, Joey Echeverria wrote:
The sequence file writer definitely does it serially as you can only
ever write to the end of a file in Hadoop.
Doing copyFromLocal could write multiple files in parallel (I'm not
sure if it does or not), but a single file would be written serially.
On Tue, May 17, 2011 at 5:44 PM, Mapred Learn <
My question is: when I run a command from the HDFS client, e.g. hadoop fs
-copyFromLocal, or create a sequence file writer in Java code and write
key/values to it through the Hadoop APIs, does it internally write
to HDFS serially or in parallel?
Thanks in advance,