FAQ
Hi,
My question is: when I run a command from the HDFS client, e.g. hadoop fs
-copyFromLocal, or create a sequence file writer in Java code and append
key/values to it through the Hadoop APIs, does it internally transfer/write
data to HDFS serially or in parallel?

Thanks in advance,
-JJ


  • Mapred Learn at May 18, 2011 at 4:27 am
    Thanks, Joey!
    I will try to find out about copyFromLocal. It looks like the Hadoop APIs write serially, as you pointed out.

    Thanks,
    -JJ
    On May 17, 2011, at 8:32 PM, Joey Echeverria wrote:

    The sequence file writer definitely does it serially, as you can only
    ever write to the end of a file in Hadoop.

    Doing copyFromLocal could write multiple files in parallel (I'm not
    sure if it does or not), but a single file would be written serially.

    -Joey


    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • Mapred Learn at May 27, 2011 at 5:07 am
    Hi guys,
    Another related question: when you run hadoop fs -copyFromLocal, or use
    the API to call fs.write(), does it write to the local filesystem first
    before writing to HDFS? I read that it writes to the local filesystem
    until the block size is reached and then writes to HDFS.
    Wouldn't the HDFS client choke writing to the local filesystem if
    multiple such fs -copyFromLocal commands are running? I thought that at
    least with fs.write(), if you provide a byte array, it should not write
    to the local filesystem.

    Could somebody explain how fs -copyFromLocal and fs.write() work? Do they
    write to the local filesystem until the block size is reached and then
    write to HDFS, or do they write directly to HDFS?

    Thanks in advance,
    -JJ
    On Wed, May 18, 2011 at 9:39 AM, Patrick Angeles wrote:

    Kinda clunky, but you could do this via the shell:

    for FILE in $LIST_OF_FILES ; do
      hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
    done
    wait  # block until all background copies have finished

    If doing this via the Java API, then yes, you will have to use multiple
    threads.

    On Wed, May 18, 2011 at 1:04 AM, Mapred Learn <mapred.learn@gmail.com>
    wrote:
    Thanks, Harsh!
    So basically both the APIs and the hadoop client commands allow only
    serial writes.
    I was wondering what other ways there are to write data to HDFS in
    parallel, other than using multiple threads.

    Thanks,
    JJ

    On May 17, 2011, at 10:59 PM, Harsh J wrote:

    Hello,

    Adding to Joey's response: given a list of files, copyFromLocal's current
    implementation copies them serially.

    --
    Harsh J
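
Joey's point that a single file is written serially can be sketched in Java. This is a minimal, hypothetical stand-in rather than Hadoop code: `AppendOnlyWriter` and `readKeys` are invented names, and the sketch writes length-prefixed strings with java.io.DataOutputStream to a local file. A real program would use org.apache.hadoop.io.SequenceFile.Writer and its append(key, value) method, whose on-disk format is different; the shared property is that each record is added at the end of one output stream, in call order.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SequenceFile.Writer (NOT the SequenceFile format):
// records can only be added at the end of a single output stream.
class AppendOnlyWriter implements AutoCloseable {
    private final DataOutputStream out;

    AppendOnlyWriter(Path file) throws IOException {
        // Creates or truncates the file; all writes go to the current end.
        this.out = new DataOutputStream(Files.newOutputStream(file));
    }

    // Writes one key/value record after all previously appended records.
    // There is no way to write a later record first or to seek backwards.
    void append(String key, String value) throws IOException {
        out.writeUTF(key);
        out.writeUTF(value);
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    // Reads the keys back; they come out in exactly the order they were appended.
    static List<String> readKeys(Path file) throws IOException {
        List<String> keys = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                String key;
                try {
                    key = in.readUTF();
                } catch (EOFException endOfFile) {
                    break; // no more records
                }
                in.readUTF(); // skip the value
                keys.add(key);
            }
        }
        return keys;
    }
}
```

Appending records "a", "b" and "c" and then calling readKeys returns them in that same order, because a single writer can only ever extend the end of its stream.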
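
Patrick's suggestion to use multiple threads from the Java API could look roughly like the sketch below, assuming one task per file on a fixed-size thread pool. `ParallelCopy` and `copyAll` are hypothetical names, and java.nio.file.Files.copy is used as a local stand-in so the example runs without a cluster; against HDFS, each task would instead call something like FileSystem.copyFromLocalFile(src, dst). As Joey noted, each individual file is still written serially; only different files proceed concurrently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: copies each source file on a worker thread.
// Files.copy stands in for an HDFS call such as
// FileSystem.copyFromLocalFile(src, dst); each file is still copied
// serially, only different files run in parallel.
class ParallelCopy {
    static List<Path> copyAll(List<Path> sources, Path destDir, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : sources) {
                // One task per file, like the "&" per command in the shell loop.
                pending.add(pool.submit(() -> {
                    Path dst = destDir.resolve(src.getFileName());
                    return Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                }));
            }
            // Block until every copy has finished, rethrowing the first failure.
            List<Path> copied = new ArrayList<>();
            for (Future<Path> f : pending) {
                copied.add(f.get());
            }
            return copied;
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count caps how many copies run at once, unlike the shell loop, which starts every command immediately.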

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: May 18, '11 at 12:44a
active: May 27, '11 at 5:07a
posts: 3
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Mapred Learn: 3 posts

