Hi,

I have a high rate of data coming in that I'm constantly writing to my
HDFS (say 2MB/s). I tested even higher rates (70 MB/s) and was
surprised that it was able to perform so well (via 8 threads in a
multi-core machine).

However, I just found this note in the HDFS docs: "In fact, initially
the HDFS client caches the file data into a temporary local file".

Does that mean the rates I'm seeing are not the rates at which files
are copied into HDFS, but rather the rate at which my HDFS client is
just copying to /tmp?

And if so, if I had a much higher rate (e.g. 70 MB/s), wouldn't I see
potential issues in the HDFS client trying to keep up with the local
copy?

Thanks,
Will

  • Ted Dunning at Jul 7, 2009 at 6:10 pm
    Is the client that does the writing part of the Hadoop cluster? If so, it
    is definitely writing locally, but it is also writing to the cluster.

    The simplest test is to just write from a machine that is not part of the
    cluster.
    On Tue, Jul 7, 2009 at 8:49 AM, william kinney wrote:


    However, I just found this note in the HDFS docs: "In fact, initially
    the HDFS client caches the file data into a temporary local file".
  • Raghu Angadi at Jul 7, 2009 at 7:50 pm

    william kinney wrote:

    However, I just found this note in the HDFS docs: "In fact, initially
    the HDFS client caches the file data into a temporary local file".

    This doc is outdated. The HDFS client does not write to a temporary
    local file.

    As Ted mentioned, if your client is also a datanode (I suspect it is
    not), then HDFS currently writes one replica to the local datanode.

    Raghu.
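
One way to try the test Ted suggests is to time a sustained write from a machine outside the cluster and compare it with the rate seen on a cluster node. The sketch below is only an illustration: `measure_write_rate` is a hypothetical helper, and it times writes to any file-like object. The `io.BytesIO` stand-in would need to be replaced with a stream opened on HDFS through whatever client library is in use, which is not shown here. Note that a returned `flush()` does not by itself prove the data has reached the datanodes.

```python
import io
import time


def measure_write_rate(stream, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to stream and return the rate in MB/s.

    Hypothetical helper: `stream` is any object with write() and flush().
    """
    chunk = bytes(chunk_size)  # reusable buffer of zero bytes
    written = 0
    start = time.perf_counter()
    while written < total_bytes:
        # The last chunk may be shorter than chunk_size.
        n = min(chunk_size, total_bytes - written)
        stream.write(chunk[:n])
        written += n
    stream.flush()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small writes.
    return written / (1024 * 1024) / max(elapsed, 1e-9)


if __name__ == "__main__":
    # In-memory stand-in (assumption); swap in an HDFS-backed stream to test.
    rate = measure_write_rate(io.BytesIO(), 64 * 1024 * 1024)
    print(f"{rate:.1f} MB/s")
```

If the rate measured from a non-cluster machine is close to the rate seen on a datanode, local writes are unlikely to explain the numbers.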

Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 7, '09 at 4:05p
active: Jul 7, '09 at 7:50p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
