FAQ
I have a question about HDFS. Suppose my HDFS cluster has 3 machines and the replication factor is 1. A large file sits in the local file system of one of those three machines. If I put that file into HDFS, will it be divided and distributed across all three machines? I ask because of the HDFS principle that "moving computation is cheaper than moving data".

If the file is distributed across all three machines, there will be a lot of data transfer; whereas if the file is NOT distributed, the compute power of the other machines will go unused. Am I missing something here?

-Raj


  • Harish Mallipeddi at Jun 18, 2009 at 10:29 am

    Irrespective of what you set as the replication factor, large files are
    always split into chunks (the chunk size is your configured HDFS
    block size) and the chunks are distributed across your entire cluster.
    With replication factor 1, each chunk is simply stored on one datanode
    rather than copied to several.
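
    To make the arithmetic concrete, here is a small sketch (plain Python, not
    part of HDFS; the 200 MB file size is assumed for illustration, and 64 MB
    was the common default block size in this era):

    ```python
    import math

    def hdfs_block_count(file_size_bytes, block_size_bytes=64 * 1024 * 1024):
        """Number of HDFS blocks a file of the given size occupies."""
        return math.ceil(file_size_bytes / block_size_bytes)

    # A 200 MB file with a 64 MB block size splits into ceil(200/64) = 4 blocks.
    # With replication factor 1 each block lives on exactly one datanode, so on
    # a 3-node cluster the blocks can spread across all three machines.
    print(hdfs_block_count(200 * 1024 * 1024))  # -> 4
    ```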


    --
    Harish Mallipeddi
    http://blog.poundbang.in
  • Roshan James at Jun 18, 2009 at 10:24 pm
    Further, look at the namenode file system browser for your cluster to see
    the chunking in action.

    http://wiki.apache.org/hadoop/WebApp%20URLs
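
    The same thing can be seen from the command line (a sketch, assuming the
    Hadoop 0.20-era CLI is on your path and a running cluster; the local path
    and the HDFS path `/user/raj/bigfile` are hypothetical):

    ```shell
    # Copy a local file into HDFS...
    hadoop fs -put /local/path/bigfile /user/raj/bigfile

    # ...then list each block of the file and the datanode(s) holding it.
    hadoop fsck /user/raj/bigfile -files -blocks -locations
    ```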

    Roshan

Discussion Overview
Group: common-user @ hadoop
Posted: Jun 18, '09 at 10:13a
Active: Jun 18, '09 at 10:24p
Posts: 3
Users: 3
Website: hadoop.apache.org...
IRC: #hadoop