question on HDFS block distribution
Hi guys, when a file is being copied to HDFS, it seems that HDFS always writes the first copy of each block to the data node running on the machine that invoked the copy, while the data nodes for the remaining replicas are selected evenly from the rest of the cluster. So, for example, on a 5-node cluster with the replication factor set to 2, if I copy an N-byte file from node 1, then node 1 will use up N bytes and nodes 2, 3, 4, and 5 will use up N/4 bytes each.
Is this a known issue, or is there any way to configure HDFS so that the blocks are distributed evenly (with each node using up 2*N/5 bytes in this case)?
thanks,
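For anyone who wants to check the arithmetic in the question, here is a quick back-of-the-envelope sketch in Python. It assumes exactly the placement behavior described above (the first replica of every block on the uploading node, the remaining replicas spread evenly over the other nodes); the file size is made up for illustration.

    # Per-node usage under the placement described in the question: the first
    # replica of every block stays on the uploading node, and the remaining
    # replicas are spread evenly over the other nodes.
    N = 1_000_000_000       # hypothetical file size in bytes (made up)
    nodes = 5               # cluster size from the example
    replication = 2         # replication factor from the example

    uploading_node = N                                      # a full copy stays local
    each_other_node = (replication - 1) * N / (nodes - 1)   # N/4 with these numbers
    even_share = replication * N / nodes                    # 2*N/5 if spread evenly

    print(uploading_node, each_other_node, even_share)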





  • Hairong Kuang at May 23, 2007 at 12:58 am
    This is done on purpose to improve write performance. In practice, we
    run map/reduce jobs on the cluster, so every node in the cluster gets an
    equal chance of writing. A single-node data upload like the one described
    in your email is normally carried out from an off-cluster node, so an
    imbalanced data distribution should not be a problem.

    Hairong
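To illustrate the point above (a sketch only, not the actual HDFS placement code): if the uploading client is not itself a DataNode, there is no local node to favor, so replicas land across the whole cluster and each node ends up with roughly replication * N / nodes bytes. The simulation below models that as a uniform random choice of distinct nodes per block and ignores rack awareness and other details of the real block placement policy.

    # Rough simulation of an upload from an off-cluster client: each block's
    # replicas go to distinct nodes chosen at random from the whole cluster.
    # Simplified model only; real HDFS placement is rack-aware.
    import random

    nodes, replication = 5, 2
    block_size = 64 * 1024 * 1024      # 64 MB, a common HDFS block size
    num_blocks = 1000                  # pretend the file is split into 1000 blocks

    usage = [0] * nodes
    for _ in range(num_blocks):
        for node in random.sample(range(nodes), replication):
            usage[node] += block_size

    file_size = num_blocks * block_size              # N
    print([round(u / file_size, 2) for u in usage])  # each node close to 2/5 = 0.4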

