hi guys, when a file is copied to HDFS, it seems that HDFS always writes the first copy of each block to the data node running on the machine that invoked the copy, and the data nodes for the remaining replicas are selected evenly from the other data nodes. so, for example, on a 5-node cluster with the replication factor set to 2, if i copy an N-byte file from node 1, then node 1 will use up N bytes and nodes 2, 3, 4 and 5 will use up N/4 bytes each.
is this a known issue, or is there any way to configure HDFS so that the blocks are distributed evenly (so that each node uses up 2*N/5 bytes in this case)?
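
to make the arithmetic above concrete, here is a minimal sketch that only models the behaviour as described in this post (first replica on the writing node, remaining replicas spread evenly over the other nodes). it is not HDFS code and does not call any Hadoop API; the function names `expected_usage` and `even_usage` are made up for illustration, and usage is expressed in units of N (the file size).

```python
from fractions import Fraction


def expected_usage(num_nodes, replication, writer=0):
    """Expected storage per node, in units of N, under the ASSUMED policy:
    one replica of every block on the writer, the other (replication - 1)
    replicas spread evenly over the remaining nodes."""
    if not 1 <= replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    usage = [Fraction(0)] * num_nodes
    usage[writer] = Fraction(1)  # the writer holds one full copy of the file
    others = num_nodes - 1
    if others:
        share = Fraction(replication - 1, others)
        for node in range(num_nodes):
            if node != writer:
                usage[node] = share
    return usage


def even_usage(num_nodes, replication):
    """Storage per node, in units of N, if all replicas were spread evenly."""
    return [Fraction(replication, num_nodes)] * num_nodes


if __name__ == "__main__":
    # 5 nodes, replication factor 2, file copied from the first node
    print([str(u) for u in expected_usage(5, 2)])  # ['1', '1/4', '1/4', '1/4', '1/4']
    print([str(u) for u in even_usage(5, 2)])      # ['2/5', '2/5', '2/5', '2/5', '2/5']
```

in both cases the total is 2*N (two replicas of the file); only the split between the writing node and the others differs.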

Discussion Overview
group: common-user @
posted: May 22, '07 at 11:18p
active: May 23, '07 at 12:58a

2 users in discussion

Hairong Kuang: 1 post Moonwatcher32329: 1 post
