hi guys, when a file is being copied to HDFS, it seems that HDFS always writes the first copy of each block to the data node running on the machine that invoked the copy, and the data nodes for the remaining replicas are selected evenly from the other nodes. so, for example, on a 5-node cluster with the replication factor set to 2, if i copy an N-byte file from node 1, then node 1 will use up N bytes and nodes 2, 3, 4, 5 will use up N/4 bytes each.
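to be concrete, the kind of copy i mean is roughly the following sketch (the class name and paths are made up, and i'm using the Java FileSystem API here just for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up the hadoop config on the classpath
            FileSystem fs = FileSystem.get(conf);       // client running on one of the cluster nodes
            // the client streams each block to a pipeline of data nodes, and the
            // first replica lands on the data node local to this client machine
            fs.copyFromLocalFile(new Path("/tmp/localfile"), new Path("/user/me/remotefile"));
            fs.close();
        }
    }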
is this a known issue, or is there any way to configure HDFS so that the blocks are distributed evenly (with each node using up 2*N/5 bytes in this case)?
thanks,