This is done on purpose to improve write performance. In practice, we run
map/reduce jobs on the cluster, so every node gets an equal chance of
writing. A single-node upload like the one described in your email is
normally carried out from an off-cluster node, where no datanode is local
to the writer, so the blocks get spread across the cluster. Imbalanced
data distribution should therefore not be a problem in practice.
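To make the effect of the writer's location concrete, here is a small
back-of-the-envelope simulation in plain Python (not Hadoop code; the node
names, replication factor, and block count are made up for illustration).
It models the policy described in the quoted question: when the writer runs
on a datanode, the first replica of every block lands there; when the
writer is off-cluster, all replicas are picked at random.

import random
from collections import Counter

NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
REPLICATION = 2
NUM_BLOCKS = 1000  # model an N-byte file as 1000 equal-size blocks

def place_block(writer=None):
    # First replica goes to the writer's own datanode if the writer is in
    # the cluster; otherwise every replica is a random pick.
    if writer in NODES:
        rest = random.sample([n for n in NODES if n != writer],
                             REPLICATION - 1)
        return [writer] + rest
    return random.sample(NODES, REPLICATION)

def blocks_per_node(writer=None):
    counts = Counter()
    for _ in range(NUM_BLOCKS):
        counts.update(place_block(writer))
    return dict(counts)

print("upload from node1: ", blocks_per_node("node1"))
# node1 holds all 1000 blocks; nodes 2-5 hold ~250 each (N vs N/4)

print("upload off-cluster:", blocks_per_node(None))
# every node holds ~400 blocks (~2*N/5); the distribution evens out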


-----Original Message-----
From: moonwatcher32329@yahoo.com
Sent: Tuesday, May 22, 2007 4:18 PM
To: hadoop-user@lucene.apache.org
Subject: question on HDFS block distribution

hi guys, when a file is being copied to HDFS, it seems that HDFS always
writes the first copy of each block to the datanode running on the machine
that invoked the copy, and the datanodes for the replicas are selected
evenly from the remaining datanodes. so, for example, on a 5-node cluster
with the replication factor set to 2, if i copy an N-byte file from node 1,
then node 1 will use up N bytes and nodes 2, 3, 4, and 5 will use up N/4
bytes each. is this a known issue, or is there any way to configure HDFS
so that the blocks are distributed evenly (with each node using up 2*N/5
bytes in this case)?
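For concreteness, the arithmetic behind those figures, as a quick sketch
(the 1 GB file size is just a made-up example):

N = 10**9                 # hypothetical 1 GB file
nodes, replication = 5, 2
total = replication * N   # bytes stored cluster-wide

# local-first placement, upload invoked on node 1:
print(N)                          # node 1 stores N bytes
print((total - N) / (nodes - 1))  # nodes 2-5 store N/4 bytes each

# perfectly even placement:
print(total / nodes)              # 2*N/5 bytes per node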
