I'm a little confused about what your question is. Are you asking why HDFS has
consistent read/write speeds even as your cluster gets more and more data?

If so, two HDFS bottlenecks that can change read/write performance as used
capacity grows are NameNode (NN) RAM and the amount of data each of your
DataNodes (DNs) is storing. If you have so much metadata (lots of files,
blocks, etc.) that the NN's Java process uses most of that machine's memory,
you'll see a big drop in performance. This bottleneck usually only shows
itself on large clusters with tons of metadata, though a small cluster with
a wimpy NN machine will hit the same limit. Similarly, if each of your DNs
is storing close to its capacity, reads and writes will begin to slow down,
since each node becomes responsible for streaming more and more data in and
out. Does that make sense?
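To make those two bottlenecks concrete, here's a back-of-the-envelope sketch. The ~150 bytes per namespace object (file, directory, or block) figure is a commonly cited rule of thumb for NN heap sizing, not something measured on this cluster, and the even-block-placement assumption for DNs is an idealization:

```python
# Rough sizing heuristics for the two HDFS bottlenecks discussed above.

# Commonly cited rule of thumb: each namespace object (file, directory,
# block) costs roughly 150 bytes of NameNode heap. Treat as approximate.
BYTES_PER_NAMESPACE_OBJECT = 150

def estimated_nn_heap_bytes(num_files, num_dirs, num_blocks):
    """Rough lower bound on NameNode heap needed to hold the namespace."""
    return (num_files + num_dirs + num_blocks) * BYTES_PER_NAMESPACE_OBJECT

def data_per_datanode_bytes(total_data_bytes, replication, num_datanodes):
    """Raw bytes each DataNode stores, assuming blocks spread evenly."""
    return total_data_bytes * replication / num_datanodes

# Hypothetical numbers: 10M files, 1M directories, 15M blocks.
heap = estimated_nn_heap_bytes(10_000_000, 1_000_000, 15_000_000)
print(f"NN heap estimate: {heap / 2**30:.1f} GiB")

# A 4-node cluster (like the one in the question) holding 1 TiB of
# logical data at the default replication factor of 3.
per_dn = data_per_datanode_bytes(1 * 2**40, 3, 4)
print(f"Per-DN raw storage: {per_dn / 2**40:.2f} TiB")
```

The point of the second function is the DN-capacity bottleneck: because replicas multiply the raw footprint, each node fills up three times faster than the logical data size suggests, and nodes near capacity stream more data per read/write.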

You could try filling your cluster up to 80-90%. I imagine you'd see a
decrease in read/write performance depending on the tests you're running,
though I can't say I've done this performance test before. I'm merely
speculating.

Hope this clears things up.

On Thu, Jun 18, 2009 at 9:30 AM, Wasim Bari wrote:

I am storing data on an HDFS cluster (4 machines). I have seen that
read/write performance is not much affected by the total size of the data
on HDFS. I have used 20-30% of the cluster's capacity and haven't
completely filled it. Can someone explain why this is so? Does HDFS
promise this behavior, or am I missing something?



Discussion Overview
group: common-user
posted: Jun 18, '09 at 4:30p
active: Jun 18, '09 at 6:54p