On Thu, Jun 18, 2009 at 10:55 AM, Alex Loddengaard wrote:

I'm a little confused about what your question is. Are you asking why HDFS has
consistent read/write speeds even as your cluster accumulates more and more data?

If so, two HDFS bottlenecks that would change read/write performance as
capacity grows are name node (NN) RAM and the amount of data each of your
data nodes (DNs) is storing. If you have so much metadata (lots of
blocks, etc.) that the NN Java process uses most of your NN's memory, then
you'll see a big decrease in performance.
To avoid this issue, simply watch swap usage on your NN. If your NN starts
swapping, you will likely run into problems with metadata operation
speed. This won't affect read/write throughput within a block, though.
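As a quick sketch of that check (assuming a Linux NN host; the awk filter is just one way to pull out the numbers), you could watch swap from a shell or a cron job:

```shell
# Print current swap usage on the NameNode host (Linux; values in MB).
# If "used" creeps above zero and keeps growing, the NN heap is likely
# too large for the box, or the box too small for the metadata it holds.
free -m | awk '/^Swap:/ {print "swap total (MB): " $2 ", used (MB): " $3}'
```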

This bottleneck usually only
shows itself on large clusters with tons of metadata, though a small cluster
with a wimpy NN machine will hit the same bottleneck. Similarly, if each
of your DNs is storing close to its capacity, then reads/writes will
begin to slow down, as each node becomes responsible for streaming more and
more data in and out. Does that make sense?
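For what it's worth, a rough way to see how close the DNs are getting to capacity is the HDFS admin report (command name per the Hadoop CLI of that era; the exact output format varies by version):

```shell
# Summarize per-node usage from the HDFS admin report.
# Lines like "DFS Used%: 85.3%" flag nodes approaching capacity.
hadoop dfsadmin -report | grep 'DFS Used%'
```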

If you fill your cluster up to 80-90%, I imagine you'd probably see a
decrease in read/write performance, depending on the tests you're running,
though I can't say I've done this performance test myself; I'm merely speculating.
Another thing to keep in mind is that local filesystem performance begins to
suffer once a disk is more than 80% or so full. This is due to the ways that
filesystems endeavour to keep file fragmentation low. When there is little
extra space on the drive, the file system has fewer options for relocating
blocks and fighting fragmentation, so "sequential" writes and reads will
actually incur seeks on the local disk. Since the datanodes store their
blocks on the local file system, this is a factor worth considering.
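A minimal sketch of that check, assuming the datanode's blocks live under a hypothetical mount like /data (adjust the path and the 80% threshold to taste):

```shell
# Warn if the datanode's data disk is more than 80% full, since the local
# filesystem's fragmentation avoidance gets less effective past that point.
df -P /data | awk 'NR==2 { use=$5; sub(/%/, "", use);
  if (use+0 > 80) print $6 " is " use "% full -- expect extra seeks" }'
```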


Discussion Overview
group: common-user
posted: Jun 18, '09 at 4:30p
active: Jun 18, '09 at 6:54p


