FAQ

On Thu, Jun 18, 2009 at 10:55 AM, Alex Loddengaard wrote:

I'm a little confused about what your question is. Are you asking why HDFS has
consistent read/write speeds even as your cluster gets more and more data?

If so, two HDFS bottlenecks that would change read/write performance as used
capacity changes are name node (NN) RAM and the amount of data each of your
data nodes (DNs) is storing. If you have so much metadata (lots of files,
blocks, etc.) that the NN Java process uses most of your NN's memory, then
you'll see a big decrease in performance.

To avoid this issue, simply watch swap usage on your NN. If your NN starts
swapping, you will likely run into problems with the speed of your metadata
operations. This won't affect the throughput of reads/writes within a block,
though.
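As a rough illustration of watching swap on the NN, here is a minimal Python sketch that parses Linux's /proc/meminfo and reports how much swap is in use. It assumes a Linux host; the function names are hypothetical and nothing here is part of Hadoop itself.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kb(meminfo):
    """Swap in use is SwapTotal minus SwapFree (both in kB)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the given meminfo file and return swap in use, in kB."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))
```

Calling read_swap_used_kb() from a periodic job on the NN and alerting on a nonzero or growing value is one way to use it. Swap in use is only a hint; the si/so columns of vmstat show whether the machine is actively swapping.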

This bottleneck usually only shows itself on large clusters with tons of
metadata, though a small cluster with a wimpy NN machine will have the same
bottleneck. Similarly, if each of your DNs is storing close to its capacity,
then reads/writes will begin to slow down, as each node will be responsible
for streaming more and more data in and out. Does that make sense?

You should try filling your cluster up to 80-90% of capacity. I imagine you'd
probably see a decrease in read/write performance depending on the tests
you're running, though I can't say I've done this performance test before.
I'm merely speculating.

Another thing to keep in mind is that local filesystem performance begins to
suffer once a disk is more than 80% or so full. This is due to the ways that
filesystems endeavour to keep file fragmentation low. When there is little
extra space on the drive, the file system has fewer options for relocating
blocks and fighting fragmentation, so "sequential" writes and reads will
actually incur seeks on the local disk. Since the datanodes store their
blocks on the local file system, this is a factor worth considering.
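To make the 80% rule of thumb concrete, here is a small Python sketch that flags local filesystems that are fuller than a given threshold. It is only an illustration: the function names are hypothetical, the 0.80 default simply mirrors the figure above, and you would pass in whatever directories your datanodes actually store blocks in.

```python
import shutil


def fraction_used(total_bytes, used_bytes):
    """Return the used fraction of a filesystem as a float (0.85 = 85% used)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def disks_over_threshold(data_dirs, threshold=0.80):
    """Return (path, fraction) for each directory on a filesystem fuller than threshold."""
    flagged = []
    for path in data_dirs:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        frac = fraction_used(usage.total, usage.used)
        if frac > threshold:
            flagged.append((path, frac))
    return flagged
```

For example, disks_over_threshold(["/data/1", "/data/2"]) (made-up paths) would return the entries whose filesystems are more than 80% full. The number may differ slightly from what df reports, since some filesystems reserve blocks for root.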

-Todd

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 18, '09 at 4:30p
active: Jun 18, '09 at 6:54p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
