On Thu, 15 Oct 2009 11:32:35 +0200 Usman Waheed wrote:
Hi Todd,
Some changes have been applied to the cluster based on the
documentation (URL) you noted below.
I would also like to know what settings people are tuning on the
operating system level. The blog post mentioned here does not mention
much about that, except for the fileno changes.
We got about 3x the read performance when running TestDFSIO by mounting
our ext3 filesystems with the noatime parameter. I saw that mentioned
in the slides from some Cloudera presentation.
(For those who don't know, the noatime parameter turns off the
recording of access time on files. Recording atime is a horrible
performance killer, since every read of a file also forces the kernel
to do a write. These writes are probably queued up, but still, if you
don't need the atime (very few applications do), turn it off!)
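A quick way to see which disks on a node are still recording atime is to scan the mount table. Below is a minimal Python sketch, assuming the text is in Linux /proc/mounts format; the function name and the sample mount table are made up for illustration.

```python
def mounts_missing_noatime(mounts_text, fstypes=("ext3",)):
    """Return mount points of the given filesystem types that lack noatime.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append(mountpoint)
    return missing


# Hypothetical sample; on a real node, read the text from /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext3 rw,relatime,errors=continue,data=ordered 0 0
/dev/sdb1 /data/1 ext3 rw,noatime,data=ordered 0 0
/dev/sdc1 /data/2 ext3 rw,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    print(mounts_missing_noatime(SAMPLE))
```

On a live machine the same function can be fed open("/proc/mounts").read() instead of the sample.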
Have people been experimenting with different filesystems, or are most
of us running on top of ext3?
How about mounting ext3 with "data=writeback"? That's rumoured to give
the best throughput and could help with write performance. From
mount(8):
    writeback
        Data ordering is not preserved - data may be written into the
        main file system after its metadata has been committed to the
        journal. This is rumoured to be the highest throughput option.
        It guarantees internal file system integrity, however it can
        allow old data to appear in files after a crash and journal
        recovery.
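Before experimenting with data=writeback it helps to know which journaling mode each disk is using now. Here is a rough Python sketch, assuming the mode shows up as a data= option in /proc/mounts-style text. If the option is absent the function reports None rather than guessing a default. The function name and sample are hypothetical.

```python
def ext3_data_modes(mounts_text):
    """Map each ext3 mount point to its data= mode, or None if not listed."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext3":
            continue  # skip malformed lines and non-ext3 filesystems
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[fields[1]] = mode
    return modes


# Hypothetical sample; on a real node, read the text from /proc/mounts.
SAMPLE = """\
/dev/sdb1 /data/1 ext3 rw,noatime,data=writeback 0 0
/dev/sdc1 /data/2 ext3 rw,noatime,data=ordered 0 0
/dev/sdd1 /data/3 ext3 rw,noatime 0 0
tmpfs /tmp tmpfs rw 0 0
"""

if __name__ == "__main__":
    for mountpoint, mode in ext3_data_modes(SAMPLE).items():
        print(mountpoint, mode)
```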
How would the HDFS consistency checks cope with old data appearing in
the underlying files after a system crash?
Cheers,
\EF