Posted Jul 19, 2011 at 8:31 pm
There has been some work going on recently around optimizing checksums. See
HDFS-2080 for example. This will help both the write and read code, though
we've focused more on read.
There have also been a lot of improvements around random read access. For
example, HDFS-941 improves random read performance by more than 2x.
I'm planning on writing a blog post in the next couple of weeks about some
of this work.
On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi wrote:
This blog post on the YDN website,
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/,
has a detailed discussion of the different steps involved in Hadoop IO
and the opportunities for optimization. Could someone please comment on the
state of these potential optimizations? Are some of them expected to be
addressed in the "next gen MR" release?
Software Engineer, Cloudera