This blog post on the YDN website,
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/,
has a detailed discussion of the different steps involved in Hadoop IO
operations and the opportunities for optimizing them. Could someone please
comment on the current state of these potential optimizations? Are some of
these expected to be addressed in the "next gen MR" release?

Thanks,
-Shrinivas


  • Todd Lipcon at Jul 19, 2011 at 8:31 pm
    Hi Shrinivas,

    There has been some work going on recently around optimizing checksums. See
    HDFS-2080 for example. This will help both the write and read code, though
    we've focused more on read.

    There have also been a lot of improvements to random read access, for
    example HDFS-941, which improves random read performance by more than 2x.

    I'm planning on writing a blog post in the next couple of weeks about some
    of this work.

    -Todd


    --
    Todd Lipcon
    Software Engineer, Cloudera
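For context on the checksum work mentioned above (HDFS-2080): HDFS keeps a CRC-based checksum for every fixed-size chunk of a block (512 bytes by default) and verifies those checksums when data is read back, so checksum cost shows up on both the write and read paths. The following is a minimal sketch of that chunk-and-verify idea using `java.util.zip.CRC32`. The class and method names are hypothetical, the 512-byte chunk size is an assumption matching the HDFS default, and this does not reproduce HDFS's own code or the specific changes made in HDFS-2080.

```java
import java.util.zip.CRC32;

/**
 * Hypothetical sketch: one CRC32 value per fixed-size chunk of data,
 * in the spirit of HDFS per-chunk checksums. Not a Hadoop API.
 */
public class ChunkedChecksum {

    /** Assumed chunk size, mirroring the HDFS default bytes-per-checksum. */
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 per chunk; the final chunk may be shorter. */
    static long[] compute(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum mismatches, or -1 if all verify. */
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = compute(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        long[] sums = compute(data, BYTES_PER_CHECKSUM);
        System.out.println("chunks: " + sums.length);      // 3 chunks: 512 + 512 + 276 bytes
        data[700] ^= 1;                                      // flip one bit inside the second chunk
        System.out.println("first bad chunk: "
                + firstCorruptChunk(data, sums, BYTES_PER_CHECKSUM)); // 1
    }
}
```

Because each chunk is checked independently, a reader only has to verify the chunks overlapping the range it actually reads, which is one reason chunk size and checksum speed matter for random reads as well as sequential ones.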

Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 19, '11 at 8:26p
active: Jul 19, '11 at 8:31p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Shrinivas Joshi: 1 post, Todd Lipcon: 1 post
