Has anybody tried to profile why the HDFS write path is so CPU-intensive?


  • Todd Lipcon at Nov 26, 2012 at 1:36 am
    Hi Radim,

    Currently it's CPU-intensive for several reasons:
    1) It doesn't yet use the native CRC code
    2) It makes several unnecessary copies and byte buffer allocations, both in
    the client and in the DataNode

    There are open JIRAs for these, and I have a preliminary patch which helped
    a lot, but it hasn't been high priority. On most clusters, writing becomes
    network-bound before it becomes CPU-bound. On the other hand, as 10GbE is
    becoming fairly common, this will probably be more important soon. I am
    hoping to find time to get back to finishing the patches in the next few
    months.
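
    As a rough illustration of the checksum cost, here is a minimal,
    hypothetical sketch of the kind of per-chunk checksumming the write path
    performs. It uses the JDK's java.util.zip.CRC32 purely as a stand-in for
    HDFS's own checksum code; the 512-byte chunk size mirrors the
    dfs.bytes-per-checksum default, and the class and variable names are
    invented:

    import java.util.zip.CRC32;

    public class ChunkChecksumSketch {
        // Mirrors the dfs.bytes-per-checksum default of 512 bytes.
        private static final int BYTES_PER_CHECKSUM = 512;

        public static void main(String[] args) {
            byte[] data = new byte[64 * 1024 * 1024]; // stand-in for one block's worth of data
            long[] sums = new long[data.length / BYTES_PER_CHECKSUM];
            CRC32 crc = new CRC32();
            long start = System.nanoTime();
            for (int i = 0; i < sums.length; i++) {
                crc.reset();
                crc.update(data, i * BYTES_PER_CHECKSUM, BYTES_PER_CHECKSUM);
                sums[i] = crc.getValue(); // each checksum travels with its chunk in the packet
            }
            System.out.printf("checksummed %d MB in %.1f ms%n",
                data.length >> 20, (System.nanoTime() - start) / 1e6);
        }
    }

    A loop like this runs over every block written, which is why a
    non-native checksum implementation shows up near the top of a CPU
    profile of the write path.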

    -Todd
    On Sun, Nov 25, 2012 at 1:41 PM, Radim Kolar wrote:

    Has anybody tried to profile why the HDFS write path is so CPU-intensive?


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Radim Kolar at Nov 26, 2012 at 3:08 am

    Currently it's CPU-intensive for several reasons:
    1) It doesn't yet use the native CRC code
    2) It makes several unnecessary copies and byte buffer allocations, both in
    the client and in the DataNode

    There are open JIRAs for these, and I have a preliminary patch which helped
    a lot, but it hasn't been high priority.
    Can you attach the CRC patch there?
    https://issues.apache.org/jira/browse/HDFS-3528
    I will finish it.
  • Radim Kolar at Nov 29, 2012 at 3:17 pm
    I am hoping to find time to get back to finishing the patches in the
    next few months.
    Todd,
    just attach these patches to the JIRA; they do not even need to apply
    cleanly to trunk. I will get them finished within a day. I do not have
    months to spare waiting for the work to be done by you. If you do not
    want to share these patches, that is still fine with me; we can do this
    work on our own as well. I just need the word from you.
  • Todd Lipcon at Nov 29, 2012 at 6:26 pm
    Hi Radim,

    My work-in-progress branch is online here:
    https://github.com/toddlipcon/hadoop-common/commits/trunk-write-pipeline-fast

    It is definitely buggy, it might not actually be faster, and it
    probably isn't well commented. But feel free to have a go at it.

    -Todd
    On Thu, Nov 29, 2012 at 7:17 AM, Radim Kolar wrote:

    I am hoping to find time to get back to finishing the patches in the
    next few months.
    Todd,
    just attach these patches to the JIRA; they do not even need to apply
    cleanly to trunk. I will get them finished within a day. I do not have
    months to spare waiting for the work to be done by you. If you do not
    want to share these patches, that is still fine with me; we can do this
    work on our own as well. I just need the word from you.


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Radim Kolar at Dec 4, 2012 at 5:07 pm

    It is definitely buggy, it might not actually be faster, and it
    probably isn't well commented. But feel free to have a go at it.
    Thank you for your code; I got it merged with trunk. HDFS is crap code:
    private methods are not documented at all, and the unit tests are a joke.
    I made some random code changes and some of them were not detected by the
    unit tests. What methods are you using for testing?
  • Todd Lipcon at Dec 4, 2012 at 5:28 pm

    On Tue, Dec 4, 2012 at 9:07 AM, Radim Kolar wrote:
    It is definitely buggy, it might not actually be faster, and it
    probably isn't well commented. But feel free to have a go at it.
    Thank you for your code; I got it merged with trunk. HDFS is crap code:
    private methods are not documented at all, and the unit tests are a joke.
    I made some random code changes and some of them were not detected by the
    unit tests. What methods are you using for testing?
    If you're just going to insult us, please stay away. We don't need
    your help unless you're going to be constructive.

    Todd
    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Radim Kolar at Dec 4, 2012 at 5:40 pm

    If you're just going to insult us, please stay away. We don't need
    your help unless you're going to be constructive.
    Good unit tests would catch code modifications like this one:

    from:

    long getLastByteOffsetBlock() {
        return lastByteOffsetInBlock;
    }

    to:

    long getLastByteOffsetBlock() {
        return lastByteOffsetInBlock - 1;
    }

    I made about 10 such changes, and roughly 60% of them went undetected.
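
    As a sketch of the kind of test that would catch that off-by-one
    mutation, here is a minimal JUnit example. The Packet stand-in class,
    its constructor, and the offsets are hypothetical; a real test would
    exercise the actual class that owns getLastByteOffsetBlock():

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class TestLastByteOffset {
        // Hypothetical stand-in for the class that owns the accessor.
        static class Packet {
            private final long lastByteOffsetInBlock;
            Packet(long offsetInBlock, int dataLen) {
                this.lastByteOffsetInBlock = offsetInBlock + dataLen;
            }
            long getLastByteOffsetBlock() {
                return lastByteOffsetInBlock;
            }
        }

        @Test
        public void testLastByteOffsetIsExact() {
            // A 100-byte packet starting at offset 512 must end at 612;
            // the mutated version (returning the value minus 1) fails here.
            Packet p = new Packet(512, 100);
            assertEquals(612, p.getLastByteOffsetBlock());
        }
    }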
  • Eli Collins at Dec 4, 2012 at 5:44 pm

    On Tue, Dec 4, 2012 at 9:39 AM, Radim Kolar wrote:
    If you're just going to insult us, please stay away. We don't need
    your help unless you're going to be constructive.
    Good unit tests would catch code modifications like this one:
    Agreed. Want to write some? I would love to see patches like this.
    Here's a recent example: HDFS-4156 (Seeking to a negative position
    should throw an IOE)
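
    As a hedged sketch of what such a test might look like (the file setup
    here is illustrative, and a real HDFS-4156 test would target
    DFSInputStream on a MiniDFSCluster rather than the local file system):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;
    import static org.junit.Assert.fail;

    public class TestNegativeSeek {
        @Test
        public void testNegativeSeekThrows() throws IOException {
            FileSystem fs = FileSystem.getLocal(new Configuration());
            Path p = new Path("negative-seek-test");
            fs.create(p).close(); // create an empty test file
            FSDataInputStream in = fs.open(p);
            try {
                in.seek(-1);
                fail("seek(-1) should have thrown an IOException");
            } catch (IOException expected) {
                // expected: negative positions are invalid
            } finally {
                in.close();
                fs.delete(p, false);
            }
        }
    }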

    Thanks,
    Eli
  • Radim Kolar at Dec 5, 2012 at 2:00 am
    Agreed. Want to write some?
    It's not about writing patches; it's about getting them committed. My
    experience is that getting something committed takes months, even for a
    simple patch. I have about 10 patches floating around, and none of them
    has been committed in the last 4 weeks. They are really simple stuff. I
    haven't tried to go with a more elaborate patch because, as the Bible
    says, if you fail at an easy thing, you will fail at a hard thing too.

    I am thinking, day by day, that I really need to fork Hadoop; otherwise
    there is no way to move it forward to where I need it to be.
  • Steve Loughran at Dec 5, 2012 at 8:58 am

    On 5 December 2012 02:00, Radim Kolar wrote:
    Agreed. Want to write some?
    It's not about writing patches; it's about getting them committed. My
    experience is that getting something committed takes months, even for a
    simple patch. I have about 10 patches floating around, and none of them
    has been committed in the last 4 weeks. They are really simple stuff. I
    haven't tried to go with a more elaborate patch because, as the Bible
    says, if you fail at an easy thing, you will fail at a hard thing too.
    There is inertia; nobody is happy with it, but that's the price of
    having something that's designed to keep petabytes of data safe.


    I am thinking, day by day, that I really need to fork Hadoop; otherwise
    there is no way to move it forward to where I need it to be.
    A lot of the early Hadoop projects chose this path. Once you get out of
    sync with the Apache code, you have two problems:
    - keeping your branch up to date with all the fixes and features you want
    - testing
  • Andy Isaacson at Dec 5, 2012 at 10:22 pm

    On Tue, Dec 4, 2012 at 6:00 PM, Radim Kolar wrote:
    It's not about writing patches; it's about getting them committed. My
    experience is that getting something committed takes months, even for a
    simple patch. I have about 10 patches floating around, and none of them
    has been committed in the last 4 weeks.
    Could you share a list of the JIRAs you're concerned about? I've seen
    a few patches you provided that got committed just fine; I've seen a
    few patches that I thought didn't have a strong justification and that
    didn't get committed; and I think I've seen a few JIRAs that I thought
    were a good idea but haven't been committed yet due to outstanding
    review feedback or the lack of a committer who can volunteer to do the
    work.

    I'm not saying that the Hadoop process is perfect, far from it, but
    from where I sit (like you, I'm a contributor but not yet a committer)
    it seems to be working OK so far for both you and me. Some things could
    be better, but the current fairly conservative process has the benefit
    of keeping trunk in a really sane, safe state.
    They are really simple stuff. I haven't tried to go with a more
    elaborate patch because, as the Bible says, if you fail at an easy
    thing, you will fail at a hard thing too.

    I am thinking, day by day, that I really need to fork Hadoop; otherwise
    there is no way to move it forward to where I need it to be.
    Forking is tempting, but working with the community is really
    powerful. You've got plenty of successful JIRAs under your belt;
    let's just keep on truckin' and build a better Hadoop.

    -andy
  • Radim Kolar at Dec 6, 2012 at 2:03 am
  • Andy Isaacson at Dec 6, 2012 at 11:07 pm

    I don't really know the YARN or MAPREDUCE code bases, so I'm going to
    pass on those...
    Todd asked a pretty reasonable question that I don't see an answer to
    -- where will murmur3 actually be used? We generally don't add code,
    even if it's good code that we're sure to need someday, until there's
    an actual user for it.
    There needs to be a complete, up-to-date patch uploaded. This one
    seems to have two patches that need to be applied to get a working
    commit: HADOOP-9041.patch and fsinit-unit.txt. Also, the latter has a
    misspelled class name: Initialization is spelled with a "t" rather
    than a "c".

    It would be really good to develop a JUnit test that fails reliably,
    both under mvn and under Eclipse, and that shows the problem, to avoid
    regressions in the future, even if the unit test has to do moderately
    unclean things to force the failure. (But that's not a hard
    requirement; if it's really impossible to do, the current situation is
    OK.)
    I don't understand this patch at all. Since it makes the constructor
    vacuous, why not just delete the constructor entirely? If avoiding the
    possible "could be null" makes other code simpler, go ahead and
    include the simplification in this patch. (See below for more on
    including stuff in a single JIRA.)

    Generally if Jenkins posts a -1 on a patch, you should follow up with
    a comment explaining why it's OK for this patch to fail the given
    test. For example, I had a change recently that fixed an intermittent
    test failure, so I didn't need to add a test. Jenkins said "-1 no
    tests included" and I commented "fixes TestFoo intermittent failures".

    One of the ways the community has compensated for the heavyweight JIRA
    process is to allow a single JIRA to include more change than I would
    normally put into a git commit. I do my development locally in a
    per-JIRA branch ("hdfs1337") with normal small git-style commits, and
    then, when I'm ready to post a patch, I run "git diff
    upstream/trunk..hdfs1337 > hdfs1337.txt" to squash all the sane git
    commits into a single large diff to upload.

    Thanks,
    -andy
  • Radim Kolar at Dec 8, 2012 at 4:40 am

    I'm not saying that the Hadoop process is perfect, far from it, but
    from where I sit (like you, I'm a contributor but not yet a committer)
    it seems to be working OK so far for both you and me.
    It does not work OK for me. It's way too slow. I got just 2k LOC
    committed, with patches still floating around. That is the real and
    sad result of half a year of cooperation. I know that contributor
    patches are low priority in every project, but this is too low a
    priority for me.
    Some things could be better, but the current fairly conservative
    process has the benefit of keeping trunk in a really sane, safe state.
    If you want to keep code in a safe state, you need:
    1. good unit tests
    2. high unit test coverage
    3. clean code
    4. documented code
    5. good Javadoc
    You've got plenty of successful JIRAs under your belt; let's just keep on truckin' and build a better Hadoop.
    The only successful work was the rework of Todd's patch, because it
    made HBase about 30% faster.
  • Steve Loughran at Dec 8, 2012 at 12:39 pm

    On 8 December 2012 04:39, Radim Kolar wrote:
    If you want to keep code in a safe state, you need:
    1. good unit tests
    2. high unit test coverage
    3. clean code
    4. documented code
    5. good Javadoc

    + good functional tests, which explore the deployed state of the
    world, especially different networks. Once you get into HA, you also
    need the ability to trigger server failures and network partitions as
    part of a test run.
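
    As a sketch of what such failure injection can look like, here is a
    minimal example built on MiniDFSCluster, the in-process test cluster
    that ships with the HDFS test code. The file name and the write/read
    steps are illustrative, and network partitions would need additional
    tooling beyond what is shown here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    public class DataNodeFailureSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            MiniDFSCluster cluster =
                new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
            try {
                cluster.waitActive();
                FileSystem fs = cluster.getFileSystem();
                Path p = new Path("/failure-test");
                fs.create(p).close(); // write a trivial file, replication 3
                // Inject a failure: kill one datanode, then verify the file
                // is still readable from the surviving replicas.
                cluster.stopDataNode(0);
                fs.open(p).close();
            } finally {
                cluster.shutdown();
            }
        }
    }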
  • Suresh Srinivas at Dec 4, 2012 at 5:49 pm
    Thank you, Todd! I have been seeing a similar attitude in many JIRAs,
    which I have tried hard to ignore, and I was wondering how to respond
    to this email.

    I could not have said it better.
    On Tue, Dec 4, 2012 at 9:27 AM, Todd Lipcon wrote:

    If you're just going to insult us, please stay away. We don't need
    your help unless you're going to be constructive.


    --
    http://hortonworks.com/download/
