FAQ
Hi all,

Why do Hadoop jobs need setup and cleanup phases, which can consume a
lot of time? Why can't we achieve this the way a distributed RDBMS
does, where a master process coordinates all slave nodes through
sockets? I think that would save plenty of time if there were no setups
and cleanups. What is Hadoop's philosophy on this?

Thanks,
Min
--
My research interests are distributed systems, parallel computing, and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com


  • Jeff Zhang at Mar 11, 2010 at 3:28 am
    Hi Zhou,

    I looked at the source code; it seems it is the JobTracker that
    initiates the setup and cleanup tasks.
    And why do you think the setup and cleanup phases consume a lot of
    time? Actually, the time cost depends on the OutputCommitter.
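
    For context, the setup task runs OutputCommitter.setupJob() and the
    cleanup task runs cleanupJob(); with the default FileOutputCommitter,
    those hooks create and later remove the ${output}/_temporary directory.
    Here is a minimal sketch of a committer whose hooks do nothing, assuming
    the old org.apache.hadoop.mapred API of the 0.20 line; it is only safe
    when the output format needs no temporary-directory promotion:

        import java.io.IOException;
        import org.apache.hadoop.mapred.JobContext;
        import org.apache.hadoop.mapred.OutputCommitter;
        import org.apache.hadoop.mapred.TaskAttemptContext;

        // Every job- and task-level hook is a no-op, so the setup and
        // cleanup tasks have almost nothing to do.
        public class NoOpOutputCommitter extends OutputCommitter {
          @Override public void setupJob(JobContext ctx) throws IOException {}
          @Override public void cleanupJob(JobContext ctx) throws IOException {}
          @Override public void setupTask(TaskAttemptContext ctx) throws IOException {}
          @Override public boolean needsTaskCommit(TaskAttemptContext ctx) { return false; }
          @Override public void commitTask(TaskAttemptContext ctx) throws IOException {}
          @Override public void abortTask(TaskAttemptContext ctx) throws IOException {}
        }

    It would be plugged in with JobConf.setOutputCommitter(NoOpOutputCommitter.class).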



    --
    Best Regards

    Jeff Zhang
  • Guo Leitao at Mar 13, 2010 at 8:22 am
    From our test of hadoop-0.20.1 on 10 nodes, we find that the setup
    period gets longer as more jobs are submitted. I don't know why a map
    task is needed for setup; why doesn't the JobTracker, or a single
    thread, take over this work?
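
    For what it's worth, making these extra tasks optional is tracked in
    MAPREDUCE-463 ("The job setup and cleanup tasks should be optional"),
    which is not in 0.20.1. A sketch of what that looks like on a release
    carrying the change; the method and configuration-key names below
    follow that work, so treat them as assumptions for any given version:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class SkipSetupCleanup {
          public static void main(String[] args) throws IOException {
            Job job = new Job(new Configuration());
            // Skip the separate setup/cleanup tasks entirely; only safe
            // when the OutputCommitter's job-level hooks are unnecessary
            // for the output format in use.
            job.setJobSetupCleanupNeeded(false);
          }
        }

    The matching configuration key in later releases is
    mapreduce.job.committer.setup.cleanup.needed.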

