Hi all,
We are integrating our Hadoop jobs with Sun Grid Engine (SGE). Most of the map-reduce jobs that start on our cluster are sequential map and reduce jobs. I also found integration guidelines here: http://blogs.sun.com/templedf/entry/beta_testing_the_sun_grid and http://blogs.sun.com/ravee/entry/creating_hadoop_pe_under_sge .

I would like to know whether SGE would count every sequential map-reduce job as a separate job. That matters to us because, in total, the sequential map-reduce runs for days.

Thanks
H



  • Gianluigi Zanetti at Dec 11, 2009 at 7:59 am
    Hello Himanshu.
    Could you please describe your use case in more detail?

    There are two basic Gridengine integration schemes:

    1/ Native integration in grid engine
    This is the one referred to in Dan Templeton's blog. It is based on the
    assumption that hdfs is always running, and it will essentially have
    your map-reduce job (including a per-job jobtracker) scheduled as a
    parallel environment 'as-close-as-possible' to your hdfs data.
    It will be in GE 6.2u5, which is currently in beta and should be out any
    moment now. It is possible to back-port to 6.2u4 and probably to 6.2u3.

    2/ HOD integration
    What hod does, in a nutshell, is allocate a group of machines as a
    parallel environment within GE and run a jobtracker and a namenode
    that control the allocated machines. Users then submit their jobs to
    the jobtracker and use the hdfs controlled by the namenode.
    The resulting hadoop environment is transient since, as far as GE is
    concerned, it is simply a parallel job. How transient it is depends on
    how you set up your queues.
    We have developed a patch to add Grid Engine support to hadoop hod:
    http://issues.apache.org/jira/browse/HADOOP-6369
    This is pretty undemanding on the GE version, but it is not very
    efficient hdfs-wise, since Grid Engine is ignorant of hdfs data
    locality. In practice, either you ask hod to use an independent hdfs
    that is always up, in which case there is no guarantee that the
    tasktracker nodes will be close to the data, or you upload your data
    to a new hdfs that will be created by hod.

    Thus, 1/ is definitely more efficient and 'cluster-wide', while 2/ is
    more like a form of cluster partitioning.
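
To make the locality point concrete, here is a toy Python sketch. It is not
Grid Engine, hod, or Hadoop code: the node count, the block placement, and
all function names are made up for illustration. It only compares how many
hdfs blocks end up with a replica on the allocated nodes when the allocation
ignores data locality (roughly the situation in 2/ with an independent hdfs)
versus when it is chosen as close as possible to the data (the idea behind 1/).

```python
import random


def local_fraction(block_locations, allocated):
    """Fraction of blocks that have at least one replica on an allocated node."""
    allocated = set(allocated)
    hits = sum(1 for replicas in block_locations if allocated & set(replicas))
    return hits / len(block_locations)


def allocate_ignoring_locality(nodes, slots, rng):
    """Pick nodes without looking at where the data lives."""
    return rng.sample(nodes, slots)


def allocate_near_data(nodes, slots, block_locations):
    """Greedy choice: repeatedly take the node that holds replicas of the
    largest number of blocks not yet covered by the nodes chosen so far."""
    chosen = []
    candidates = list(nodes)
    uncovered = [set(replicas) for replicas in block_locations]
    for _ in range(slots):
        best = max(candidates, key=lambda n: sum(1 for r in uncovered if n in r))
        chosen.append(best)
        candidates.remove(best)
        uncovered = [r for r in uncovered if best not in r]
    return chosen


# Hypothetical cluster: 20 nodes, with the job's 12 blocks replicated
# only on nodes 0..3 (two replicas per block).
nodes = list(range(20))
blocks = [(i % 4, (i + 1) % 4) for i in range(12)]

rng = random.Random(2009)  # fixed seed so repeated runs give the same result
ignorant = allocate_ignoring_locality(nodes, 4, rng)
aware = allocate_near_data(nodes, 4, blocks)

print("locality-ignorant allocation:", sorted(ignorant),
      "local fraction:", local_fraction(blocks, ignorant))
print("locality-aware allocation:   ", sorted(aware),
      "local fraction:", local_fraction(blocks, aware))
```

In this contrived layout the locality-aware allocation lands exactly on the
nodes holding the data, while the locality-ignorant one only does so by
chance. A real scheduler has to balance this against load, queue limits, and
so on, so treat this as an illustration of the trade-off only.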



    --gianluigi






    In reply to himanshu chandola's message of Wed, 2009-12-09 at 12:43 -0800.

Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 9, '09 at 8:44p
active: Dec 11, '09 at 7:59a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
