Hi all,

I have a set of MapReduce jobs that need to be cascaded, i.e., the output of MR
job1 is the input of MR job2, and so on.

Can anyone point me to the corresponding classes in the Hadoop 0.20.0 API?

I have seen the "x.addDependingJob(y)" function in the Yahoo! Hadoop tutorial,
but that is for older versions. What is the equivalent in the 0.20.0 API?

Any help is appreciated,

Thanks
bharath.v
ug3
IIIT Hyderabad!


  • Tom White at Oct 2, 2009 at 11:09 am
    Have a look at the JobControl class - this allows you to set up chains
    of job dependencies.

    Tom

  • Chris K Wensel at Oct 2, 2009 at 3:39 pm
    You might find the Cascading project quite useful in this regard.

    http://www.cascading.org/

    Using the MapReduceFlow and CascadeConnector classes, you can chain
    arbitrary MR jobs together. Cascading will determine the dependencies,
    if any, and run the jobs in topological order (independent jobs will
    be submitted to run in parallel).

    You may also find writing your own MR jobs by hand tedious and
    brittle. Cascading can help you there as well.

    cheers,
    chris
    --
    Chris K Wensel
    chris@concurrentinc.com
    http://www.concurrentinc.com
  • Bharath vissapragada at Oct 3, 2009 at 5:29 pm
    Tom and Chris,

    Thanks for your replies. I have looked at o.a.h.mapred.jobcontrol.Job
    and o.a.h.mapreduce.Job, and only one of them has the addDependingJob()
    option for adding dependent jobs. Can anyone tell me the difference
    between the "mapred" and "mapreduce" packages?

    Thanks in advance
  • Kevin Weil at Oct 17, 2009 at 8:16 pm
    Bharath,
    The mapred package is largely deprecated, as Hadoop is moving toward the
    mapreduce package. Use mapreduce for any new jobs you write, because mapred
    will go away in some future release. For now, both are there to give
    developers time to rewrite their existing jobs.

    Kevin

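Tom's JobControl suggestion answers the original question: in the 0.20 API, JobControl and the Job class that has addDependingJob() live in org.apache.hadoop.mapred.jobcontrol and wrap old-API JobConf objects (the real JobControl is a Runnable, usually started on its own thread and polled with allFinished()). The sketch below does not use Hadoop at all. It is a single-threaded, plain-Java model of the behaviour those classes provide, assuming the usual semantics: a job waits until every job it depends on has succeeded, and is marked as failed without running if any of them failed. SketchJob, runAll and the BooleanSupplier "work" are hypothetical stand-ins, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Single-threaded model of JobControl-style dependency chaining (no Hadoop classes used). */
public class ChainSketch {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    /** Hypothetical stand-in for a controlled job; "work" replaces submitting a real MR job. */
    static final class SketchJob {
        final String name;
        final BooleanSupplier work; // returns true if the (simulated) job succeeded
        final List<SketchJob> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        SketchJob(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }

        /** Mirrors x.addDependingJob(y): this job will not start until 'other' has succeeded. */
        void addDependingJob(SketchJob other) {
            dependsOn.add(other);
        }
    }

    /** Sweeps the waiting jobs until no more progress is possible; returns names of jobs that ran, in run order. */
    static List<String> runAll(List<SketchJob> jobs) {
        List<String> ranInOrder = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (SketchJob job : jobs) {
                if (job.state != State.WAITING) continue;
                boolean dependencyFailed = false;
                boolean allDependenciesDone = true;
                for (SketchJob dep : job.dependsOn) {
                    if (dep.state == State.FAILED || dep.state == State.DEPENDENT_FAILED) {
                        dependencyFailed = true;
                    } else if (dep.state != State.SUCCESS) {
                        allDependenciesDone = false;
                    }
                }
                if (dependencyFailed) {
                    job.state = State.DEPENDENT_FAILED; // never run a job whose input was not produced
                    progress = true;
                } else if (allDependenciesDone) {
                    job.state = job.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    ranInOrder.add(job.name);
                    progress = true;
                }
            }
        }
        return ranInOrder; // jobs caught in a dependency cycle are simply left WAITING
    }

    public static void main(String[] args) {
        SketchJob job1 = new SketchJob("job1", () -> true);
        SketchJob job2 = new SketchJob("job2", () -> true);
        job2.addDependingJob(job1); // job2 reads job1's output
        System.out.println(runAll(List.of(job2, job1))); // [job1, job2]
    }
}
```

The order in which jobs are added does not matter: job2 is listed first above but only runs once job1 has succeeded, which is the same guarantee addDependingJob() gives a cascade of real MR jobs.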
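Chris describes Cascading running jobs "in topological order" with independent jobs submitted in parallel. The sketch below is not Cascading's code or API; it is a plain-Java illustration of that idea using Kahn's topological sort, grouping jobs into waves so that no job in a wave depends on another job in the same wave. Here the dependencies are given explicitly as a map (a hypothetical input format), whereas Cascading works them out itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups jobs into waves using Kahn's topological sort; jobs in the same wave are independent. */
public class TopoWaves {
    /**
     * deps maps each job to the jobs whose output it reads (its prerequisites).
     * Returns waves in execution order; every job's prerequisites appear in earlier waves.
     * Throws IllegalArgumentException if the dependencies contain a cycle.
     */
    static List<List<String>> waves(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new TreeMap<>();           // sorted for deterministic output
        Map<String, List<String>> dependents = new HashMap<>(); // prerequisite -> jobs waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            for (String prereq : e.getValue()) {
                unmet.putIfAbsent(prereq, 0);
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(prereq, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey()); // no prerequisites: first wave
        }

        List<List<String>> result = new ArrayList<>();
        int placed = 0;
        while (!ready.isEmpty()) {
            result.add(ready); // these jobs could be submitted in parallel
            placed += ready.size();
            List<String> next = new ArrayList<>();
            for (String job : ready) {
                for (String child : dependents.getOrDefault(job, List.of())) {
                    if (unmet.merge(child, -1, Integer::sum) == 0) next.add(child);
                }
            }
            Collections.sort(next);
            ready = next;
        }
        if (placed != unmet.size()) {
            throw new IllegalArgumentException("dependency cycle among jobs");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "job2", List.of("job1"),
                "job3", List.of("job1"),
                "job4", List.of("job2", "job3"));
        System.out.println(waves(deps)); // [[job1], [job2, job3], [job4]]
    }
}
```

In the example, job2 and job3 both read job1's output but not each other's, so they land in the same wave and could run side by side; job4 has to wait for both.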
Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 2, '09 at 10:30a
active: Oct 17, '09 at 8:16p
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop