Hi all,
I am using Hadoop's Pipes API from C++ code. I need to make successive
runTask() calls, i.e., I need to chain jobs as Map -> Reduce -> Map ->
Reduce.
Between two successive invocations I need to set new values for some of
the jobconf's parameters, such as mapred.input.dir and mapred.output.dir.
Can anyone share ideas or first-hand experience with how to do this
using the Pipes interface?

Any pointers or advice would be highly appreciated.

Thanks,
Prakhar


  • Mike Kendall at Dec 11, 2009 at 6:44 pm
    Make a runner that chains your Hadoop jobs in one bash script...
    that'll work for a little while.

    If you find yourself running multi-step jobs all the time, you're
    probably going to want to write a library or framework for this kind
    of thing. Even a wrapper that collapses Streaming's "-file map.py
    -mapper map.py" into a single "-m map.py" will save you a lot of
    headaches...
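
    For illustration, a minimal sketch of such a runner, assuming the
    standard "hadoop pipes" command-line options from the 0.20 docs; the
    HDFS paths and the stage1/stage2 binaries are hypothetical. The
    -input and -output options set mapred.input.dir and mapred.output.dir
    for each run, which is exactly the per-job change the question asks
    about:

        #!/bin/bash
        # Chain two Pipes jobs: stage 1's output directory becomes
        # stage 2's input directory.
        set -e  # abort the chain if any stage fails

        # Stage 1: Map -> Reduce over the raw input.
        hadoop pipes \
          -D hadoop.pipes.java.recordreader=true \
          -D hadoop.pipes.java.recordwriter=true \
          -input /user/prakhar/input \
          -output /user/prakhar/stage1-out \
          -program bin/stage1

        # Stage 2: Map -> Reduce over stage 1's output.
        hadoop pipes \
          -D hadoop.pipes.java.recordreader=true \
          -D hadoop.pipes.java.recordwriter=true \
          -input /user/prakhar/stage1-out \
          -output /user/prakhar/final-out \
          -program bin/stage2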

  • Amareshwari Sri Ramadasu at Dec 14, 2009 at 5:22 am
    You can use the JobControl utility to do this.
    More info @
    http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html

    Thanks
    Amareshwari
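
    For illustration, a minimal sketch of driving two dependent jobs with
    JobControl from the old org.apache.hadoop.mapred API (matching the
    r0.20.0 docs linked above); the paths are hypothetical, and a real
    Pipes job would additionally configure its C++ executable, e.g. via
    org.apache.hadoop.mapred.pipes.Submitter:

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.jobcontrol.Job;
        import org.apache.hadoop.mapred.jobcontrol.JobControl;

        public class ChainDriver {
          public static void main(String[] args) throws Exception {
            // Stage 1: read the raw input, write an intermediate directory.
            JobConf conf1 = new JobConf(ChainDriver.class);
            FileInputFormat.setInputPaths(conf1, new Path("/user/prakhar/input"));
            FileOutputFormat.setOutputPath(conf1, new Path("/user/prakhar/stage1-out"));

            // Stage 2: read stage 1's output, write the final directory.
            JobConf conf2 = new JobConf(ChainDriver.class);
            FileInputFormat.setInputPaths(conf2, new Path("/user/prakhar/stage1-out"));
            FileOutputFormat.setOutputPath(conf2, new Path("/user/prakhar/final-out"));

            Job job1 = new Job(conf1);
            Job job2 = new Job(conf2);
            job2.addDependingJob(job1);  // job2 starts only after job1 succeeds

            JobControl jc = new JobControl("map-reduce-chain");
            jc.addJob(job1);
            jc.addJob(job2);

            // JobControl is a Runnable: run it in its own thread and poll
            // until both jobs have completed or failed.
            Thread runner = new Thread(jc);
            runner.start();
            while (!jc.allFinished()) {
              Thread.sleep(1000);
            }
            jc.stop();
          }
        }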


Discussion Overview
group: common-user
categories: hadoop
posted: Dec 11, '09 at 6:26p
active: Dec 14, '09 at 5:22a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
