I'm looking to get acquainted with the new API in 0.20.2 but all the online
documentation I've found uses the old API.

I need to understand how to efficiently chain two MapReduce jobs that must
run sequentially. I'd like to use the SequenceFileOutputFormat -->
SequenceFileInputFormat configuration between the two jobs.

I would be so grateful for any help or links to relevant
documentation/examples.

Thanks,
John


  • Amareshwari Sri Ramadasu at Mar 31, 2011 at 7:12 am
    John,
    The examples and libraries were rewritten to use the new API in branch 0.21, so you can have a look at them there.
    The new API in branch 0.20 is not stable yet, and the old API is un-deprecated in branch 0.21, so you can still use the old API.

    Thanks
    Amareshwari
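
For readers who take the old-API route suggested above, here is a minimal sketch of two sequential jobs joined by a SequenceFile directory. The class name ChainedJobsOldApi, the job names, and the use of args[0]/args[1] as input and output paths are illustrative assumptions, not from this thread. It relies on the old API's default identity mapper and reducer, so with the default text input the records are (LongWritable, Text) pairs.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

// Hypothetical driver class; old (org.apache.hadoop.mapred) API.
public class ChainedJobsOldApi {
    public static void main(String[] args) throws Exception {
        // Intermediate directory shared by the two jobs (assumed name).
        Path intermediate = new Path("job-a-out");

        // Job A: default text input, identity map/reduce, SequenceFile output.
        JobConf confA = new JobConf(ChainedJobsOldApi.class);
        confA.setJobName("job-a");
        confA.setOutputKeyClass(LongWritable.class);
        confA.setOutputValueClass(Text.class);
        confA.setOutputFormat(SequenceFileOutputFormat.class);
        FileInputFormat.setInputPaths(confA, new Path(args[0]));
        FileOutputFormat.setOutputPath(confA, intermediate);

        // runJob blocks until the job finishes and throws an IOException
        // if it fails, so Job B below only starts after Job A succeeds.
        JobClient.runJob(confA);

        // Job B: reads Job A's SequenceFile output, writes default text output.
        JobConf confB = new JobConf(ChainedJobsOldApi.class);
        confB.setJobName("job-b");
        confB.setInputFormat(SequenceFileInputFormat.class);
        confB.setOutputKeyClass(LongWritable.class);
        confB.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(confB, intermediate);
        FileOutputFormat.setOutputPath(confB, new Path(args[1]));
        JobClient.runJob(confB);
    }
}
```

In a real pipeline you would replace the identity defaults with your own mapper and reducer classes via setMapperClass and setReducerClass, and adjust the key/value classes to match.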

  • Aaron Kimball at Apr 6, 2011 at 8:28 pm
    Simplest answer:

    Job A uses o.a.h.mapreduce.lib.output.SequenceFileOutputFormat.
    Its tasks write keys and values of classes KT and VT to it using context.write().
    Use o.a.h.mapreduce.lib.output.FileOutputFormat.setOutputPath(job, new
    Path("job-a-out")); to configure the job to write to some location.

    Then run job.waitForCompletion(true);
    If the job succeeds (the above returns true), then run Job B:

    Job jobB = new Job();
    // Job A's out is Job B's in (FileInputFormat is o.a.h.mapreduce.lib.input.FileInputFormat).
    FileInputFormat.addInputPath(jobB, new Path("job-a-out"));
    jobB.setInputFormatClass(SequenceFileInputFormat.class);

    Job B's mapper will then receive (K, V) arguments with classes KT and VT.

    Hope this helps...
    - Aaron
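
Putting the steps above together, a complete new-API driver might look like the sketch below. The class name ChainedJobs, the job names, and the use of args[0]/args[1] as input and output paths are illustrative assumptions. To stay self-contained it uses the base Mapper and Reducer classes, which pass records through unchanged, so KT and VT here are LongWritable and Text (what the default text input produces).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver class; new (org.apache.hadoop.mapreduce) API.
public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Intermediate directory: Job A's output and Job B's input.
        Path intermediate = new Path("job-a-out");

        // Job A: default text input, pass-through map/reduce, SequenceFile output.
        Job jobA = new Job(conf, "job-a");
        jobA.setJarByClass(ChainedJobs.class);
        jobA.setMapperClass(Mapper.class);    // base class acts as identity mapper
        jobA.setReducerClass(Reducer.class);  // base class acts as identity reducer
        jobA.setOutputKeyClass(LongWritable.class);
        jobA.setOutputValueClass(Text.class);
        jobA.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(jobA, new Path(args[0]));
        FileOutputFormat.setOutputPath(jobA, intermediate);

        // Block until Job A finishes; stop if it failed.
        if (!jobA.waitForCompletion(true)) {
            System.exit(1);
        }

        // Job B: reads the (LongWritable, Text) pairs that Job A wrote.
        Job jobB = new Job(conf, "job-b");
        jobB.setJarByClass(ChainedJobs.class);
        jobB.setInputFormatClass(SequenceFileInputFormat.class);
        jobB.setMapperClass(Mapper.class);
        jobB.setReducerClass(Reducer.class);
        jobB.setOutputKeyClass(LongWritable.class);
        jobB.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(jobB, intermediate);
        FileOutputFormat.setOutputPath(jobB, new Path(args[1]));

        System.exit(jobB.waitForCompletion(true) ? 0 : 1);
    }
}
```

Job B's input key and value types must match the output key and value classes set on Job A, since the SequenceFile records carry those classes. The intermediate directory must not already exist when Job A starts, or the job will refuse to run.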

Discussion Overview
group: mapreduce-user @
category: hadoop
posted: Mar 30, '11 at 6:09p
active: Apr 6, '11 at 8:28p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
