Hi,

I have a map/reduce job that may discover on the way, during the map phase,
that continuing is pointless. I am not sure how to accomplish a job
cancellation from within the Job. Is there an API for that?

A second-best solution would be a way to update the job's configuration so
that at least the reducer and all later-started mappers/combiners would find
the updated configuration indicating that continuing is pointless (but some
cleanup may still be done by the reducer, for example).

Anyway, to cut things short:

a) Is there a (good) way to cancel an ongoing job from within the Job's
implementation?
b) Is there a (good) way to update a job's configuration from within the
Job's implementation so that the update will be seen by later tasks?

Thanks,
Henning

  • Harsh J at Nov 2, 2010 at 2:19 pm
    Hello,
    On Tue, Nov 2, 2010 at 5:55 PM, Henning Blohm wrote:
    > Hi,
    >
    > I have a map/reduce job that may discover on the way, during the map
    > phase, that continuing is pointless. I am not sure how to accomplish
    > a job cancellation from within the Job. Is there an API for that?
    Well, nothing in the Tasks can kill a job (or at least I don't know of
    a way). But a "Job" can be killed by its driver.

    This is the API you need to look at:
    http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/RunningJob.html
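
    Roughly, the driver side could look like this (untested sketch against
    the old org.apache.hadoop.mapred API from the javadoc above; the class
    name and the job setup are placeholders):

        import java.io.IOException;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.RunningJob;

        public class Driver {
            public static void main(String[] args) throws IOException {
                JobConf conf = new JobConf();   // real job setup omitted
                // Submit asynchronously (instead of JobClient.runJob())
                // so the driver keeps a handle to the running job.
                JobClient client = new JobClient(conf);
                RunningJob running = client.submitJob(conf);

                // ... later, once the driver decides continuing is
                // pointless, it can kill the whole job:
                running.killJob();
            }
        }
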
    > A second-best solution would be a way to update the job's
    > configuration so that at least the reducer and all later-started
    > mappers/combiners would find the updated configuration indicating
    > that continuing is pointless (but some cleanup may still be done by
    > the reducer, for example).
    If, say, a map has failed (due to the mapper or combiner), that can be
    seen by monitoring the RunningJob instance itself and used to decide on
    an action. You can kill the job at that point.
    > Anyway, to cut things short:
    >
    > a) Is there a (good) way to cancel an ongoing job from within the
    > Job's implementation?
    RunningJob.killJob()
    > b) Is there a (good) way to update a job's configuration from within
    > the Job's implementation so that the update will be seen by later
    > tasks?
    Your reducers would run anyway if you set a configuration (somehow; I
    am not sure that's even possible). You'll lose more time that way than
    by killing the job based on the TaskCompletionEvents that RunningJob
    objects return when queried (which you can keep doing, over the
    ClientProtocol).
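
    For example, a polling loop in the driver could look roughly like this
    (again an untested sketch; continueIsPointless() and the five-second
    poll interval are placeholders for whatever signal and pacing fit your
    job):

        import java.io.IOException;
        import org.apache.hadoop.mapred.RunningJob;
        import org.apache.hadoop.mapred.TaskCompletionEvent;

        public class JobMonitor {
            // Poll completion events from the driver and kill the job
            // once the events indicate that continuing is pointless.
            static void monitorAndMaybeKill(RunningJob running)
                    throws IOException, InterruptedException {
                int fromEvent = 0;
                while (!running.isComplete()) {
                    TaskCompletionEvent[] events =
                        running.getTaskCompletionEvents(fromEvent);
                    fromEvent += events.length;
                    for (TaskCompletionEvent event : events) {
                        if (event.getTaskStatus()
                                == TaskCompletionEvent.Status.FAILED
                                || continueIsPointless(event)) {
                            running.killJob();
                            return;
                        }
                    }
                    Thread.sleep(5000);   // don't hammer the JobTracker
                }
            }

            // Placeholder: decide from an event whether to abort the job.
            static boolean continueIsPointless(TaskCompletionEvent event) {
                return false;
            }
        }
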
    > Thanks,
    > Henning


    --
    Harsh J
    www.harshj.com
  • Henning Blohm at Nov 2, 2010 at 4:33 pm
    Harsh,

    thanks, I will give that RunningJob interface a try.

    Henning