Harsh J
at Nov 2, 2010 at 2:19 pm
Hello,
On Tue, Nov 2, 2010 at 5:55 PM, Henning Blohm wrote:
> Hi,
> I have a map/reduce job that may discover on the way, during the map
> phase, that continuing is pointless. I am not sure how to accomplish a
> job cancellation from within the Job. Is there an API for that?
Well, nothing in the Tasks can kill a job (or at least I don't know of
a way). But a "Job" can be killed by its driver.
This is the API you need to look at:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/RunningJob.html
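For illustration, a minimal driver-side sketch against the 0.20 mapred API
(the job setup and the shouldAbort() predicate are assumptions here, not
part of the thread):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class KillableDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(KillableDriver.class);
    // ... normal job setup (mapper, reducer, input/output paths) elided ...

    // submitJob() returns immediately with a RunningJob handle,
    // unlike runJob(), which blocks until completion.
    JobClient client = new JobClient(conf);
    RunningJob running = client.submitJob(conf);

    // The driver keeps the handle and can cancel at any time.
    if (shouldAbort()) {
      running.killJob();
    }
  }

  // Hypothetical predicate standing in for whatever signal tells
  // the driver that continuing is pointless.
  private static boolean shouldAbort() { return false; }
}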
> A second-best solution would be a way to update the job's configuration
> so that at least the reducer and all later-started mappers/combiners
> would find the updated configuration that would indicate that continuing
> is pointless (but some cleanup may still be done by the reducer, for
> example).
If, say, a map has failed (due to the mapper or combiner), that can be
detected by monitoring the RunningJob instance itself to decide on an
action. You can kill the job at that point.
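A sketch of that monitoring (the class and method names are assumed, not
from the thread): poll the RunningJob for task-completion events and kill
on the first failure.

import java.io.IOException;

import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class FailureWatcher {
  /** Polls the RunningJob and kills it on the first FAILED task attempt. */
  public static void killOnFirstFailure(RunningJob running)
      throws IOException, InterruptedException {
    int from = 0;
    while (!running.isComplete()) {
      // getTaskCompletionEvents() pages through events from the
      // given offset, so we advance the offset as we consume them.
      TaskCompletionEvent[] events = running.getTaskCompletionEvents(from);
      from += events.length;
      for (TaskCompletionEvent event : events) {
        if (event.getTaskStatus() == TaskCompletionEvent.Status.FAILED) {
          running.killJob();
          return;
        }
      }
      Thread.sleep(5000); // poll interval; avoid hammering the JobTracker
    }
  }
}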
> Anyway, to cut things short:
> a) Is there a (good) way to cancel an ongoing job from within the Job's
> implementation?
RunningJob.killJob()
> b) Is there a (good) way to update a job's configuration from within the
> Job's implementation so that the update will be seen by later tasks?
Your reducers would run anyway even if you set a configuration (somehow,
as I am not sure that's possible). You'd lose more time that way compared
to killing the job based on the TaskCompletionEvents given by RunningJob
objects when queried (which you can keep doing, over the ClientProtocol).
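The thread does not name a mechanism for a task to raise such a flag; one
assumption-laden alternative (not something proposed above) is a user
counter the map increments, which the driver can read from the same
RunningJob handle it already polls, and then kill the job:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;

public class PointlessSignal {
  // Hypothetical user counter acting as the task-to-driver flag.
  public enum Flag { POINTLESS }

  public static class SignallingMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, LongWritable> out, Reporter reporter)
        throws IOException {
      if (looksPointless(value)) {
        // Counters are aggregated by the JobTracker, so the driver
        // sees this without any configuration update.
        reporter.incrCounter(Flag.POINTLESS, 1);
      }
      // ... normal map work elided ...
    }

    // Stand-in for whatever check makes the map give up.
    private boolean looksPointless(Text value) { return false; }
  }

  /** Driver side: kill once any task has raised the flag. */
  public static void killIfFlagged(RunningJob running) throws IOException {
    if (running.getCounters().getCounter(Flag.POINTLESS) > 0) {
      running.killJob();
    }
  }
}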
> Thanks,
> Henning
--
Harsh J
www.harshj.com