Hi Hadoop users,
In my company we have been using Hadoop for two years, and we need to
pause and resume MapReduce jobs. I searched the Hadoop JIRA and found a
couple of tickets about this which are still unresolved, so we implemented
our own solution. I would like to share this approach with you and hear
your opinions about it.
We created a special pool in the fair scheduler called PAUSE
(maxMapTasks = 0, maxReduceTasks = 0). Our logic for pausing a job is to
move it into this pool and kill all of its running tasks. When we want to
resume the job, we move it into some other pool. Currently we can do
maintenance on the whole cloud, except the JobTracker, while jobs are
paused. We also use some external services, and we do their maintenance
while jobs are paused as well.
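For reference, the PAUSE pool can be declared directly in the fair
scheduler allocation file (fair-scheduler.xml). If I recall the element
names correctly, the per-pool caps there are maxMaps/maxReduces; the pool
name and zero limits below are from our setup, the rest is the standard
allocation file skeleton:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Zero-slot pool: jobs moved here get no map or reduce slots,
       which effectively pauses scheduling for them. -->
  <pool name="PAUSE">
    <maxMaps>0</maxMaps>
    <maxReduces>0</maxReduces>
  </pool>
</allocations>
```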
We know that records which were being processed by running tasks will be
reprocessed. In some cases we use the same HBase table as both input and
output, and we save the job id on each record. When a record is
re-processed, we check this job id and skip the record if it was already
processed by the same job.
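The skip check above can be sketched as follows. This is a simplified,
self-contained model, not our production code: the column name
"meta:jobid" and the map standing in for an HBase row are illustrative
assumptions, but the decision logic is the one described above.

```java
import java.util.HashMap;
import java.util.Map;

public class JobIdSkipCheck {
    // Stand-in for reading the job-id cell stored on an HBase row.
    // "meta:jobid" is a hypothetical column, not from the original post.
    static String storedJobId(Map<String, String> row) {
        return row.get("meta:jobid");
    }

    // A record is skipped when it already carries the current job's id,
    // i.e. it was processed by an earlier (killed) attempt of this job.
    static boolean shouldSkip(Map<String, String> row, String currentJobId) {
        return currentJobId.equals(storedJobId(row));
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("meta:jobid", "job_201301011200_0042");
        // Same job re-reads the record after its task was killed: skip it.
        System.out.println(shouldSkip(row, "job_201301011200_0042")); // true
        // A different job processes the record normally.
        System.out.println(shouldSkip(row, "job_201301011200_0043")); // false
    }
}
```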
Our custom implementation of the fair scheduler has this logic built in,
and it is deployed on our cluster.
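To make the pause/resume flow concrete, here is a toy model of what our
scheduler change does. This is not the real FairScheduler API (the actual
implementation hooks into the scheduler's pool manager); the class, field,
and pool names below are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the pause/resume flow; not the real FairScheduler API.
public class PausableScheduler {
    static final String PAUSE_POOL = "PAUSE";
    final Map<String, String> jobPool = new HashMap<>();          // job id -> pool
    final Map<String, List<String>> runningTasks = new HashMap<>(); // job id -> attempts

    // Pause: move the job into the zero-slot PAUSE pool, then kill its
    // running task attempts. Killed attempts are re-run after resume,
    // which is why records may be reprocessed.
    void pause(String jobId) {
        jobPool.put(jobId, PAUSE_POOL);
        runningTasks.remove(jobId); // models killing every running attempt
    }

    // Resume: move the job back into a pool that actually has slots.
    void resume(String jobId, String pool) {
        jobPool.put(jobId, pool);
    }
}
```

In the real cluster, the "kill running attempts" step is what guarantees
that no task keeps slots or external connections open during maintenance.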
Please share your comments and concerns about this approach.