Hi Hadoop users,

In my company we have been using Hadoop for 2 years, and we have a need to
pause and resume MapReduce jobs. I searched the Hadoop JIRA and found a
couple of unresolved tickets about this, so we implemented our own
solution. I would like to share this approach with you and hear your
opinions on it.

We created a special pool in the fair scheduler called PAUSE
(maxMapTasks = 0, maxReduceTasks = 0). To pause a job, we move it into
this pool and kill all of its running tasks. To resume it, we move the
job into some other pool. While jobs are paused we can do maintenance on
the whole cluster except the JobTracker, and we also use that window to
do maintenance on some external services that our jobs depend on.
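For readers who haven't configured pools before, the PAUSE pool described above would be declared in the fair scheduler's allocations file. This is only a sketch: the element names below (maxMaps/maxReduces) are the ones used by the MR1 FairScheduler allocation file, while the post refers to the equivalent limits as maxMapTasks/maxReduceTasks.

```xml
<?xml version="1.0"?>
<!-- Sketch of a fair scheduler allocations file (fair-scheduler.xml).
     The PAUSE pool is capped at zero tasks, so any job moved into it
     can never be assigned a map or reduce slot and effectively stalls. -->
<allocations>
  <pool name="PAUSE">
    <maxMaps>0</maxMaps>
    <maxReduces>0</maxReduces>
  </pool>
</allocations>
```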

We know that records being processed by running tasks at the moment of
the pause will be reprocessed. In some cases we use the same HBase table
as both input and output, and we save the job id on each record; when a
record is reprocessed, we check this job id and skip the record if it
was already processed by the same job.
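The skip check described above can be sketched as below. This is a hypothetical simplification, not our actual code: the row is modeled as a plain Map standing in for an HBase Result, and the column name `meta:jobId` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class JobIdDedup {
    // Assumed column where the producing job's id is stored on each record.
    static final String JOB_ID_COLUMN = "meta:jobId";

    // Returns true if the record should be processed by the current job,
    // i.e. it was not already produced by this same job before the pause.
    static boolean shouldProcess(Map<String, String> row, String currentJobId) {
        String producedBy = row.get(JOB_ID_COLUMN);
        return producedBy == null || !producedBy.equals(currentJobId);
    }

    public static void main(String[] args) {
        Map<String, String> fresh = new HashMap<>();
        Map<String, String> alreadyDone = new HashMap<>();
        alreadyDone.put(JOB_ID_COLUMN, "job_201112130001_0042");

        System.out.println(shouldProcess(fresh, "job_201112130001_0042"));       // true
        System.out.println(shouldProcess(alreadyDone, "job_201112130001_0042")); // false
    }
}
```

In a real mapper the same check would run against the input row before emitting output, so re-running a half-finished task after a resume becomes idempotent.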

Our custom fair scheduler implementation has this logic built in and is
deployed on our cluster.

Please share your comments and concerns about this approach.

Regards,
dino


  • Arun C Murthy at Dec 13, 2011 at 1:40 am
    The CapacityScheduler (hadoop-0.20.203 onwards) allows you to stop a queue and start it again.

    That will give you the behavior you described.
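    For context, stopping a CapacityScheduler queue is a configuration change. The sketch below assumes the mapred-queues.xml queue-state mechanism of that era; the exact file and element names vary by release, so treat this as illustrative only.

    ```xml
    <!-- Sketch: mark a queue stopped so it accepts no new work, then
         refresh the JobTracker's queue configuration with
         "hadoop mradmin -refreshQueues". Setting the state back to
         "running" and refreshing again restarts the queue. -->
    <queues>
      <queue>
        <name>etl</name>
        <state>stopped</state>
      </queue>
    </queues>
    ```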

    Arun
    On Dec 12, 2011, at 5:50 AM, Dino Kečo wrote:

