Sometimes job does not get removed from scheduler queue after it is killed
--------------------------------------------------------------------------

Key: HADOOP-5794
URL: https://issues.apache.org/jira/browse/HADOOP-5794
Project: Hadoop Core
Issue Type: Bug
Components: contrib/capacity-sched
Affects Versions: 0.20.0
Reporter: Karam Singh


Sometimes when we kill a job, it does not get removed from the waiting queue, even though the job status is "Killed" with Job Setup and Cleanup: "Successful".
The JobTracker web UI also shows the job under the failed jobs list, and both hadoop job -list all and hadoop queue <queuename> -showJobs show the job's state=5.
Prior to the kill, the job state was "Running".


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Karam Singh (JIRA) at May 8, 2009 at 1:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707336#action_12707336 ]

    Karam Singh commented on HADOOP-5794:
    -------------------------------------

    Cluster setup:
    Cluster Capacity = 204 maps, 204 reduces
    4 queues
    Q1 Capacity Percent= 40
    Q2 Capacity Percent= 40
    Q3 Capacity Percent= 40
    Q4 Capacity Percent= 40

    Each queue has user limit=100%
    Submitted 8 jobs to each queue, for a total of 32 sleep jobs, each with maps=10000 (sleep time 5 secs) and reduce=2 (sleep time 1 min).
    All jobs were initialized, and the maps of 4 of them started running. When at least 1000 maps of each job had completed, restarted the JobTracker.
    After the JobTracker recovered, waited until 4 jobs had completed, then killed the remaining 28 jobs.
    All jobs were killed successfully.
    The JobTracker web UI displayed all killed jobs under the failed jobs list, and hadoop job -list all also displayed the status of the 28 killed jobs as 5.
    While browsing the jobqueue_details.jsp pages of the queues, found that 2 of the killed jobs had not been removed from the capacity scheduler's queue. Maps of both jobs were running before the kill was sent to them.
    To check whether the cluster would be blocked because of this, submitted 3 more jobs to each queue where the 2 killed jobs were listed, and verified that the newly submitted jobs ran successfully.
    Waited up to 20 mins before shutting down the cluster.

  • rahul k singh (JIRA) at May 22, 2009 at 10:16 am
    [ https://issues.apache.org/jira/browse/HADOOP-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712013#action_12712013 ]

    rahul k singh commented on HADOOP-5794:
    ---------------------------------------

    Analysis of the problem:
    When the JobTracker is restarted, RecoveryManager tries to recover the job from job history. RecoveryManager instantiates the JobInProgress (JIP) object, which sets its startTime to System.currentTimeMillis. In the JobInProgress constructor, the JobStatus startTime is set to the JIP's startTime. RecoveryManager then fetches the startTime from job history and updates the JIP's startTime. This change is not propagated to the JobStatus startTime, so JobStatus still holds the old value. These job statuses are used in JobQueuesManager to categorize jobs by the state they are in. The waitingJobs data structure in JobQueuesManager uses startTime in its comparator, so the job is entered in waitingJobs under the old startTime.
    Whenever we run "hadoop job -list", the JobTracker's getJobStatus method is called, which sets the JobStatus startTime to the JobInProgress startTime. At this point the startTime values in JIP and JobStatus are consistent, but the startTime in waitingJobs in JobQueuesManager is stale. Hence, when we try to remove a job that has finished (completed/killed/failed, for example after issuing "hadoop job -kill <>") from waitingJobs, nothing is removed, because the startTime the comparator sees has changed.
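
    The failure mode described above can be reproduced in isolation with a sorted map. The following is only a minimal sketch, not the capacity scheduler's code: the class StaleStartTimeDemo, the SchedulingKey type, the helper methods and the timestamps are all hypothetical, and it assumes the queue is a TreeMap whose keys are snapshots of (startTime, job id) taken when the job is added.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class StaleStartTimeDemo {

    /** Hypothetical snapshot of the fields a queue orders jobs by. */
    static final class SchedulingKey {
        final long startTime;
        final String jobId;

        SchedulingKey(long startTime, String jobId) {
            this.startTime = startTime;
            this.jobId = jobId;
        }
    }

    /** Orders by startTime, then by job id to break ties (assumed ordering). */
    static final Comparator<SchedulingKey> BY_START_TIME =
        Comparator.comparingLong((SchedulingKey k) -> k.startTime)
                  .thenComparing(k -> k.jobId);

    /** Builds a queue holding one job, keyed by the startTime known when it was added. */
    static Map<SchedulingKey, String> queueWith(long startTimeAtAdd, String jobId) {
        Map<SchedulingKey, String> waitingJobs = new TreeMap<>(BY_START_TIME);
        waitingJobs.put(new SchedulingKey(startTimeAtAdd, jobId), jobId);
        return waitingJobs;
    }

    /** Tries to remove the job using a key rebuilt from its current startTime. */
    static boolean remove(Map<SchedulingKey, String> waitingJobs,
                          long currentStartTime, String jobId) {
        return waitingJobs.remove(new SchedulingKey(currentStartTime, jobId)) != null;
    }

    public static void main(String[] args) {
        long restartTime = 2_000_000L; // stand-in for System.currentTimeMillis() at recovery
        long historyTime = 1_000_000L; // stand-in for the startTime read from job history

        Map<SchedulingKey, String> waitingJobs = queueWith(restartTime, "job_1");

        // A key built from the updated startTime does not match the stored entry.
        System.out.println(remove(waitingJobs, historyTime, "job_1"));
        // A key built from the startTime used at insertion does match.
        System.out.println(remove(waitingJobs, restartTime, "job_1"));
    }
}
```

    In this sketch the first removal misses and leaves the entry in the map, which mirrors the killed jobs still being listed in the queue. Keeping the ordering fields in sync between insertion and removal (or removing the entry before changing them) avoids the mismatch.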
  • Vinod K V (JIRA) at May 22, 2009 at 12:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712051#action_12712051 ]

    Vinod K V commented on HADOOP-5794:
    -----------------------------------

    Beautiful! (Sorry couldn't resist myself..)

Discussion Overview
group: common-dev
categories: hadoop
posted: May 8, '09 at 1:18p
active: May 22, '09 at 12:04p
posts: 4
users: 1
website: hadoop.apache.org...
irc: #hadoop