FAQ
Sometimes job is still displayed in jobqueue_details page for long time after job was killed.
---------------------------------------------------------------------------------------------

Key: HADOOP-5048
URL: https://issues.apache.org/jira/browse/HADOOP-5048
Project: Hadoop Core
Issue Type: Bug
Reporter: Karam Singh


When I tried to kill all running jobs, I noticed that two jobs were listed on the jobqueue_details.jsp page and also under failed jobs on the jobtracker.jsp page.
When I checked the status of each, it showed "killed" and the Cleanup task status as "Successful", but both jobs remained on the jobqueue_details.jsp page for a long time, e.g. up to 10-15 minutes after I restarted the JobTracker.

Before the jobs were killed, both were in the running state and none of their tasks had been scheduled.
I noticed this behavior on 3 different occasions, but it is random and not always reproducible.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Vivek Ratan (JIRA) at Jan 15, 2009 at 9:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664053#action_12664053 ]

    Vivek Ratan commented on HADOOP-5048:
    -------------------------------------

    This happens with the Capacity Scheduler.

    Jobs are killed after they are initialized but before they run. JobQueuesManager receives an event for the job's status change and removes the job from the run queue. Removing the job from the wait queue is left to the initialization poller, which is unable to do so because of the bug in HADOOP-5020. Hence the job remains in the scheduler's wait queue and shows up on the jobqueue_details.jsp page.

    I recommend we do the following:
    * JobQueuesManager should be responsible for removing a job from both the run and wait queues when the job completes. It already does so when the job's priority changes, so it is already aware that a job can be in both queues and thus needs to be removed from both. With this fix, the job will be removed from the wait queue regardless of the fix for HADOOP-5020, since the JobQueuesManager receives the job state-change event with the old job state.
    * The JobInitializationPoller needs some refactoring. It is really doing two separate things. First, it builds up a collection of jobs being initialized by walking through the wait queue. Second, it needs to clean up the job objects in that collection by walking through them and removing jobs that have started running or have completed. This makes the poller responsible for its own collection and the JobQueuesManager responsible for its run and wait queues.
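
    A minimal Java sketch of the first recommendation, assuming a simplified model: QueuesManagerSketch, its methods and its State enum are hypothetical stand-ins (not Hadoop's JobQueuesManager API), and job IDs are plain strings. It shows a completion event removing a job from both the wait queue and the run queue, so a job killed after initialization but before it runs cannot linger in the wait queue.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of the first recommendation above; it is
// not the actual Hadoop JobQueuesManager. Job IDs are plain strings here.
public class QueuesManagerSketch {
    public enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Wait queue (not yet running) and run queue, in insertion order.
    private final Set<String> waitingJobs = new LinkedHashSet<>();
    private final Set<String> runningJobs = new LinkedHashSet<>();

    public void jobAdded(String jobId) {
        waitingJobs.add(jobId);
    }

    // A job moves from the wait queue to the run queue once it starts.
    public void jobStarted(String jobId) {
        waitingJobs.remove(jobId);
        runningJobs.add(jobId);
    }

    // Job state-change event. A completed job is removed from BOTH queues,
    // whichever one it is in, so removal no longer depends on the poller.
    public void jobStateChanged(String jobId, State newState) {
        boolean completed = newState == State.SUCCEEDED
            || newState == State.FAILED
            || newState == State.KILLED;
        if (completed) {
            waitingJobs.remove(jobId);
            runningJobs.remove(jobId);
        }
    }

    public boolean isWaiting(String jobId) { return waitingJobs.contains(jobId); }

    public boolean isRunning(String jobId) { return runningJobs.contains(jobId); }

    public static void main(String[] args) {
        QueuesManagerSketch qm = new QueuesManagerSketch();
        qm.jobAdded("job_1");                      // initialized, never scheduled
        qm.jobStateChanged("job_1", State.KILLED); // killed before it runs
        System.out.println(qm.isWaiting("job_1")); // false
    }
}
```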

  • Vivek Ratan (JIRA) at Jan 15, 2009 at 9:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Vivek Ratan updated HADOOP-5048:
    --------------------------------

    Component/s: contrib/capacity-sched
  • Sreekanth Ramakrishnan (JIRA) at Jan 15, 2009 at 10:22 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan reassigned HADOOP-5048:
    ----------------------------------------------

    Assignee: Sreekanth Ramakrishnan
  • Sreekanth Ramakrishnan (JIRA) at Jan 15, 2009 at 10:38 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan updated HADOOP-5048:
    -------------------------------------------

    Attachment: HADOOP-5048-1.patch

    Attaching a patch that does the following:

    *JobQueuesManager*

    * Renamed the internal data structure from jobList to waitingJobs.
    * Renamed the methods addJob, removeJob and getJob to include "Waiting" in their names.
    * Added logic to remove jobs from the waiting job queue in the job queue manager.

    *JobInitializationPoller*

    * Changed the initialized job list to hold JobInProgress objects instead of JobIDs.
    * Moved the job clean-up code into a separate method.
    * Stop walking the job list once the maximum number of jobs for the queue has been reached (based on an offline discussion with Vivek and Amar).

    *TestCapacityScheduler*

    * Changed the test case to test with actual JobInProgress objects.
    * Added a new test case to test the conditions for job removal.
  • Amar Kamat (JIRA) at Jan 15, 2009 at 11:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664096#action_12664096 ]

    Amar Kamat commented on HADOOP-5048:
    ------------------------------------

    Have you taken care of the case where _num-maps=0_? See HADOOP-5049 for more details.
  • Hemanth Yamijala (JIRA) at Jan 15, 2009 at 5:11 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664172#action_12664172 ]

    Hemanth Yamijala commented on HADOOP-5048:
    ------------------------------------------

    bq. JobQueuesManager should be responsible for removing a job from both the run and wait queue when the job completes.

    While reviewing HADOOP-4513, I had commented on this [here|http://issues.apache.org/jira/browse/HADOOP-4513?focusedCommentId=12648951#action_12648951] as follows:

    bq. I am thinking removal of completed jobs from the 'jobqueue' must also be done in jobCompleted, and not from the poller. This keeps it simple to understand.

    Sreekanth, do you remember why we did not do this in the end?
  • Sreekanth Ramakrishnan (JIRA) at Jan 16, 2009 at 3:43 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664397#action_12664397 ]

    Sreekanth Ramakrishnan commented on HADOOP-5048:
    ------------------------------------------------

    When a job has zero tasks, the job's initTasks() is called. Then, in the next polling cycle, the list of JobInProgress objects on which initTasks() has been called is checked, and the finished jobs are removed from the wait queue.


    With regard to the comment on [HADOOP-4513|http://issues.apache.org/jira/browse/HADOOP-4513?focusedCommentId=12648951#action_12648951], the reason it was not implemented is as follows:

    In the previous implementation, the list of jobs passed to the init-thread workers was maintained lazily while walking through the waiting job queue; there was no separate walk over that list. If JobQueuesManager had taken responsibility for removing jobs from the waiting queue, our list would have grown indefinitely. That is why it was not done.

    Now we have two walks instead:

    * Clean up the list of jobs passed to the init threads.
    * Walk through the wait queue until the maximum number of jobs to be initialized in a job queue is reached (an optimization, so that we need not walk through the entire wait queue).
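
    The two walks can be sketched roughly as follows. This is a simplified model under stated assumptions: PollerSketch, Job, Status, cleanUpInitializedJobs and maxJobsToInit are hypothetical names, a String stands in for the job ID, and the cleanup walk treats any job that has left the PREP state (started running or finished) as ready to drop. A zero-task job that finishes right after initialization is therefore removed by the next cleanup walk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the poller's two walks; names and types are
// simplified assumptions, not Hadoop's JobInitializationPoller.
public class PollerSketch {
    public enum Status { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    public static final class Job {
        public final String id;
        public Status status = Status.PREP;
        public Job(String id) { this.id = id; }
    }

    // Jobs already handed to init threads, keyed by job ID.
    public final Map<String, Job> initializedJobs = new LinkedHashMap<>();

    // Walk 1: drop jobs that have started running or have finished, so the
    // poller's own collection cannot grow without bound.
    public void cleanUpInitializedJobs() {
        initializedJobs.values().removeIf(j -> j.status != Status.PREP);
    }

    // Walk 2: look at only the first maxJobsToInit jobs of the wait queue
    // and pick those not yet handed to an init thread.
    public List<Job> getJobsToInitialize(List<Job> waitQueue, int maxJobsToInit) {
        List<Job> toInit = new ArrayList<>();
        int looked = 0;
        for (Job j : waitQueue) {
            if (looked++ >= maxJobsToInit) {
                break; // optimization: no need to scan the whole wait queue
            }
            if (!initializedJobs.containsKey(j.id) && j.status == Status.PREP) {
                toInit.add(j);
            }
        }
        for (Job j : toInit) {
            initializedJobs.put(j.id, j);
        }
        return toInit;
    }
}
```

    With five waiting jobs and maxJobsToInit = 3, the first call hands out three jobs and a second call hands out none, because those three are already tracked; once a tracked job is killed or starts running, cleanUpInitializedJobs() drops it.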
  • Amar Kamat (JIRA) at Jan 16, 2009 at 4:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664400#action_12664400 ]

    Amar Kamat commented on HADOOP-5048:
    ------------------------------------

    bq. In case of zero tasks assigned to a job. The job's initTask() is called. Then in next polling cycle the list of JobInProgress on which initTasks() has been called is checked and the finished jobs are removed from wait queue.
    What happens to the jobs in the wait queue? I guess in this JIRA we expect that to happen via the job-state-change event, which won't happen, no?
  • Sreekanth Ramakrishnan (JIRA) at Jan 16, 2009 at 10:40 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan updated HADOOP-5048:
    -------------------------------------------

    Attachment: HADOOP-5048-2.patch

    Attaching the latest patch, with incremental changes from the previous patch incorporating Vivek's offline comments:

    * The initialized jobs list in the poller is now cleaned up at the top level, so the code reads more cleanly.
    * Added and modified comments in the method that selects jobs to initialize, so that there is pseudo-code which can be read to find out what the code does.
    * Added comments on the methods in JobInitializationPoller.
    * Renamed getJobs in JobQueuesManager to getWaitingJobs.
    * Modified ControlledJobInitializationPoller to accommodate the new cleanup method.
  • Vivek Ratan (JIRA) at Jan 19, 2009 at 4:38 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665038#action_12665038 ]

    Vivek Ratan commented on HADOOP-5048:
    -------------------------------------

    Looks fine, Sreekanth.
  • Sreekanth Ramakrishnan (JIRA) at Jan 19, 2009 at 5:57 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan updated HADOOP-5048:
    -------------------------------------------

    Status: Patch Available (was: Open)
  • Hemanth Yamijala (JIRA) at Jan 19, 2009 at 8:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665222#action_12665222 ]

    Hemanth Yamijala commented on HADOOP-5048:
    ------------------------------------------

    A few comments:

    - I feel it would be better to make initializedJobs a Map of jobid to JobInProgress. What worries me is that there is no equals() or hashCode() defined on JobInProgress, and we want to look up only by JobID for semantic reasons. Elsewhere, e.g. in the JobTracker, when we look up JobInProgress objects we use a map of jobid to JobInProgress. It would be good to retain the same model.
    - In the javadoc for getJobsToInitialize, where 'n' is introduced, it would be nice to mention that the computation of 'n' is shown later. I originally thought n was a parameter to the method, which was a little confusing.
    - Also, it says that the method 'picks' the first n jobs. However, if a job is already initialized but not running, we don't pick it again. So saying it 'looks at' the first n jobs may be a better phrase.
    - The pseudo-code written as comments in the method is going to be hard to maintain, and as such was hard for me to even understand. It is also incorrect in a few details. For example, it says: if job is found in initialized job list: noOfUsers++. What if two initialized jobs belong to the same user? The code doesn't even have a variable for noOfUsers. Code is much easier to follow if the comments are inline with it. A summary of the algorithm would be good if it is required (in this case, I don't think it is), but not the entire pseudo-code at any rate.
    - cleanupInitializedJobList has a redundant continue in the if condition checking for job status == Running.
    - testJobRemovals can be improved. Once again, I feel a summary rather than the complete steps in pseudo-code makes the comment readable. The test should focus only on removals, not on the state of running jobs; note that testJobMovement already tests the movement of jobs across the queues. I would recommend that the test do the following:
    -- initialize, run and complete a job and make sure it is removed from both lists.
    -- initialize and kill a job, make sure it is removed from both lists (this condition verifies the actual bug fix)
    -- kill a job before initialization, make sure it is removed from the waiting list.
    It is also more pertinent to check that the job is not present, rather than only checking the counts.
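
    A small, hypothetical illustration of why keying by job ID is safer than relying on the job object itself when its class defines no equals() or hashCode(). LookupByIdSketch and JobStub are made-up names, and a String stands in for the real job ID type.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the Map-by-ID suggestion. JobStub stands in
// for a class that, like JobInProgress as described above, defines no
// equals()/hashCode(); a String stands in for the job ID type.
public class LookupByIdSketch {
    public static final class JobStub {
        public final String jobId;
        public JobStub(String jobId) { this.jobId = jobId; }
    }

    public static void main(String[] args) {
        JobStub original = new JobStub("job_0001");
        JobStub sameIdOtherObject = new JobStub("job_0001");

        // Without equals()/hashCode(), a Set of job objects uses identity,
        // so a different object with the same ID is not found.
        Set<JobStub> byObject = new HashSet<>();
        byObject.add(original);
        System.out.println(byObject.contains(sameIdOtherObject)); // false

        // A Map keyed by ID looks up by the ID's value, independent of
        // which job object the caller happens to hold.
        Map<String, JobStub> byId = new HashMap<>();
        byId.put(original.jobId, original);
        System.out.println(byId.containsKey(sameIdOtherObject.jobId)); // true
    }
}
```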
  • Vivek Ratan (JIRA) at Jan 20, 2009 at 2:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665319#action_12665319 ]

    Vivek Ratan commented on HADOOP-5048:
    -------------------------------------

    bq. The pseudo code written as comments in the method is going to be hard to maintain

    I had asked Sreekanth to put the algorithm logic in the comments. I feel it's important to do that because the actual code can be a bit different: you may combine some if statements, or move code around to make it perform better, and this often makes it harder to read. The comments indicate what the logic is; the code may implement it differently. The comments helped me understand what was going on much more than the code (and accompanying logic) did. That said, the 'pseudo-code' comments need to be accurate. Perhaps they can be reworded a bit to read better. Let me work with Sreekanth to improve them.
  • Hemanth Yamijala (JIRA) at Jan 20, 2009 at 3:13 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665321#action_12665321 ]

    Hemanth Yamijala commented on HADOOP-5048:
    ------------------------------------------

    bq. The comments indicate what the logic is like, the code may implement it differently.

    Vivek, if by this statement you mean that you want to give a gist of the algorithm, that's fine. But the way it is commented right now, it looks like the actual code (declaring variables, etc.) and is very verbose. Given that level of verbosity, I started looking for code that matches the comments, and I guess others would do so too. Also, anyone changing code in this area will look at both the comments and the code. If they don't match, IMHO, it will confuse more than clarify, even if only the implementation has changed.

    If you feel the inline comments that go with the code will not be good enough, I would recommend we give a gist of the algorithm that explains things. That way, we don't have to change it for every line or branch we touch as long as the summary remains the same, and it would be as explanatory as you would like.
  • Sreekanth Ramakrishnan (JIRA) at Jan 20, 2009 at 4:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan updated HADOOP-5048:
    -------------------------------------------

    Attachment: HADOOP-5048-3.patch

    Attaching new patch incorporating, Hemanth's comments and Viveks comments:

    * Changed initializedJobs to a HashMap mapping job ID to JobInProgress.
    * Changed comments in getJobsToInitialize.
    * Changed testJobMovementTestCase; the test case now covers the 4 possible kinds of job movement:
    ** Submission and completion of a job, along with removal from the waiting and running queues at the correct times.
    ** Submission and failure of a running job, along with the appropriate movements and removals.
    ** Submission and failure of a job that is initialized but not yet scheduled, along with the appropriate removals.
    ** Submission and failure of a waiting job, along with the appropriate removal.
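    The four job movements exercised by the updated test case can be sketched as a small, self-contained Java model. This is a hypothetical simplification, not the actual Capacity Scheduler code: the class JobQueuesSketch, its Job and State types, and the methods submit, initialize, schedule, jobStateChanged and isTracked are all invented for illustration. Only the idea of keying initialized jobs by job ID in a HashMap comes from the patch notes; the rule that a transition to a terminal state removes the job from every structure is an assumption about the intent of the fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a queue's scheduler bookkeeping.
// Not the real Capacity Scheduler classes; all names are illustrative.
class JobQueuesSketch {
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    static final class Job {
        final String id;
        State state = State.PREP;

        Job(String id) { this.id = id; }

        boolean isComplete() {
            return state == State.SUCCEEDED || state == State.FAILED
                || state == State.KILLED;
        }
    }

    final List<Job> waitingQueue = new ArrayList<>();
    final List<Job> runningQueue = new ArrayList<>();
    // Initialized-but-not-yet-scheduled jobs, keyed by job ID.
    final Map<String, Job> initializedJobs = new HashMap<>();

    // A newly submitted job waits in the waiting queue.
    void submit(Job job) { waitingQueue.add(job); }

    // Initialization marks the job RUNNING; it stays in the waiting queue
    // until the scheduler actually picks it up.
    void initialize(Job job) {
        job.state = State.RUNNING;
        initializedJobs.put(job.id, job);
    }

    // Scheduling moves the job out of the waiting structures and into
    // the running queue.
    void schedule(Job job) {
        waitingQueue.remove(job);
        initializedJobs.remove(job.id);
        runningQueue.add(job);
    }

    // On a transition to a terminal state, remove the job from every
    // structure it could be in, so no separate poller has to clean up
    // the waiting queue later.
    void jobStateChanged(Job job, State newState) {
        job.state = newState;
        if (job.isComplete()) {
            waitingQueue.remove(job);
            runningQueue.remove(job);
            initializedJobs.remove(job.id);
        }
    }

    boolean isTracked(Job job) {
        return waitingQueue.contains(job) || runningQueue.contains(job)
            || initializedJobs.containsKey(job.id);
    }

    public static void main(String[] args) {
        JobQueuesSketch q = new JobQueuesSketch();
        Job completed = new Job("job_1");
        Job runFailed = new Job("job_2");
        Job initKilled = new Job("job_3");
        Job waitKilled = new Job("job_4");
        List<Job> all = Arrays.asList(completed, runFailed, initKilled, waitKilled);
        for (Job j : all) q.submit(j);

        // 1. Submission and completion of a job.
        q.initialize(completed);
        q.schedule(completed);
        q.jobStateChanged(completed, State.SUCCEEDED);
        // 2. Submission and failure of a running job.
        q.initialize(runFailed);
        q.schedule(runFailed);
        q.jobStateChanged(runFailed, State.FAILED);
        // 3. Failure of a job that is initialized but not yet scheduled.
        q.initialize(initKilled);
        q.jobStateChanged(initKilled, State.KILLED);
        // 4. Failure of a job that is still waiting.
        q.jobStateChanged(waitKilled, State.KILLED);

        for (Job j : all) {
            System.out.println(j.id + " " + j.state + " tracked=" + q.isTracked(j));
        }
    }
}
```

    Running main prints one line per job, each ending in tracked=false. In this model, a handler that removed a finished job only from the running queue would leave job_3 and job_4 in waitingQueue, which is the kind of stale entry the report describes.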
  • Hemanth Yamijala (JIRA) at Jan 20, 2009 at 6:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665494#action_12665494 ]

    Hemanth Yamijala commented on HADOOP-5048:
    ------------------------------------------

    Looks good. +1.

    I ran ant test-patch, which gave the following results:

    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    Capacity scheduler tests also passed. So, this one's ready to go.
  • Hemanth Yamijala (JIRA) at Jan 20, 2009 at 7:23 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hemanth Yamijala updated HADOOP-5048:
    -------------------------------------

    Resolution: Fixed
    Fix Version/s: 0.20.0
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    I just committed this to trunk and branch 0.20, as it was a regression. Thanks, Sreekanth!

Discussion Overview
group: common-dev
categories: hadoop
posted: Jan 15, '09 at 8:50a
active: Jan 20, '09 at 7:23p
posts: 18
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Hemanth Yamijala (JIRA): 18 posts
