[
https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665222#action_12665222 ]
Hemanth Yamijala commented on HADOOP-5048:
------------------------------------------
A few comments:
- I feel it would be better to make initializedJobs a Map of jobid to JobInProgress. What worries me is that there's no equals or hashcode defined on the JobInProgress, and we want to lookup only by JobId for semantic reasons. Elsewhere, for e.g. in the JobTracker, if we want to lookup JobInProgress objects, we have a map of jobid to JobInProgress. It would be good to retain the same model.
- In the javadocs for getJobsToInitialize, where 'n' is introduced, it would be nice to mention that the computation of 'n' is shown later. I originally thought n was a parameter to the method. In that sense, it was a little confusing.
- Also, it mentions that the method 'picks' first n jobs. However, if the job is already initialized but not running, we don't pick it again. So, maybe saying it 'looks at' first n jobs is a better phrase.
- The pseudo code written as comments in the method is going to be hard to maintain, and as such was hard for me to even understand. It is also incorrect in a few details. For e.g. it says, if job is found in initialized job list: noOfUsers++. What if there are 2 jobs for the same user that are initialized. The code doesn't even have a variable for noOfUsers. It is much easier to follow code if the comments are inline with the code. A summary of the algorithm would be good if it is required (in this case, I don't think it is), but not the entire pseudo code at any rate.
- cleanupInitializedJobList has a redundant continue in the if condition checking for job status == Running.
- testJobRemovals can be improved. Once again, I feel a summary and not the complete steps in pseudo code makes the comment readable. I think it must only focus on removals, and not on the state of running jobs. Note that testJobMovement already tests the movement of jobs across the queues. I would recommend that the test does the following:
-- initialize, run and complete a job and make sure it is removed from both lists.
-- initialize and kill a job, make sure it is removed from both lists (this condition verifies the actual bug fix)
-- kill a job before initialization, make sure it is removed from the waiting list.
And it is more pertinent to check that the job is not present, rather than check only for the counts.
Sometimes job is still displayed in jobqueue_details page for long time after job was killed.
---------------------------------------------------------------------------------------------
Key: HADOOP-5048
URL:
https://issues.apache.org/jira/browse/HADOOP-5048Project: Hadoop Core
Issue Type: Bug
Components: contrib/capacity-sched
Reporter: Karam Singh
Assignee: Sreekanth Ramakrishnan
Attachments: HADOOP-5048-1.patch, HADOOP-5048-2.patch
When I tried kill all running job, I noticed that were two jobs were listed on jobqueue_details.jsp page page as well as they were also listed under failed job on jobtracker.jsp page.
When I checked status of each that was displayed "killed" and Cleanup task status as "Successful", but both jobs were also being on jobqueue_details.jsp page for longtime e.g up to 10 -15 mins after I restarted JobTracker.
Before killing the jobs, status of both jobs was running and no task of from them was scheduled.
I noticed this behavior on 3 different occasions. But is this random, not always reproducible.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.