FAQ
I'm submitting jobs via JobClient.submitJob(JobConf) and then waiting until each one completes with RunningJob.waitForCompletion(). I then want to find out how long the entire MapReduce job took, which appears to require a JobStatus, since RunningJob doesn't expose anything I can use for that. The only way I can see to do it right now is JobClient.getAllJobs(), which gives me an array of all the jobs that have been submitted (currently running? all previous?). Does anyone know how I could go about doing this?

--Aaron
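
A minimal sketch of the pattern described above, using the old mapred API; the class name, the JobConf setup, and the client-side wall-clock timer are illustrative additions, not something from the thread:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitAndWait {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitAndWait.class);
    // ... set mapper/reducer, input/output formats and paths here ...

    JobClient client = new JobClient(conf);
    long before = System.currentTimeMillis();   // crude client-side fallback timer
    RunningJob running = client.submitJob(conf);
    running.waitForCompletion();                // blocks until the job finishes
    long after = System.currentTimeMillis();

    System.out.println("Job " + running.getID() + " took ~" + (after - before)
        + " ms, measured from the client");
  }
}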


  • Madhu phatak at Feb 17, 2011 at 6:35 am
    Rather than running jobs with waitForCompletion(), you can use JobControl to
    manage them. JobControl gives you access to which jobs have completed, which
    are running, which have failed, and so on.
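
    A minimal sketch of that approach with the old mapred.jobcontrol API, assuming a
    single job and a simple polling loop; the group name and sleep interval are arbitrary:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ControlledRun {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ControlledRun.class);
        // ... configure the job as usual ...

        JobControl control = new JobControl("my-group");   // arbitrary group name
        control.addJob(new Job(conf));                     // wrap the JobConf in a controllable Job

        new Thread(control).start();                       // run() loops until all jobs finish
        while (!control.allFinished()) {
          Thread.sleep(1000);
        }
        System.out.println("succeeded: " + control.getSuccessfulJobs().size()
            + ", failed: " + control.getFailedJobs().size());
        control.stop();
      }
    }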
  • Aaron Baff at Feb 17, 2011 at 6:15 pm

    This is almost what I want, but it doesn't give me access to the data I'm looking for. I'm specifically after org.apache.hadoop.mapreduce.JobStatus and its getStartTime() and getFinishTime() methods. The only places I've seen to get a JobStatus object are JobClient's getAllJobs(), getJobsFromQueue(), and jobsToComplete().

    --Aaron
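
    A rough sketch of that getAllJobs() workaround: scan the returned array for the
    submitted JobID. It assumes a release where mapred.JobStatus exposes both
    getStartTime() and getFinishTime() (in later releases it extends
    org.apache.hadoop.mapreduce.JobStatus), and it only sees jobs the JobTracker
    still holds in memory:

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.JobStatus;

    public class JobTimes {
      /** Print how long the job with the given ID ran, if the JobTracker still knows it. */
      public static void print(JobClient client, JobID id) throws IOException {
        for (JobStatus status : client.getAllJobs()) {
          if (status.getJobID().equals(id)) {
            long elapsed = status.getFinishTime() - status.getStartTime();
            System.out.println("Job " + id + " ran for " + elapsed + " ms");
            return;
          }
        }
        System.out.println("Job " + id + " not found (retired from the JobTracker?)");
      }
    }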
  • Harsh J at Feb 18, 2011 at 2:55 am
    Hello,
    The mapreduce.Cluster class in the current release can give you a 'Job'
    object, provided the JobID is known. The Job class also has the information
    you seek for a particular job (start/finish times and more).

    JobClient -> JobStatus results only reflect what the JobTracker holds in
    memory at the time of the call.
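
    A minimal sketch of that Cluster route, assuming a release that ships
    org.apache.hadoop.mapreduce.Cluster; the job ID string is a placeholder, and
    getJob() may return null if the cluster no longer knows the job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;

    public class JobRuntime {
      public static void main(String[] args) throws Exception {
        Cluster cluster = new Cluster(new Configuration());   // reads *-site.xml from the classpath

        Job job = cluster.getJob(JobID.forName("job_201102170001_0001"));  // placeholder job ID
        if (job != null) {
          long elapsed = job.getFinishTime() - job.getStartTime();
          System.out.println("start " + job.getStartTime() + ", finish " + job.getFinishTime()
              + ", took " + elapsed + " ms");
        } else {
          System.out.println("Job not found on the cluster");
        }
        cluster.close();
      }
    }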

    --
    Harsh J
    www.harshj.com
  • Aaron Baff at Feb 18, 2011 at 4:35 pm


    Thanks Harsh, yes, that is exactly what I was looking for. Some of the documentation, and the ton of classes that all seem very similar to one another, make for a confusing situation sometimes.

    The other issue I'm running into now is how to get the output path using FileOutputFormat.getOutputPath(). However, the property that it looks for does not seem to exist in the job config. When I grab the configuration from the Job, its toString() shows that it has an HDFS path to a job config XML file, but when I look using the CLI fs client, that file does not exist! Nor does any other output for any job I've run recently! On the JobTracker web interface I can view it, although that appears to come from the JobTracker rather than from HDFS. This is getting very perplexing, unless I missed reading some bit of documentation that makes this all clear.
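
    For reference, a small sketch of what getOutputPath() does under the old mapred
    API: it simply reads the mapred.output.dir property, so it returns null on any
    Configuration where that property has not been set (the method and property
    names are the old-API ones; the class below is illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class OutputPathCheck {
      /** Show the two equivalent ways of reading the configured output directory. */
      public static void show(JobConf conf) {
        Path outDir = FileOutputFormat.getOutputPath(conf);     // null when the property is missing
        System.out.println("getOutputPath():   " + outDir);
        System.out.println("mapred.output.dir: " + conf.get("mapred.output.dir"));
      }
    }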

    --Aaron
