FAQ
Hi -

What's the best way to list and query information on Hadoop job histories?
For example, I'd like to see the job names from the past week against a
Hadoop cluster I'm using. I don't see an API call or a way through the
command line to pull the information. Is the best way writing a quick
script to process the job history files?

Thanks.
Scott


  • Doug Balog at Aug 12, 2010 at 4:23 am
    I don't know if this is the best way, but this is how I do it.

    import java.net.InetSocketAddress;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobStatus;

    Configuration conf = new Configuration();
    JobClient jobClient = new JobClient(new InetSocketAddress("jobTracker", 9001), conf);
    jobClient.setConf(conf); // Bug in the constructor: it doesn't set conf.

    for (JobStatus js : jobClient.getAllJobs()) {
        // We only care about completed jobs.
        if (!js.isJobComplete()) {
            continue;
        }
        // Do stuff with the JobStatus here.
    }
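    The loop above returns every job the JobTracker remembers; to answer "jobs from the past week" you still need a time filter. Here is a minimal, self-contained sketch of just that filtering step, assuming your Hadoop version gives you a per-job start time in epoch milliseconds that you can compare against a cutoff (the Hadoop API itself is left out so the logic stands alone):

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    public class RecentJobs {
        // Keep only start times that fall within the last 7 days of 'now'.
        static List<Long> withinLastWeek(long now, long[] startTimes) {
            long cutoff = now - TimeUnit.DAYS.toMillis(7);
            List<Long> recent = new ArrayList<Long>();
            for (long t : startTimes) {
                if (t >= cutoff) {
                    recent.add(t);
                }
            }
            return recent;
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis();
            long[] starts = {
                now - TimeUnit.DAYS.toMillis(1),  // one day old: kept
                now - TimeUnit.DAYS.toMillis(10)  // ten days old: dropped
            };
            System.out.println(withinLastWeek(now, starts).size());
        }
    }
    ```

    In the real loop you would feed each completed job's start time through the same cutoff test instead of the sample array.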

    You can also scrape info from http://jobtracker:50030/jobhistory.jsp

    Or read it from the job's outputDir/_logs/history/ directory.

    Cheers,

    Doug

  • Arun C Murthy at Aug 12, 2010 at 4:53 am
    Moving to mapreduce-user@, bcc general@.

    There isn't a direct way. One possible option is to just use the per-job
    job-history file, which is on HDFS (see http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Submission+and+Monitoring
    for info on job-history).

    Hope that helps.

    Arun
  • Scott Whitecross at Aug 17, 2010 at 2:06 am
    Thanks for the answers Doug and Arun. I'm assuming the job-history files
    mentioned are in ./hadoop-0.20/logs/history/done/. The files look like they
    were serialized by a class in Hadoop? (If I can read the files back into
    the appropriate class, and then dump them out into a custom format, that'd
    be great.)

    Thanks.

  • Ranjit Mathew at Aug 17, 2010 at 4:30 am
    [BCC-ing "general" - again.]
    Rumen (src/tools/org/apache/hadoop/tools/rumen/) parses job-history files
    and creates JSON files that can either be loaded independently or read via
    the API provided by Rumen itself. As an added benefit, it abstracts away
    the differences between the 0.20.xx format and the Avro-based format used
    in trunk.

    There is not much documentation on Rumen right now, but MAPREDUCE-1918
    (https://issues.apache.org/jira/browse/MAPREDUCE-1918) attempts to fix
    that.
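    If all you want from a Rumen trace is the job names, a quick string scan over each trace line is enough. This is only a sketch: the field name "jobName" and the one-JSON-object-per-line layout of the sample are assumptions about the trace format, not something guaranteed by Rumen's (sparse) documentation, so check a real trace file first.

    ```java
    public class JobNameScan {
        // Naive extraction of the "jobName" value from one JSON line.
        // Returns null if the field is absent.
        static String extractJobName(String jsonLine) {
            String key = "\"jobName\":\"";
            int i = jsonLine.indexOf(key);
            if (i < 0) {
                return null;
            }
            int start = i + key.length();
            int end = jsonLine.indexOf('"', start);
            return jsonLine.substring(start, end);
        }

        public static void main(String[] args) {
            // Hypothetical trace line; real Rumen output may differ in layout.
            String sample = "{\"jobID\":\"job_201008121234_0001\",\"jobName\":\"wordcount\"}";
            System.out.println(extractJobName(sample));
        }
    }
    ```

    For anything beyond a one-off scan, a proper JSON parser (or Rumen's own API) is the safer route, since string matching breaks on escaped quotes or reordered fields.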

    HTH,
    Ranjit

Discussion Overview
group: general
categories: hadoop
posted: Aug 11, '10 at 4:56p
active: Aug 17, '10 at 4:30a
posts: 5
users: 4
website: hadoop.apache.org
irc: #hadoop
