Rest API for retrieving job / task statistics
----------------------------------------------

Key: HADOOP-4559
URL: https://issues.apache.org/jira/browse/HADOOP-4559
Project: Hadoop Core
Issue Type: New Feature
Reporter: Florian Leibert
Priority: Trivial
Fix For: 0.20.0


A REST API that returns a simple JSON document containing information about a given job, such as min/max/avg times per task, failed tasks, etc. This would be useful for allowing an external restart of a run or modification of its parameters.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Florian Leibert (JIRA) at Oct 31, 2008 at 4:16 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Florian Leibert updated HADOOP-4559:
    ------------------------------------

    Attachment: HADOOP-4559.patch

    This provides a very simple API for retrieving statistics about the tasks of a given job ID, such as average, min, and max times per task, failed tasks per job, total job runtime, etc.
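    As a rough illustration of the kind of payload described here, the sketch below computes min/max/avg task times and renders them as JSON. It is not taken from HADOOP-4559.patch: the class name, method name, and JSON field names are all hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper; names and JSON fields are assumptions, not the patch's. */
final class TaskStatsSketch {

    /** Renders min/max/avg task time and the failed-task count as one JSON object. */
    static String toJson(long[] taskMillis, int failedTasks) {
        if (taskMillis.length == 0) {
            throw new IllegalArgumentException("at least one task duration is required");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (long t : taskMillis) {
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
        }
        double avg = (double) sum / taskMillis.length;
        // Locale.ROOT keeps the decimal separator a '.' whatever the JVM default locale is.
        return String.format(Locale.ROOT,
            "{\"minMillis\":%d,\"maxMillis\":%d,\"avgMillis\":%.1f,\"failedTasks\":%d}",
            min, max, avg, failedTasks);
    }

    public static void main(String[] args) {
        System.out.println(toJson(new long[] {1200, 800, 1000}, 1));
    }
}
```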

    Original Estimate: 2h
    Remaining Estimate: 2h

  • Florian Leibert (JIRA) at Oct 31, 2008 at 4:18 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Florian Leibert updated HADOOP-4559:
    ------------------------------------

    Release Note: Adds API features to the webapp part of Hadoop that allow retrieving task stats for a given job
    Status: Patch Available (was: Open)
  • Steve Loughran (JIRA) at Nov 3, 2008 at 11:15 am
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644689#action_12644689 ]

    Steve Loughran commented on HADOOP-4559:
    ----------------------------------------

    - Although it's a JSP page, everything, including printing, is done in Java code. It would be better either implemented as a pure servlet, or with the output redone as <%= %> expressions to produce something more JSP-y

    - I recommend HtmlUnit as the best extension to JUnit for testing web pages; it could grab the pages and look at the content.
  • Hadoop QA (JIRA) at Nov 3, 2008 at 5:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644751#action_12644751 ]

    Hadoop QA commented on HADOOP-4559:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12393159/HADOOP-4559.patch
    against trunk revision 709609.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no tests are needed for this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/console

    This message is automatically generated.
  • Paco Nathan (JIRA) at Nov 3, 2008 at 7:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644792#action_12644792 ]

    pacoid edited comment on HADOOP-4559 at 11/3/08 11:50 AM:
    ---------------------------------------------------------------

    HADOOP-4559 provides a workaround for part of the issue described in HADOOP-3850. Can now access log data by making REST calls to JSP provided in 3850. For example:

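    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpMethod;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    // Run the job to completion, then ask the JobTracker webapp for its details.
    RunningJob currentjob = JobClient.runJob(job_conf);
    JobID id = currentjob.getID();
    String url = "http://localhost:50030/api.jsp?info=jobdetails&id=" + id.getId();

    HttpClient client = new HttpClient();
    HttpMethod method = new GetMethod(url);
    String logData;
    try {
      client.executeMethod(method);
      logData = method.getResponseBodyAsString();
    } finally {
      method.releaseConnection();  // release even if the request fails
    }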
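
    The snippet above depends on Commons HttpClient. For readers without that library, the following is a minimal sketch of the same GET using only java.net. It assumes the same api.jsp endpoint and query parameters shown above; the class and method names, the timeouts, and the UTF-8 decoding are illustrative assumptions rather than part of the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical client; only the URL shape comes from the comment above. */
final class JobDetailsClientSketch {

    /** Builds the query URL; host and port are supplied by the caller. */
    static String buildUrl(String trackerHostPort, int jobNumber) {
        return "http://" + trackerHostPort + "/api.jsp?info=jobdetails&id=" + jobNumber;
    }

    /** Drains a stream into a String, assuming the body is UTF-8 (an assumption). */
    static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Performs the GET and fails loudly on any non-200 status. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                return readBody(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

    Checking the status code before reading the body keeps an error page from being mistaken for job data.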


    was (Author: pacoid):
    HADOOP-4559 provides a workaround for the issue described in HADOOP-3850. We can now access the log data by making REST calls to JSP provided in 3850.

    RunningJob currentjob = JobClient.runJob(job_conf);
    String urlPrefix = "http://localhost:50030/api.jsp?info=jobdetails&id=";
    final JobID id = currentjob.getID();
    final int id_int = id.getId();
    final String url = urlPrefix + id_int;
    final String json = getStringFromREST(url);

  • Chris Douglas (JIRA) at Nov 8, 2008 at 12:51 am
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-4559:
    ----------------------------------

    Status: Open (was: Patch Available)

    bq. although its a JSP page, everything, including printing, is done in Java code. It would either be better implemented as a pure servlet
    +1

    * Please format the code according to the [conventions|http://wiki.apache.org/hadoop/HowToContribute#head-59ae13df098fbdcc46abdf980aa8ee76d3ee2e3b].
    * There's a fair amount of dead code in this patch, e.g.
    {noformat}
    + StringBuffer sb = new StringBuffer();
    + boolean isFirst = true;
    + for (String kv : kv_pairs) {
    +
    + sb.append(kv);
    + }
    {noformat}
    {{kv_pairs}} is initialized, but empty. {{sb}} is unused, save in this loop. The loop above it doesn't appear to do any productive work. StringBuilder should be used instead of StringBuffer in this context.
    * If you're proposing this as a public API, it must at least have a unit test.
    * Isn't most of this provided through job history?
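
    For reference, a loop of the shape quoted above usually exists to join entries with a separator, with isFirst suppressing the leading comma. The sketch below shows that pattern with StringBuilder. Whether this is what the patch intended is an assumption, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the join-with-separator pattern. */
final class KvJoinSketch {

    /** Joins pre-formatted "key":value pairs into one JSON object string. */
    static String joinPairs(List<String> kvPairs) {
        // StringBuilder is enough here: the buffer never leaves this thread.
        StringBuilder sb = new StringBuilder("{");
        boolean isFirst = true;
        for (String kv : kvPairs) {
            if (!isFirst) {
                sb.append(',');
            }
            sb.append(kv);
            isFirst = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(joinPairs(Arrays.asList("\"failed\":2", "\"total\":40")));
    }
}
```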
  • Paco Nathan (JIRA) at Nov 26, 2008 at 1:53 am
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650838#action_12650838 ]

    Paco Nathan commented on HADOOP-4559:
    -------------------------------------
    bq. Isn't most of this provided through job history?
    No, not really. Not if a long-running workflow requires these measurements for automated decisions.

    While a human can *read* the job history data from JSP pages, there's no current means for the app code which calls ToolRunner to obtain that data and use it to alter the workflow.
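
    To make the "automated decisions" case concrete, here is a sketch of how calling code might act on the failed-task count once it can read it. The policy, the thresholds, and every name below are hypothetical and only illustrate the idea.

```java
/** Hypothetical decision rule driven by job statistics. */
final class WorkflowDecisionSketch {

    enum Action { CONTINUE, RERUN_WITH_NEW_PARAMS, ABORT }

    /** Picks the next workflow step from the fraction of failed tasks. */
    static Action decide(int failedTasks, int totalTasks,
                         double rerunThreshold, double abortThreshold) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        double failedRatio = (double) failedTasks / totalTasks;
        if (failedRatio >= abortThreshold) {
            return Action.ABORT;
        }
        if (failedRatio >= rerunThreshold) {
            return Action.RERUN_WITH_NEW_PARAMS;
        }
        return Action.CONTINUE;
    }
}
```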
  • Bill de hOra (JIRA) at Nov 28, 2008 at 1:29 am
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651447#action_12651447 ]

    Bill de hOra commented on HADOOP-4559:
    --------------------------------------



    {code}
    JobID id = currentjob.getID();
    String url = "http://localhost:50030/api.jsp?info=jobdetails&id=" + id.getId();
    {code}

    Can't you just call this a JSP into the jobtracker instead? I hate to nitpick, but it's not REST style (client URL construction), nor is the response (no links), and ASF code should (imvho) know the difference. If you want to build REST-style tooling around the tracker, I'd be happy to help with that. For example, scaling this up to a lot of jobs and/or a lot of clients will require something that doesn't hammer the tracker. And iterating over the tracker seems like a linear bottleneck; an O(1) key lookup would be much better.
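
    The lookup-cost point can be sketched as follows: a linear scan over all jobs versus a map keyed by job ID that is built once. The JobSummary type and the method names are hypothetical stand-ins and do not reflect how the jobtracker stores jobs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of scan-based and key-based job lookup. */
final class JobIndexSketch {

    /** Minimal stand-in for whatever per-job record the tracker keeps. */
    static final class JobSummary {
        final String jobId;
        final int failedTasks;

        JobSummary(String jobId, int failedTasks) {
            this.jobId = jobId;
            this.failedTasks = failedTasks;
        }
    }

    /** Linear scan: cost grows with the number of jobs. Returns null if absent. */
    static JobSummary findByScan(List<JobSummary> jobs, String jobId) {
        for (JobSummary job : jobs) {
            if (job.jobId.equals(jobId)) {
                return job;
            }
        }
        return null;
    }

    /** Built once; afterwards each lookup is a single hash probe on average. */
    static Map<String, JobSummary> indexById(List<JobSummary> jobs) {
        Map<String, JobSummary> byId = new HashMap<String, JobSummary>();
        for (JobSummary job : jobs) {
            byId.put(job.jobId, job);
        }
        return byId;
    }
}
```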
  • Florian Leibert (JIRA) at Dec 22, 2008 at 3:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Florian Leibert updated HADOOP-4559:
    ------------------------------------

    Attachment: (was: HADOOP-4559.patch)
  • Florian Leibert (JIRA) at Dec 22, 2008 at 3:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Florian Leibert updated HADOOP-4559:
    ------------------------------------

    Attachment: HADOOP-4559v2.patch

    The previous version was a bit dirty. I think this one is quite an improvement. We're using it to gather a lot of stats for our job runs. It's not a servlet and doesn't contain HtmlUnit: I think one stats JSP doesn't justify adding another library to the distribution, and for the sake of simplicity this remains a JSP. I hope this is valuable for someone else as well; it really is useful for us to track performance when modifying our algorithm.

  • Steve Loughran (JIRA) at Feb 4, 2009 at 11:21 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670549#action_12670549 ]

    Steve Loughran commented on HADOOP-4559:
    ----------------------------------------

    +1 to Bill's idea for a RESTy API, one that works long-haul.
