Hadoop mapred metrics should include per job input/output statistics rather than per-task statistics
----------------------------------------------------------------------------------------------------

Key: HADOOP-481
URL: http://issues.apache.org/jira/browse/HADOOP-481
Project: Hadoop
Issue Type: Improvement
Components: mapred, metrics
Affects Versions: 0.6.0
Reporter: Milind Bhandarkar
Assigned To: Milind Bhandarkar
Priority: Minor
Fix For: 0.6.0


Currently Hadoop reports metrics such as input bytes and input records on a per-task basis. These metrics need to be aggregated accurately at the job level, and reporting should be done on a per-job basis.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

  • Milind Bhandarkar (JIRA) at Aug 24, 2006 at 11:43 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Attachment: reports.patch

    Patch attached. It changes the TaskUmbilicalProtocol (the progress method has an additional parameter) and the InterTrackerProtocol (TaskTrackerStatus has an added field), so their versions have been bumped up.
  • Milind Bhandarkar (JIRA) at Aug 24, 2006 at 11:43 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Status: Patch Available (was: Open)

    Patch attached. It changes the TaskUmbilicalProtocol (the progress method has an additional parameter) and the InterTrackerProtocol (TaskTrackerStatus has an added field), so their versions have been bumped up.
  • Milind Bhandarkar (JIRA) at Aug 24, 2006 at 11:43 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Attachment: (was: reports.patch)
  • Milind Bhandarkar (JIRA) at Aug 25, 2006 at 12:13 am
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Status: In Progress (was: Patch Available)
  • Milind Bhandarkar (JIRA) at Aug 25, 2006 at 12:13 am
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Attachment: reports.patch
  • Doug Cutting (JIRA) at Aug 28, 2006 at 8:03 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=comments#action_12431048 ]

    Doug Cutting commented on HADOOP-481:
    -------------------------------------

    Instead of passing a long[] you should pass a struct that implements Writable. Probably TaskMetrics would be a good name for this.
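
A minimal sketch of what such a struct might look like, assuming four counters. The field names, the `roundTrip` helper, and the local `Writable` interface (a stand-in that mirrors the two methods of `org.apache.hadoop.io.Writable` so the example compiles without Hadoop) are illustrative assumptions, not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable,
// so this sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical TaskMetrics struct; the field names are assumptions.
class TaskMetrics implements Writable {
    long inputBytes;
    long inputRecords;
    long outputBytes;
    long outputRecords;

    // No-arg constructor so an empty instance can be filled by readFields.
    TaskMetrics() {}

    TaskMetrics(long inputBytes, long inputRecords, long outputBytes, long outputRecords) {
        this.inputBytes = inputBytes;
        this.inputRecords = inputRecords;
        this.outputBytes = outputBytes;
        this.outputRecords = outputRecords;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(inputBytes);
        out.writeLong(inputRecords);
        out.writeLong(outputBytes);
        out.writeLong(outputRecords);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        inputBytes = in.readLong();
        inputRecords = in.readLong();
        outputBytes = in.readLong();
        outputRecords = in.readLong();
    }

    // Serializes and deserializes a value, roughly as an RPC layer would.
    static TaskMetrics roundTrip(TaskMetrics m) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            m.write(new DataOutputStream(buf));
            TaskMetrics copy = new TaskMetrics();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class TaskMetricsDemo {
    public static void main(String[] args) {
        TaskMetrics copy = TaskMetrics.roundTrip(new TaskMetrics(1024L, 10L, 512L, 5L));
        System.out.println(copy.inputBytes + " bytes in, " + copy.outputRecords + " records out");
    }
}
```

Compared with a bare `long[]`, named fields make the meaning of each slot explicit, and a new counter can be added without callers having to agree on array indices.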
  • Milind Bhandarkar (JIRA) at Aug 28, 2006 at 8:16 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=comments#action_12431052 ]

    Milind Bhandarkar commented on HADOOP-481:
    ------------------------------------------

    All right. I will resubmit the patch in a day or two.

  • Milind Bhandarkar (JIRA) at Aug 31, 2006 at 11:37 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Attachment: (was: reports.patch)
  • Milind Bhandarkar (JIRA) at Aug 31, 2006 at 11:41 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Status: Patch Available (was: In Progress)

    Patch submitted.
  • Milind Bhandarkar (JIRA) at Aug 31, 2006 at 11:42 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Milind Bhandarkar updated HADOOP-481:
    -------------------------------------

    Attachment: reports.patch

    Here is the updated patch.
  • Doug Cutting (JIRA) at Sep 5, 2006 at 6:32 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=comments#action_12432647 ]

    Doug Cutting commented on HADOOP-481:
    -------------------------------------

    Shouldn't we use our existing metrics API for stuff like this? As with HADOOP-492, it seems like the TaskTracker and JobTracker should implement the MetricsContext API, providing a MetricsRecord factory. These can be used by the MapReduce kernel code for the metrics desired here, and supplied to user code for the uses in HADOOP-492.

    We might even need to write a multiplexing MetricsContext that can send metrics to both the JobTracker and to, e.g., Ganglia. But we should not be adding new metrics APIs when we already have one. If the current metrics API is somehow inappropriate, let's fix that instead of creating another.
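
The multiplexing idea can be sketched as follows. This is only an illustration under stated assumptions: `SimpleMetricsContext` and `SimpleMetricsRecord` are hypothetical, cut-down interfaces written for this example, not the actual Hadoop metrics API, and `InMemoryContext` stands in for a real sink such as the JobTracker or Ganglia.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real Hadoop metrics interfaces.
interface SimpleMetricsRecord {
    void incrMetric(String name, long delta);
    void update(); // publish pending increments to the sink
}

interface SimpleMetricsContext {
    SimpleMetricsRecord createRecord(String recordName); // record factory
}

// Toy sink that keeps published totals in memory.
class InMemoryContext implements SimpleMetricsContext {
    final Map<String, Long> published = new HashMap<>();

    @Override
    public SimpleMetricsRecord createRecord(String recordName) {
        Map<String, Long> pending = new HashMap<>();
        return new SimpleMetricsRecord() {
            @Override
            public void incrMetric(String name, long delta) {
                pending.merge(recordName + "." + name, delta, Long::sum);
            }

            @Override
            public void update() {
                pending.forEach((k, v) -> published.merge(k, v, Long::sum));
                pending.clear();
            }
        };
    }
}

// Fans every record operation out to all underlying contexts.
class MultiplexingContext implements SimpleMetricsContext {
    private final List<SimpleMetricsContext> targets;

    MultiplexingContext(SimpleMetricsContext... targets) {
        this.targets = Arrays.asList(targets);
    }

    @Override
    public SimpleMetricsRecord createRecord(String recordName) {
        List<SimpleMetricsRecord> records = new ArrayList<>();
        for (SimpleMetricsContext target : targets) {
            records.add(target.createRecord(recordName));
        }
        return new SimpleMetricsRecord() {
            @Override
            public void incrMetric(String name, long delta) {
                for (SimpleMetricsRecord r : records) r.incrMetric(name, delta);
            }

            @Override
            public void update() {
                for (SimpleMetricsRecord r : records) r.update();
            }
        };
    }
}

class MultiplexingDemo {
    public static void main(String[] args) {
        InMemoryContext jobTrackerSink = new InMemoryContext();
        InMemoryContext monitoringSink = new InMemoryContext();
        SimpleMetricsRecord rec =
            new MultiplexingContext(jobTrackerSink, monitoringSink).createRecord("mapred");
        rec.incrMetric("map_input_records", 3);
        rec.update();
        System.out.println(jobTrackerSink.published + " " + monitoringSink.published);
    }
}
```

Because the multiplexer implements the same context interface as its targets, calling code does not need to know how many sinks receive the metrics.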
  • Doug Cutting (JIRA) at Nov 3, 2006 at 7:26 pm
    [ http://issues.apache.org/jira/browse/HADOOP-481?page=all ]

    Doug Cutting updated HADOOP-481:
    --------------------------------

    Status: Open (was: Patch Available)
  • Nigel Daley (JIRA) at Feb 9, 2007 at 10:12 pm
    [ https://issues.apache.org/jira/browse/HADOOP-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471847 ]

    Nigel Daley commented on HADOOP-481:
    ------------------------------------

    FWIW, HADOOP-954 has changed all task-related statistics to be per user (the user that submitted the job). Currently, these counters are aggregated per user:

    map_input_bytes
    map_input_records
    map_output_bytes
    map_output_records
    shuffle_input_bytes
    reduce_input_records
    reduce_output_records
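
A rough sketch of this kind of aggregation, assuming each task reports its final counter values exactly once. The `CounterAggregator` class and its methods are hypothetical names for illustration; the grouping key could be a job ID (as this issue requests) or the submitting user (as described above for HADOOP-954).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator: sums per-task counters under a grouping key
// (for example a job ID or a user name).
class CounterAggregator {
    // grouping key -> counter name -> running total
    private final Map<String, Map<String, Long>> totals = new TreeMap<>();

    // Assumes each task reports its final counters exactly once;
    // reporting the same task twice would double count.
    void report(String key, Map<String, Long> taskCounters) {
        Map<String, Long> group = totals.computeIfAbsent(key, k -> new TreeMap<>());
        taskCounters.forEach((name, value) -> group.merge(name, value, Long::sum));
    }

    // Returns 0 for an unknown key or counter.
    long get(String key, String counter) {
        return totals
            .getOrDefault(key, Collections.<String, Long>emptyMap())
            .getOrDefault(counter, 0L);
    }
}

class CounterAggregatorDemo {
    public static void main(String[] args) {
        CounterAggregator agg = new CounterAggregator();

        Map<String, Long> task1 = new HashMap<>();
        task1.put("map_input_bytes", 100L);
        task1.put("map_input_records", 10L);

        Map<String, Long> task2 = new HashMap<>();
        task2.put("map_input_bytes", 250L);
        task2.put("map_input_records", 20L);

        agg.report("job_0001", task1);
        agg.report("job_0001", task2);

        System.out.println("map_input_bytes for job_0001: "
            + agg.get("job_0001", "map_input_bytes"));
    }
}
```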

  • Milind Bhandarkar (JIRA) at Sep 18, 2007 at 7:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Milind Bhandarkar resolved HADOOP-481.
    --------------------------------------

    Resolution: Invalid

    This has also already been done.
