Need to capture the metrics for the network ios generate by dfs reads/writes and map/reduce shuffling and break them down by racks
------------------------------------------------------------------------------------------------------------------------------------

Key: HADOOP-3062
URL: https://issues.apache.org/jira/browse/HADOOP-3062
Project: Hadoop Core
Issue Type: Improvement
Components: metrics
Reporter: Runping Qi



In order to better understand the relationship between Hadoop performance and network bandwidth, we need to know
the aggregate traffic in a cluster and its breakdown by rack. With these data, we can determine whether network
bandwidth is the bottleneck when certain jobs are running on a cluster.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Chris Douglas (JIRA) at Jul 31, 2008 at 10:59 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618896#action_12618896 ]

    Chris Douglas commented on HADOOP-3062:
    ---------------------------------------

    The analysis should leverage HADOOP-3719, so this issue should cover the log4j appender emitting the HDFS and shuffling data. There are a few open questions and arguable assumptions:

    * Should this count bytes successfully transferred separately from failed transfers? Should failed transfers be logged at all?
    * The header/metadata/etc. traffic is assumed to be a negligible fraction of the total network traffic and irrelevant to the analysis for a particular job. The overall network utilization is also best measured using standard monitoring utilities that don't require any knowledge of Hadoop. This will focus on tracking block traffic over HDFS (reads, writes, replications) and map output fetched during the shuffle, only.
    * For local reads, the source and destination IP will match. This should be sufficient to detect and discard during analysis of network traffic, but will not be sufficient to account for all reads from the local disk (counters and job history are likely better tools for this).
    * Accounting for topology (to break down by racks, etc.) is best deferred to the analysis. Logging changes in topology would also be helpful, though I don't know whether Hadoop has sufficient information to do this in the general case.
    * If job information is available (in the shuffle), should it be included in the entry? Doing this for HDFS is non-trivial, but would be invaluable to the analysis. I'm not certain how to do this, yet. Of course, replications and rebalancing won't include this, and HDFS reads prior to job submission (and all other traffic from JobClient) will likely be orphaned, as well.
    * Should this include start/end entries so one can infer how long the transfer took?
    * What about DistributedCache? Can it be ignored as part of the job setup, which is already omitted?

    In general, the format will follow:
    {noformat}
    <log4j schema including timestamp, etc.> source: <src IP>, destination: <dst IP>, bytes: <bytes>, operation: <op enum>[, taskid: <TaskID>]
    {noformat}

    Where {{<(src|dst) IP>}} is the IP address of the source and destination nodes, {{<bytes>}} is a long, and {{<op enum>}} is one of {{HDFS_READ}}, {{HDFS_WRITE}}, {{HDFS_COPY}}, and {{MAPRED_SHUFFLE}}. {{HDFS_REPLACE}} should be redundant if {{HDFS_COPY}} is recorded (I think). The rebalancing traffic isn't relevant to job analysis, but if one is including sufficient information to determine the duration of each transfer it may be interesting. The TaskID should be sufficient, but one could argue that including the JobID would be useful as a point to join on.

    Thoughts?
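The entry format proposed in the comment above could be consumed by a small parser during analysis. The following is a minimal sketch, not part of any patch: the regular expression mirrors the proposed key/value tail, while the log4j prefix, the logger name, the IP addresses, and the task ID in the sample line are made up for illustration.

```python
import re

# Matches the key/value tail of an entry in the proposed format.
# The log4j prefix (timestamp, level, logger) is not parsed because its
# layout depends on the appender configuration.
ENTRY = re.compile(
    r"source: (?P<src>[^,\s]+), "
    r"destination: (?P<dst>[^,\s]+), "
    r"bytes: (?P<bytes>\d+), "
    r"operation: (?P<op>HDFS_READ|HDFS_WRITE|HDFS_COPY|MAPRED_SHUFFLE)"
    r"(?:, taskid: (?P<taskid>\S+))?"
)


def parse_entry(line):
    """Return the fields of one entry as a dict, or None if the line does not match."""
    m = ENTRY.search(line)
    if m is None:
        return None
    return {
        "src": m.group("src"),
        "dst": m.group("dst"),
        "bytes": int(m.group("bytes")),
        "op": m.group("op"),
        "taskid": m.group("taskid"),  # None when the optional field is absent
    }


# Hypothetical sample line; every value here is invented.
sample = (
    "2008-07-31 22:59:00,000 INFO example: source: 10.0.0.1, "
    "destination: 10.0.0.2, bytes: 67108864, operation: HDFS_READ, "
    "taskid: task_200807312259_0001_m_000000_0"
)
entry = parse_entry(sample)
```

Keeping the optional taskid as None rather than an empty string makes it easy to separate job traffic from orphaned traffic such as replication.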
  • Chris Douglas (JIRA) at Aug 7, 2008 at 2:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Attachment: 3062-0.patch

    First draft.

    Format:
    {noformat}
    <log4j schema including timestamp, etc.> src: <src IP>, dest: <dst IP>, bytes: <bytes>, op: <op enum>, id: <DFSClient id|taskid>[, blockid: <block id>]
    {noformat}

    The patch adds the DFSClient clientName to OP_READ_BLOCK and changes the String in OP_WRITE_BLOCK from the path (which is unused) to the clientName. Since this is set to DFSClient_<taskid> in map and reduce tasks, tracing the output of a job should be straightforward after some processing of each entry. Writes for replications (where the clientName is "") are logged as they have been; the logging in PacketResponder has been reformatted to fit the preceding schema. A few known issues:

    * The logging assumes the IP address is sufficient to distinguish a source, particularly for writes and in the shuffle
    * This logs to the DataNode and ReduceTask appenders; these entries should be directed elsewhere and disabled by default
    * In testing this, some entries in the read exhibited a strange property: the source and destination match, but neither matches the DataNode on which it is logged. I'm clearly missing something.

    I tried tracing a few blocks and map outputs through the logs and all made sense. That said, as mentioned in the last bullet, not all of the entries made sense.
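The "some processing of each entry" mentioned in the first-draft comment above might look like the following. This is a rough sketch under stated assumptions: the field names come from the draft format, but the function names, the sample lines, and the task IDs are hypothetical, and the real patch may emit entries that differ in detail.

```python
import re
from collections import defaultdict

# Key/value tail of an entry in the first-draft format; blockid is optional.
DRAFT_ENTRY = re.compile(
    r"src: (?P<src>[^,\s]+), dest: (?P<dst>[^,\s]+), bytes: (?P<bytes>\d+), "
    r"op: (?P<op>\w+), id: (?P<id>[^,\s]+)(?:, blockid: (?P<blockid>\S+))?"
)


def owner_of(entry_id):
    """Strip the DFSClient_ prefix so HDFS and shuffle entries share one key."""
    prefix = "DFSClient_"
    return entry_id[len(prefix):] if entry_id.startswith(prefix) else entry_id


def bytes_by_owner(lines):
    """Sum bytes per (task id, operation), ignoring lines that do not match."""
    totals = defaultdict(int)
    for line in lines:
        m = DRAFT_ENTRY.search(line)
        if m is None:
            continue
        totals[(owner_of(m.group("id")), m.group("op"))] += int(m.group("bytes"))
    return dict(totals)


# Hypothetical log lines; addresses, ids, and block ids are invented.
lines = [
    "... src: 10.0.0.1, dest: 10.0.0.2, bytes: 100, op: HDFS_WRITE, "
    "id: DFSClient_task_0001_r_000000_0, blockid: blk_1",
    "... src: 10.0.0.3, dest: 10.0.0.2, bytes: 50, op: MAPRED_SHUFFLE, "
    "id: task_0001_r_000000_0",
    "... src: 10.0.0.1, dest: 10.0.0.4, bytes: 25, op: HDFS_WRITE, "
    "id: DFSClient_task_0001_r_000000_0, blockid: blk_2",
    "an unrelated DataNode message",
]
totals = bytes_by_owner(lines)
```

Normalizing the id this way joins a task's HDFS traffic with its shuffle traffic; grouping by job would need a further step to derive the job from the task id.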
  • Chris Douglas (JIRA) at Aug 7, 2008 at 2:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Fix Version/s: 0.19.0
    Hadoop Flags: [Incompatible change]
    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Aug 7, 2008 at 4:16 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620503#action_12620503 ]

    Hadoop QA commented on HADOOP-3062:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12387698/3062-0.patch
    against trunk revision 683448.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3029/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3029/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3029/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3029/console

    This message is automatically generated.
  • Lohit Vijayarenu (JIRA) at Aug 7, 2008 at 8:09 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620731#action_12620731 ]

    Lohit Vijayarenu commented on HADOOP-3062:
    ------------------------------------------

    For this
    bq. and break them down by racks
    Is there any information logged about this?
  • Chris Douglas (JIRA) at Aug 8, 2008 at 12:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620791#action_12620791 ]

    Chris Douglas commented on HADOOP-3062:
    ---------------------------------------

    bq. Is there any information logged about [breakdown by racks]?

    No, that's handled in the analysis. I don't think the datanodes or the reduce tasks know about topology, anyway.
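Since the rack breakdown is left to the analysis, it could be computed from parsed entries along these lines. This is an illustrative sketch only: the IP-to-rack mapping has to come from outside the logs (for example, from whatever topology information the cluster operator has), and the mapping, addresses, and byte counts below are invented.

```python
from collections import defaultdict


def traffic_by_rack(transfers, rack_of):
    """Sum bytes per (source rack, destination rack).

    transfers: iterable of (src_ip, dst_ip, nbytes) tuples from parsed entries.
    rack_of:   dict mapping an IP address to a rack name (supplied externally).
    """
    totals = defaultdict(int)
    for src, dst, nbytes in transfers:
        if src == dst:
            continue  # matching source and destination: treated as a local transfer
        key = (rack_of.get(src, "unknown"), rack_of.get(dst, "unknown"))
        totals[key] += nbytes
    return dict(totals)


# Hypothetical topology and transfers.
rack_of = {"10.0.1.1": "/rack1", "10.0.1.2": "/rack1", "10.0.2.1": "/rack2"}
transfers = [
    ("10.0.1.1", "10.0.1.2", 100),  # within rack1
    ("10.0.1.1", "10.0.2.1", 200),  # rack1 to rack2
    ("10.0.2.1", "10.0.2.1", 400),  # local, skipped
    ("10.0.1.2", "10.0.2.1", 50),   # rack1 to rack2
]
by_rack = traffic_by_rack(transfers, rack_of)
```

Entries whose addresses are missing from the mapping fall under "unknown" rather than being dropped, so gaps in the topology data stay visible in the totals.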
  • Chris Douglas (JIRA) at Aug 14, 2008 at 11:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Status: Open (was: Patch Available)

    Patch was mauled by HADOOP-3935 and the second and third (HADOOP-3658) bullets should be addressed.
  • Chris Douglas (JIRA) at Aug 15, 2008 at 3:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Attachment: 3062-1.patch
  • Chris Douglas (JIRA) at Aug 18, 2008 at 6:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Status: Patch Available (was: Open)

    Verified results with a randomwriter/sort run
  • Chris Douglas (JIRA) at Aug 18, 2008 at 6:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas reassigned HADOOP-3062:
    -------------------------------------

    Assignee: Chris Douglas
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 18, 2008 at 10:32 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623480#action_12623480 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3062:
    ------------------------------------------------

    - Should we check whether ClientTraceLog.isInfoEnabled() before logging?

    - Should we define an AUDIT_FORMAT for the log messages, like FSNamesystem.AUDIT_FORMAT?

    - I think it might be worth creating a utility class, say org.apache.hadoop.log.AuditLog, so that we could put AUDIT_FORMAT, isInfoEnabled(), etc. inside it. Then, both DataNode and FSNamesystem can use it.
  • Chris Douglas (JIRA) at Aug 18, 2008 at 11:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623482#action_12623482 ]

    Chris Douglas commented on HADOOP-3062:
    ---------------------------------------

    bq. Should we check whether ClientTraceLog.isInfoEnabled() before logging?

    Excluding the string concatenation to produce the actual message, the cost of each log message is low or infrequent (like the shuffle message). Excluding the new read log message, it's comparable to the logging that's already happening. I'm not certain if the logging this replaces (for client writes) should occur when ClientTraceLog.isInfoEnabled() is false, since nothing would be logged in that case...

    bq. Should we define an AUDIT_FORMAT for the log messages, like FSNamesystem.AUDIT_FORMAT?

    Unlike the FSNamesystem audit format, these are going to require some additional processing to be useful (e.g. the id param, optional block id), so the key/value pairing doesn't offer the same syntactical guarantees. That said, you're probably right, but unless we adopt a packaging like what you suggest in your following point, we'd introduce a link between hdfs and mapred. For now, with only these few messages, I don't think it gains much by being pulled out.

    bq. I think it might be worth creating a utility class, say org.apache.hadoop.log.AuditLog, so that we could put AUDIT_FORMAT, isInfoEnabled(), etc. inside it. Then, both DataNode and FSNamesystem can use it.

    Agreed: it would be better if there were a more central location for Hadoop APIs exported through the logging interfaces, like audit logs and these metrics. If nothing else, it would let us know which messages have consumers (hence the uncertainty for logging client writes). That's likely part of a different patch, though.
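The guard discussed in this exchange avoids building a message when the trace logger is disabled. The thread concerns log4j's isInfoEnabled() in Java; the sketch below shows the same pattern with the analogous Logger.isEnabledFor() call from Python's standard logging module, purely as an illustration. The logger name and the function are hypothetical and do not correspond to anything in the patch.

```python
import logging

# Hypothetical logger name, standing in for a dedicated client-trace logger.
client_trace = logging.getLogger("example.clienttrace")


def trace_transfer(src, dst, nbytes, op, owner):
    """Log one transfer if INFO is enabled; return whether anything was logged."""
    if not client_trace.isEnabledFor(logging.INFO):
        return False  # skip all formatting work when the trace log is off
    client_trace.info(
        "src: %s, dest: %s, bytes: %d, op: %s, id: %s",
        src, dst, nbytes, op, owner,
    )
    return True


# With the logger raised to WARNING, the guard short-circuits.
client_trace.setLevel(logging.WARNING)
skipped = trace_transfer("10.0.0.1", "10.0.0.2", 1024, "HDFS_READ", "task_0001_m_000000_0")

# With INFO enabled, the entry is emitted to whatever handlers are configured.
client_trace.setLevel(logging.INFO)
logged = trace_transfer("10.0.0.1", "10.0.0.2", 1024, "HDFS_READ", "task_0001_m_000000_0")
```

The guard matters most when preparing the arguments is itself expensive, which is the string concatenation cost raised in the comment above.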
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 18, 2008 at 11:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3062:
    -------------------------------------------

    Description:
    In order to better understand the relationship between hadoop performance and the network bandwidth, we need to know
    what the aggregated traffic data in a cluster and its breakdown by racks. With these data, we can determine whether the network
    bandwidth is the bottleneck when certain jobs are running on a cluster.


    was:

    In order to better understand the relationship between hadoop performance and the network bandwidth, we need to know
    what the aggregated traffic data in a cluster and its breakdown by racks. With these data, we can determine whether the network
    bandwidth is the bottleneck when certain jobs are running on a cluster.


    Hadoop Flags: [Incompatible change, Reviewed] (was: [Incompatible change])

    Got it. Let's work on a utility class in future if there is a need.

    +1 the patch is good.
  • Chris Douglas (JIRA) at Aug 19, 2008 at 1:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Attachment: 3062-2.patch

    Updated based on Nicholas's feedback, i.e. added {{isInfoEnabled}} guards around appropriate log stmts. Also removed the irrelevant replication log message.
  • Chris Douglas (JIRA) at Aug 19, 2008 at 1:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623533#action_12623533 ]

    chris.douglas edited comment on HADOOP-3062 at 8/18/08 6:44 PM:
    ----------------------------------------------------------------

    Updated based on Nicholas's feedback, i.e. added {{isInfoEnabled}} guards around appropriate log stmts. Also removed the irrelevant replication log message. I'll commit this if Hudson doesn't object.

  • Chris Douglas (JIRA) at Aug 20, 2008 at 2:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Attachment: 3062-3.patch

    * Moved mapred logging from ReduceTask to the TaskTracker
    * Changed HDFS_READ logging to record bytes actually read from datanode rather than bytes requested
    * Put \*.clienttrace format into TaskTracker, DataNode
  • Chris Douglas (JIRA) at Aug 21, 2008 at 12:31 am
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Attachment: 3062-4.patch

    * Added storageID to datanode string
    * Replaced redundant log message

    This probably needs only one more pass in review.
  • Tsz Wo (Nicholas), SZE (JIRA) at Aug 21, 2008 at 6:59 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624451#action_12624451 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3062:
    ------------------------------------------------

    +1 new patch looks good.
  • Chris Douglas (JIRA) at Aug 21, 2008 at 9:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Attachment: 3062-5.patch

    {noformat}
    [exec] -1 overall.

    [exec] +1 @author. The patch does not contain any @author tags.

    [exec] +1 tests included. The patch appears to include 3 new or modified tests.

    [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    {noformat}

    Fixed a findbugs warning. The remaining javadoc warning is unrelated to this patch, and the patch passes unit tests.
  • Chris Douglas (JIRA) at Aug 21, 2008 at 9:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-3062:
    ----------------------------------

    Resolution: Fixed
    Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed, Incompatible change])
    Status: Resolved (was: Patch Available)

    I just committed this.
  • Hudson (JIRA) at Aug 22, 2008 at 12:41 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624778#action_12624778 ]

    Hudson commented on HADOOP-3062:
    --------------------------------

    Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])
  • Robert Chansler (JIRA) at Oct 21, 2008 at 11:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Chansler updated HADOOP-3062:
    ------------------------------------

    Release Note: Introduced additional log records for data transfers.
    Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed, Incompatible change])

Discussion Overview
group: common-dev
categories: hadoop
posted: Mar 21, '08 at 5:33a
active: Oct 21, '08 at 11:53p
posts: 23
users: 1
website: hadoop.apache.org...
irc: #hadoop
