Handle large (several MB) text input lines in a reasonable amount of time
-------------------------------------------------------------------------

Key: HADOOP-6109
URL: https://issues.apache.org/jira/browse/HADOOP-6109
Project: Hadoop Common
Issue Type: Improvement
Components: util
Affects Versions: 0.19.0
Environment: Linux 2.6 kernel, java 1.6 AMD Dual-Core Opteron 2.6GHz with 1M L1/L2 cache 1.8G RAM
Reporter: thushara wijeratna


problem:
=======
Hadoop was timing out on a simple pass-through job (with the default 10-minute timeout).

cause:
=====
I traced this to how Text lines are processed inside org.apache.hadoop.util.LineReader.
I have a fix: a task that ran for more than 20 minutes and still failed to complete now finishes in under 30 s with the fix applied.
The patch (against trunk) is attached.

the problem traces:
================

hadoop version: 0.19.0
Userlogs on the slave node:

2009-05-29 13:57:33,551 WARN org.apache.hadoop.mapred.TaskRunner: Parent died. Exiting attempt_200905281652_0013_m_000006_1
[root@domU-12-31-38-01-7C-92 attempt_200905281652_0013_m_000006_1]#

Tellingly, the last input line processed right before this WARN is 19K. (I log the full input line in the map function for debugging.)

Output of the map-reduce job:

Task attempt_200905281652_0013_m_000006_2 failed to report status for 600 seconds. Killing!
09/05/29 14:08:01 INFO mapred.JobClient: map 99% reduce 32%
09/05/29 14:18:05 INFO mapred.JobClient: map 98% reduce 32%
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
at com.adxpose.data.mr.DailyHeatmapAggregator.run(DailyHeatmapAggregator.java:547)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.adxpose.data.mr.DailyHeatmapAggregator.main(DailyHeatmapAggregator.java:553)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • thushara wijeratna (JIRA) at Jun 25, 2009 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    thushara wijeratna updated HADOOP-6109:
    ---------------------------------------

    Attachment: HADOOP-1234.patch
    HADOOP-1234.patch
  • thushara wijeratna (JIRA) at Jun 25, 2009 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    thushara wijeratna updated HADOOP-6109:
    ---------------------------------------

    Attachment: (was: HADOOP-1234.patch)
  • thushara wijeratna (JIRA) at Jun 25, 2009 at 11:42 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724340#action_12724340 ]

    thushara wijeratna commented on HADOOP-6109:
    --------------------------------------------

    debugging steps taken:
    ==================

    I added LOG statements inside LineReader.readLine() to get an idea of the latencies:

    if (length >= 0) {
      str.append(buffer, startPosn, length);
      LOG.info("str.length= " + str.getLength() + " just wrote from " + startPosn + " to " + length + " bytes");
    }


    This is the kind of output I get.
    At the beginning:
    2009-06-03 09:35:56,863 INFO org.apache.hadoop.util.LineReader: str.length= 4096 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,864 INFO org.apache.hadoop.util.LineReader: str.length= 8192 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,865 INFO org.apache.hadoop.util.LineReader: str.length= 12288 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,866 INFO org.apache.hadoop.util.LineReader: str.length= 16384 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,866 INFO org.apache.hadoop.util.LineReader: str.length= 20480 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,867 INFO org.apache.hadoop.util.LineReader: str.length= 24576 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,867 INFO org.apache.hadoop.util.LineReader: str.length= 28672 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,868 INFO org.apache.hadoop.util.LineReader: str.length= 32768 just wrote from 0 to 4096 bytes
    2009-06-03 09:35:56,869 INFO org.apache.hadoop.util.LineReader: str.length= 36864 just wrote from 0 to 4096 bytes



    At the end:
    2009-06-03 09:46:02,918 INFO org.apache.hadoop.util.LineReader: str.length= 60141568 just wrote from 0 to 4096 bytes
    2009-06-03 09:46:03,048 INFO org.apache.hadoop.util.LineReader: str.length= 60145664 just wrote from 0 to 4096 bytes
    2009-06-03 09:46:03,118 INFO org.apache.hadoop.util.LineReader: str.length= 60149760 just wrote from 0 to 4096 bytes
    2009-06-03 09:46:03,183 INFO org.apache.hadoop.util.LineReader: str.length= 60153856 just wrote from 0 to 4096 bytes
    2009-06-03 09:46:03,252 INFO org.apache.hadoop.util.LineReader: str.length= 60157952 just wrote from 0 to 4096 bytes
    2009-06-03 09:46:03,317 INFO org.apache.hadoop.util.LineReader: str.length= 60162048 just wrote from 0 to 4096 bytes
    2009-06-03 09:46:03,456 INFO org.apache.hadoop.util.LineReader: str.length= 60166144 just wrote from 0 to 4096 bytes

    Notice that the timings degrade toward the end. This is the pattern: at first there is about 1 ms between consecutive reads, and by the end there are more than 50 ms between two consecutive reads.
    This points to a performance problem in Text.append.

    As you can see, only about 60M of the input line is read in 10 minutes.
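    A rough estimate of why the slowdown is so steep, assuming (not verified against the 0.19 source) that each Text.append reallocates the backing array to exactly the new length and copies all existing bytes. With n the line length so far and c the chunk size per read (4096 in the log), the total number of bytes copied grows quadratically in n:

```latex
% Assumption: exact-fit growth, i.e. append i copies the i*c bytes already held.
\[
  C(n) \;=\; \sum_{i=0}^{k-1} i\,c \;=\; c\,\frac{k(k-1)}{2} \;\approx\; \frac{n^{2}}{2c},
  \qquad k = \frac{n}{c}
\]
\[
  n = 60{,}166{,}144,\; c = 4096
  \;\Rightarrow\; k = 14{,}689,\qquad
  C(n) = 4096 \cdot \frac{14{,}689 \cdot 14{,}688}{2} \approx 4.4 \times 10^{11}\ \text{bytes}
\]
```

    Under the same assumption, the cost of a single append is proportional to the current length, which is consistent with the per-read latency climbing from about 1 ms to over 50 ms as the line grows.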



  • thushara wijeratna (JIRA) at Jun 25, 2009 at 11:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724342#action_12724342 ]

    thushara wijeratna commented on HADOOP-6109:
    --------------------------------------------

    repro steps:
    =========
    Create a text file that contains a single 100MB line and run an identity map/reduce job on it.
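    One way to produce such an input file is sketched below. `MakeLongLine` is a hypothetical helper, not part of the attached patch; it writes one very long ASCII line between two short ones. It uses try-with-resources and java.nio.file, so it assumes a JDK 7 or later rather than the Java 1.6 in the report.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical repro helper: writes a file whose second line is lineBytes bytes long. */
class MakeLongLine {

    static void write(Path out, long lineBytes) throws IOException {
        char[] chunk = new char[64 * 1024];
        Arrays.fill(chunk, 'x'); // ASCII, so one char is one byte in UTF-8
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            w.write("short line before\n");
            long remaining = lineBytes;
            while (remaining > 0) {
                int n = (int) Math.min(chunk.length, remaining);
                w.write(chunk, 0, n);
                remaining -= n;
            }
            w.write("\nshort line after\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "long-line.txt");
        // Default to a 100 MB line, matching the repro steps.
        long size = args.length > 1 ? Long.parseLong(args[1]) : 100L * 1024 * 1024;
        write(out, size);
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

    The resulting file can then be copied into HDFS and used as input to an identity job.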
  • thushara wijeratna (JIRA) at Jun 25, 2009 at 11:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724343#action_12724343 ]

    thushara wijeratna commented on HADOOP-6109:
    --------------------------------------------

    Some more performance data (from YourKit) on the basic org.apache.hadoop.io.Text class is available at http://thushw.blogspot.com/2009/06/hadoop-reading-large-lines-several-mb.html
  • Chris Douglas (JIRA) at Jun 26, 2009 at 8:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724695#action_12724695 ]

    Chris Douglas commented on HADOOP-6109:
    ---------------------------------------

    You might want to try the same experiment with a larger io.file.buffer.size, say 1 or 2MB. Though the growth remains linear, at least it grows by more than 4k per read.
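    The effect of a larger read size can be estimated with a small cost model. This is a hypothetical sketch, not Hadoop code: `CopyCostModel` and its methods are invented names. It assumes exact-fit growth (every append copies the whole existing array) and, for comparison, doubling growth, and it counts only the bytes copied during reallocation.

```java
/** Hypothetical model: bytes copied while appending a lineBytes-long line in chunkBytes pieces. */
class CopyCostModel {

    /** Exact-fit growth: every append reallocates to length + chunk and copies the old contents. */
    static long exactFitBytesCopied(long lineBytes, long chunkBytes) {
        long copied = 0;
        long length = 0;
        while (length < lineBytes) {
            long chunk = Math.min(chunkBytes, lineBytes - length);
            copied += length; // old contents moved into a slightly larger array
            length += chunk;
        }
        return copied;
    }

    /** Doubling growth: reallocate to max(needed, 2 * capacity) only when the array is full. */
    static long doublingBytesCopied(long lineBytes, long chunkBytes) {
        long copied = 0;
        long length = 0;
        long capacity = 0;
        while (length < lineBytes) {
            long chunk = Math.min(chunkBytes, lineBytes - length);
            long needed = length + chunk;
            if (needed > capacity) {
                copied += length;
                capacity = Math.max(needed, capacity << 1);
            }
            length = needed;
        }
        return copied;
    }

    public static void main(String[] args) {
        long line = 60L * 1024 * 1024; // roughly the 60M reached in the earlier log
        System.out.println("exact-fit, 4 KB reads: " + exactFitBytesCopied(line, 4096));
        System.out.println("exact-fit, 2 MB reads: " + exactFitBytesCopied(line, 2L * 1024 * 1024));
        System.out.println("doubling,  4 KB reads: " + doublingBytesCopied(line, 4096));
    }
}
```

    Under this model a 60 MiB line read in 4 KB pieces costs about 4.8 × 10^11 copied bytes, while 2 MB pieces cost about 9.1 × 10^8 (roughly 530 times less, but still quadratic in the line length). Doubling growth copies only about 6.7 × 10^7 bytes even with 4 KB reads.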

    Rather than growing a separate buffer and copying that into Text, replacing the current code in Text::setCapacity with {{bytes = Arrays.copyOf(bytes, Math.max(len, length << 1))}} should improve Text's performance. I don't think there's any reason why Text needs to be exactly the length of the largest value it has held.
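    A minimal sketch of the suggested growth rule, applied to a hypothetical `GrowableBytes` class rather than Hadoop's actual Text. The field names `bytes` and `length` mirror the one-line suggestion; the class, its methods, and the reallocation counter are assumptions for illustration. Growing to max(len, length << 1) means a line built from many small appends is reallocated only a logarithmic number of times.

```java
import java.util.Arrays;

/** Hypothetical Text-like byte container using geometric (doubling) growth. */
class GrowableBytes {
    private byte[] bytes = new byte[0];
    private int length = 0;        // number of valid bytes in 'bytes'
    private int reallocations = 0; // how many times the backing array was replaced

    /** Ensure room for len bytes, keeping existing data and growing at least 2x. */
    private void setCapacity(int len) {
        if (bytes.length < len) {
            // If length << 1 overflows it becomes negative and Math.max falls back to len.
            bytes = Arrays.copyOf(bytes, Math.max(len, length << 1));
            reallocations++;
        }
    }

    void append(byte[] src, int start, int len) {
        setCapacity(length + len);
        System.arraycopy(src, start, bytes, length, len);
        length += len;
    }

    int getLength() { return length; }
    int capacity() { return bytes.length; }
    int reallocations() { return reallocations; }
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }

    public static void main(String[] args) {
        byte[] chunk = new byte[4096]; // simulate 4 KB reads, as in the earlier log
        GrowableBytes line = new GrowableBytes();
        for (int i = 0; i < 2048; i++) { // builds an 8 MB line
            line.append(chunk, 0, chunk.length);
        }
        System.out.println("length=" + line.getLength() + " capacity=" + line.capacity()
                + " reallocations=" + line.reallocations());
    }
}
```

    The trade-off is up to 2x over-allocation of the backing array in exchange for amortized constant-time appends.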
  • thushara wijeratna (JIRA) at Jun 26, 2009 at 10:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724741#action_12724741 ]

    thushara wijeratna commented on HADOOP-6109:
    --------------------------------------------

    Correct, Chris: speculative over-allocation is the only significant difference between ByteArrayOutputStream and the Text class with regard to growing capacity.
    I have tested this and can confirm the performance improvement. I will attach the patch after running the Hadoop tests.
    Thanks,
  • thushara wijeratna (JIRA) at Jun 26, 2009 at 10:34 pm
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    thushara wijeratna updated HADOOP-6109:
    ---------------------------------------

    Attachment: HADOOP-1234.patch
  • Chris Douglas (JIRA) at Jun 28, 2009 at 9:19 am
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-6109:
    ----------------------------------

    Component/s: io (was: util)
  • Chris Douglas (JIRA) at Jun 28, 2009 at 9:19 am
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-6109:
    ----------------------------------

    Assignee: thushara wijeratna
    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Jun 29, 2009 at 1:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725017#action_12725017 ]

    Chris Douglas commented on HADOOP-6109:
    ---------------------------------------

    {noformat}
    [exec] -1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
    [exec] Please justify why no new tests are needed for this patch.
    [exec] Also please list what manual steps were performed to verify this patch.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] -1 javac. The applied patch generated 64 javac compiler warnings (more than the trunk's current 124 warnings).
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
    {noformat}

    The javac objection is spurious. The patch adds no warnings.
  • Chris Douglas (JIRA) at Jun 29, 2009 at 7:19 am
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-6109:
    ----------------------------------

    Resolution: Fixed
    Fix Version/s: 0.21.0
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

    All unit tests passed in both Common and HDFS. No new unit test is required, since the patch only changes the rate at which an internal buffer expands.

    I committed this. Thanks, Thushara!
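Why the rate of buffer expansion matters so much for multi-megabyte lines can be shown with a small cost simulation. This is a hypothetical sketch, not Hadoop code. It assumes a line is appended to a growable buffer in fixed 64 KB chunks (the chunk size is an assumption) and that every reallocation copies all bytes already held. `GrowthCost`, `copiesExactFit`, and `copiesDoubling` are illustrative names.

```java
/**
 * Hypothetical simulation (not Hadoop code): counts the bytes copied by
 * reallocations while one long line is appended in fixed-size chunks.
 */
public class GrowthCost {

    /** Old-style policy: grow to exactly the needed size on every overflow. */
    static long copiesExactFit(long lineBytes, int chunk) {
        long capacity = 0, length = 0, copied = 0;
        while (length < lineBytes) {
            long needed = Math.min(length + chunk, lineBytes);
            if (needed > capacity) {
                copied += length;          // existing bytes moved to the new array
                capacity = needed;
            }
            length = needed;
        }
        return copied;
    }

    /** New-style policy: grow to max(needed, 2 * current length). */
    static long copiesDoubling(long lineBytes, int chunk) {
        long capacity = 0, length = 0, copied = 0;
        while (length < lineBytes) {
            long needed = Math.min(length + chunk, lineBytes);
            if (needed > capacity) {
                copied += length;          // existing bytes moved to the new array
                capacity = Math.max(needed, length * 2);
            }
            length = needed;
        }
        return copied;
    }

    public static void main(String[] args) {
        int chunk = 64 * 1024;             // assumed read-chunk size
        long line = 8L * 1024 * 1024;      // an 8 MB input line
        System.out.println("exact fit: " + copiesExactFit(line, chunk) + " bytes copied");
        System.out.println("doubling:  " + copiesDoubling(line, chunk) + " bytes copied");
    }
}
```

Under these assumptions an 8 MB line causes 532676608 bytes (508 MiB) of copying with exact-fit growth, but only 8323072 bytes (about 7.9 MiB) with doubling. The first cost grows quadratically with line length and the second linearly, which is consistent with the reported drop from over 20 minutes to under 30 seconds.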
  • Hudson (JIRA) at Jun 29, 2009 at 11:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725122#action_12725122 ]

    Hudson commented on HADOOP-6109:
    --------------------------------

    Integrated in Hadoop-Common-trunk #11 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/11/])
    Change Text to grow its internal buffer exponentially, rather than to the max of the current length and the proposed length, to improve performance when reading large values. Contributed by thushara wijeratna.
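The growth policy in this commit message can be sketched with a simplified Text-like buffer. `GrowableBytes` is a hypothetical class, not `org.apache.hadoop.io.Text`. It only illustrates reallocating to `max(needed, 2 * length)` on append instead of to exactly the needed size.

```java
import java.util.Arrays;

/**
 * Hypothetical Text-like byte buffer (NOT org.apache.hadoop.io.Text) that
 * grows geometrically on append, as described in the commit message.
 */
public class GrowableBytes {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Appends len bytes from src, starting at offset, to the end of the buffer. */
    public void append(byte[] src, int offset, int len) {
        ensureCapacity(length + len);
        System.arraycopy(src, offset, bytes, length, len);
        length += len;
    }

    /**
     * Reallocates to max(needed, 2 * length) when the buffer is too small, so
     * a long run of appends does amortized O(1) copying per byte.
     */
    private void ensureCapacity(int needed) {
        if (needed > bytes.length) {
            // Double in long arithmetic and clamp, so very large buffers
            // do not overflow int when doubled.
            long doubled = Math.min((long) length << 1, Integer.MAX_VALUE - 8);
            bytes = Arrays.copyOf(bytes, (int) Math.max(needed, doubled));
        }
    }

    /** Number of valid bytes in the buffer. */
    public int getLength() { return length; }

    /** Current size of the backing array. */
    public int capacity() { return bytes.length; }

    /** Copy of the valid bytes only. */
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }

    public static void main(String[] args) {
        GrowableBytes buf = new GrowableBytes();
        byte[] chunk = new byte[64 * 1024];
        for (int i = 0; i < 128; i++) {
            buf.append(chunk, 0, chunk.length);   // an 8 MB line read in 64 KB pieces
        }
        System.out.println(buf.getLength() + " bytes, capacity " + buf.capacity());
    }
}
```

Running `main` prints `8388608 bytes, capacity 8388608`. The backing array is allocated only 8 times (including the first allocation) across 128 appends, whereas growing to exactly the needed size would reallocate and copy on every append.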


Discussion Overview
group: common-dev @
categories: hadoop
posted: Jun 25, '09 at 11:32p
active: Jun 29, '09 at 11:11a
posts: 14
users: 1
website: hadoop.apache.org...
irc: #hadoop
