Hi,
We're running 100 XLarge instances (EC2) with a gig of heap space for
each task, and are seeing the following error frequently (but not
always):
##### BEGIN PASTE #####
[exec] 08/09/03 11:21:09 INFO mapred.JobClient: map 43% reduce 5%
[exec] 08/09/03 11:21:16 INFO mapred.JobClient: Task Id : attempt_200809031101_0001_m_000220_0, Status : FAILED
[exec] java.io.IOException: Spill failed
[exec] at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:688)
[exec] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
[exec] at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
[exec] Caused by: java.lang.OutOfMemoryError: Java heap space
[exec] at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$InMemValBytes.reset(MapTask.java:928)
[exec] at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getVBytesForOffset(MapTask.java:891)
[exec] at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:765)
[exec] at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:286)
[exec] at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:712)
##### END #####

Has anyone seen this? Thanks,

Florian Leibert
Sr. Software Engineer
Adknowledge Inc.


  • Paco NATHAN at Sep 3, 2008 at 4:50 pm
    Also, that almost always happens early in the map phase of the first
    MR job which runs on our cluster.

    Hadoop 0.18.1 on EC2 m1.xl instances.

    We run 10 MR jobs in sequence, 6 hr duration, and do not see the
    problem repeated after that one heap space exception.

    Paco

  • Chris Douglas at Sep 3, 2008 at 7:51 pm
    InMemValBytes::reset can perform an allocation, but it should be only
    as large as the value. When you look at the log for the failed task,
    what does it report as the values of bufstart, bufend, kvstart, kvend,
    etc. before the spill? -C
  • Florian Leibert at Sep 10, 2008 at 8:51 pm
    Hi Chris,
    where do you find those values? I don't seem to see them in the
    userlogs nor in the tasktracker logs...

    Thanks!

    Florian
  • Chris Douglas at Sep 10, 2008 at 10:44 pm
    It should be in the task logs for the maps. You're not seeing
    "bufstart", "bufvoid", etc.? The logging is at INFO level, so if
    you've set your log level higher than that, you won't see these
    messages. Are you seeing any logging from MapTask? How large are your
    serialized values? -C
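
For anyone hunting for the buffer values mentioned above, a small script can pull them out of a map task log. This is only a sketch: the field names (bufstart, bufend, bufvoid, kvstart, kvend) come from the replies in this thread, but the "name = number" layout of the log lines and the function name extract_buffer_stats are assumptions, so adjust the pattern to whatever your task logs actually contain.

```python
import re
import sys

# Field names are the ones mentioned in the thread. The "name = number"
# layout is an assumption about the log format, not a confirmed one.
FIELDS = ("bufstart", "bufend", "bufvoid", "kvstart", "kvend")
PATTERN = re.compile(r"\b(" + "|".join(FIELDS) + r")\s*=\s*(-?\d+)")


def extract_buffer_stats(lines):
    """Return one dict per log line that mentions any of the buffer fields."""
    stats = []
    for line in lines:
        found = {name: int(value) for name, value in PATTERN.findall(line)}
        if found:
            stats.append(found)
    return stats


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller, e.g. a map task's syslog file.
    with open(sys.argv[1], errors="replace") as log:
        for entry in extract_buffer_stats(log):
            print(entry)
```

If the script prints nothing for a failed map attempt, that would be consistent with the log level being set above INFO, as suggested in the last reply.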

Discussion Overview
group: common-user @
categories: hadoop
posted: Sep 3, '08 at 4:44p
active: Sep 10, '08 at 10:44p
posts: 5
users: 3
website: hadoop.apache.org...
irc: #hadoop
