Hi!

I'm running Cloudera CDH2 update 2 (hadoop-0.20 0.20.1+169.113), and
after the upgrade I'm getting the following error in the reducers
during the copy phase in one of my larger jobs:

2010-10-20 17:43:22,343 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 12 segments...
2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Merging 12 sorted segments
2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 12 segments left of total size: 382660295 bytes
2010-10-20 17:43:22,517 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0001_r_000000_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2529)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:154)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:125)
        ... 7 more

        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2533)

2010-10-20 17:43:22,765 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0001_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermediate merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2651)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2576)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2635)
        ... 1 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:154)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
        ... 8 more

2010-10-20 17:43:23,766 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-10-20 17:43:23,767 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-10-20 17:43:23,774 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2010-10-20 17:43:23,931 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Task: attempt_201010201640_0001_r_000000_0 - The reduce copier failed
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Intermediate merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2651)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2576)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2635)
        ... 1 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:154)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
        ... 8 more
2010-10-20 17:43:23,936 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

I've tried:

*) Running with and without compressed map output; no difference.
*) Running with -Xmx512m and -Xmx768m; no difference.
*) Decreasing the number of mappers and reducers on all nodes to decrease
overall load.
*) Decreasing mapred.reduce.parallel.copies from 16 to 5 (the default).
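One more knob I understand matters here, though this is an assumption on my
part rather than something I've verified fixes it: in 0.20 the size of each
in-memory merge is governed by the shuffle buffer settings, so forcing
smaller, earlier spills to disk is another thing to try. A minimal sketch,
with illustrative values only:

    // Hedged sketch: shuffle/merge settings in Hadoop 0.20 that could be
    // lowered to keep each in-memory merge smaller. The property names are
    // the stock 0.20 ones; the values below are illustrative, not tested.
    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuning {
        public static void apply(JobConf conf) {
            // Fraction of the reducer heap used to buffer map outputs during shuffle.
            conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);
            // Start merging to disk once the in-memory buffer is this full...
            conf.setFloat("mapred.job.shuffle.merge.percent", 0.50f);
            // ...or once this many map outputs have been buffered in memory.
            conf.setInt("mapred.inmem.merge.threshold", 100);
            // Keep the child JVM heap explicit rather than relying on defaults.
            conf.set("mapred.child.java.opts", "-Xmx768m");
        }
    }

The trade-off is more disk spills per reducer in exchange for a smaller
in-memory merge.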

Any ideas on what to try next? What other information is required to
find out what's wrong?

Regards,
\EF
--
Erik Forsberg <forsberg@opera.com>
Developer, Opera Software - http://www.opera.com/

  • Erik Forsberg at Oct 21, 2010 at 10:44 am

    On Wed, 20 Oct 2010 19:49:09 +0200 Erik Forsberg wrote:

    Hi!

    I'm running Cloudera CDH2 update 2 (hadoop-0.20 0.20.1+169.113), and
    after the upgrade I'm getting the following error in the reducers
    during the copy phase in one of my larger jobs:

    2010-10-20 17:43:22,343 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 12 segments...
    2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Merging 12 sorted segments
    2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 12 segments left of total size: 382660295 bytes
    2010-10-20 17:43:22,517 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0001_r_000000_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
    What does an EOFException in this code actually mean? Is it hiding some
    other error that could tell me more about what's wrong?
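    My reading so far, for what it's worth: BytesWritable.readFields reads a
    4-byte length and then readFully()s that many bytes, so if the key bytes
    handed to the comparator are shorter than the recorded length, readFully
    throws EOFException, which (per the log) WritableComparator wraps in the
    RuntimeException. That would point at truncated or corrupt intermediate
    data rather than at the compare logic itself. A self-contained toy that
    reproduces just that mechanism (my own sketch, not Hadoop code):

        // Toy illustration: serialize a BytesWritable, then hand the reader a
        // truncated buffer, as if the merge segment were cut short. readFields
        // hits java.io.EOFException inside readFully, as in the job logs.
        import java.io.IOException;
        import org.apache.hadoop.io.BytesWritable;
        import org.apache.hadoop.io.DataInputBuffer;
        import org.apache.hadoop.io.DataOutputBuffer;

        public class TruncatedKeyDemo {
            public static void main(String[] args) throws IOException {
                DataOutputBuffer out = new DataOutputBuffer();
                new BytesWritable(new byte[100]).write(out);   // 4-byte length + 100 bytes

                DataInputBuffer in = new DataInputBuffer();
                // Give the reader 10 bytes fewer than were written.
                in.reset(out.getData(), 0, out.getLength() - 10);
                new BytesWritable().readFields(in);            // throws java.io.EOFException
            }
        }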

    I'm seeing quite a few of these in my datanode logs:

    2010-10-21 10:21:01,149 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.20.11.66:50010, storageID=DS-71308762-10.20.11.66-50010-1269957604444, infoPort=50075, ipcPort=50020):Got exception while serving blk_1081044479123523815_4852013 to /10.20.11.88:
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.20.11.66:50010 remote=/10.20.11.88:41347]
            at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
            at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
            at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
            at java.lang.Thread.run(Thread.java:619)

    Could that be related somehow?
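    Side note, hedged: 480000 ms matches what I believe is the stock DataNode
    write timeout of 8 minutes, i.e. the reader on 10.20.11.88 stalled for a
    long time while the block was being served, so the timeout is probably a
    symptom rather than the cause. If it ever needs adjusting, the property
    below is the one I believe controls it (value illustrative only):

        // Hedged sketch: bump the DataNode socket write timeout (milliseconds).
        // 480000 ms (8 min) is the 0.20 default as far as I know.
        import org.apache.hadoop.conf.Configuration;

        public class DatanodeTimeout {
            public static void main(String[] args) {
                Configuration conf = new Configuration();
                conf.setLong("dfs.datanode.socket.write.timeout", 16 * 60 * 1000L);
                System.out.println(conf.getLong("dfs.datanode.socket.write.timeout", 480000L));
            }
        }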

    I'm also seeing a large number of mortbay exceptions, but MAPREDUCE-5 says they are harmless.
    *) Running with and without compressed map output, no difference.
    *) With -Xmx512m and -Xmx768m, no difference.
    *) Decreasing number of mappers and reducers on all nodes to decrease
    overall load.
    *) Decreasing mapred.reduce.parallel.copies from 16 to 5 (default)
    Also tried doubling the number of reducers to get each reducer to
    process less data, but that didn't help either :-(

    \EF
    --
    Erik Forsberg <forsberg@opera.com>
    Developer, Opera Software - http://www.opera.com/
  • Erik Forsberg at Oct 21, 2010 at 10:49 am

    On Thu, 21 Oct 2010 12:44:24 +0200 Erik Forsberg wrote:
    attempt_201010201640_0001_r_000000_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
    In addition, some reducers first fail with:

    2010-10-21 10:31:24,696 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010201640_0186_m_000579_0, compressed len: 2281015, decompressed len: 10368075
    2010-10-21 10:31:24,696 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 10368075 bytes (2281015 raw bytes) into RAM from attempt_201010201640_0186_m_000579_0
    2010-10-21 10:31:24,744 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
    2010-10-21 10:31:24,854 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 582879560 bytes
    2010-10-21 10:31:27,396 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201010201640_0186_r_000027_2 : Failed to merge in memory
    java.lang.OutOfMemoryError: Java heap space
            at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:119)
            at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:98)
            at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
            at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
            at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
            at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
            at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
            at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
            at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
            at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2635)
            at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2576)

    2010-10-21 10:31:27,397 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0186_r_000027_2 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
            at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
            at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
            at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
            at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
            at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
            at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
            at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2529)
    Caused by: java.io.EOFException
            at java.io.DataInputStream.readFully(DataInputStream.java:180)
            at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:154)
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
            ... 7 more

            at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2533)

    2010-10-21 10:31:27,714 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
    2010-10-21 10:31:27,717 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
    2010-10-21 10:31:27,727 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager

    Then, on the second try, they fail with the java.io.EOFException above.
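    A guess at why the OOM lands inside readFields (the sketch below is my
    own toy code, not Hadoop's): in 0.20, BytesWritable grows its backing
    array by roughly 1.5x whenever a reused key/value object has to hold a
    larger record than before, so deserializing one big record during the
    in-memory merge briefly needs the old and the new array on top of the
    ~580 MB of buffered segments, which a 512-768 MB heap may not absorb.

        // Toy illustration of the grow-by-~1.5x rule I believe
        // BytesWritable.setSize/setCapacity follows in 0.20.
        public class GrowthSketch {
            private static byte[] buf = new byte[0];

            static void setSize(int size) {
                if (size > buf.length) {
                    int newCapacity = size * 3 / 2;          // ~1.5x over-allocation
                    byte[] bigger = new byte[newCapacity];   // old + new arrays both live briefly
                    System.arraycopy(buf, 0, bigger, 0, buf.length);
                    buf = bigger;
                }
            }

            public static void main(String[] args) {
                setSize(10368075);   // e.g. the ~10 MB decompressed segment from the log
                System.out.println("allocated " + buf.length + " bytes");
            }
        }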

    \EF
    --
    Erik Forsberg <forsberg@opera.com>
    Developer, Opera Software - http://www.opera.com/
  • Avain at Oct 21, 2010 at 3:19 pm

    Sent from my iPod

    On 2010/10/21 at 6:44 PM, Erik Forsberg <forsberg@opera.com> wrote:

