Hi,

I am using hadoop-0.20.1. When I have a bzip2 file larger than the configured
block size, only a single mapper gets launched, which clearly shows that the
splittable feature of bzip2 is not being used.

But when I use my own InputFormat, say SafeInputFormat (extends
FileInputFormat), and make isSplitable return true, it executes multiple
mappers but fails when the reducers reach 33% for large bzip2 files (of the
order of 2 GB).

The above works fine with smaller bzip2 files (of the order of 500 MB).
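
(SafeInputFormat itself is not shown in the thread; a minimal sketch of the
kind of override being described, using the new mapreduce API and illustrative
names, could look like the following.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Illustrative only: forcing isSplitable() to true makes the framework cut a
// compressed file into block-sized splits, but the record reader must also be
// able to start decoding in the middle of the compressed stream for the
// results to be correct.
public class SafeInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return true; // split unconditionally, even for compressed inputs
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    // Standard line reader; in 0.20.x it cannot seek into the middle of a
    // bzip2 stream.
    return new LineRecordReader();
  }
}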

Kindly help me find a workable solution here.

Thanks & regards,
- Deepak Diwakar,


  • Harsh J at Apr 19, 2011 at 7:08 pm
    Hello Deepak,
    On Tue, Apr 19, 2011 at 9:33 PM, Deepak Diwakar wrote:
    Hi,

    I am using hadoop-0.20.1
    But when I use my own InputFormat, say SafeInputFormat (extends
    FileInputFormat), and make isSplitable return true, it executes multiple
    mappers but fails when the reducers reach 33% for large bzip2 files (of
    the order of 2 GB).
    BZip2 splitting support was added in the Apache Hadoop 0.21.0 release and
    isn't available in Apache Hadoop 0.20.x. Was the 0.20.1 version a typo?
    Also, what reason/trace does the reducer throw up when it fails?

    --
    Harsh J
  • Deepak Diwakar at Apr 20, 2011 at 5:48 am
    Hi Harsh,

    Thanks for the input.

    Yeah, when I went through the Hadoop 0.20.1 and Hadoop 0.21.0 code, I got
    the same impression. But since there are lots of changes in 0.21, I thought
    I'd still stick with 0.20.1. To use the splittable feature of bzip2 I tried
    extending FileInputFormat, and it appeared to work fine for bzip2 files of
    around 500 MB, but not for ~2 GB bzip2 files where the block size is 64 MB.
    I think there are a few more dependencies which I have not modified.
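
    (For context, the 0.21 splitting support Harsh mentions hinges on the codec
    itself advertising split capability: the split-aware code path checks
    whether the codec implements the 0.21+ SplittableCompressionCodec
    interface, roughly as later TextInputFormat versions do. A simplified,
    illustrative sketch of that check is below; the helper class and method
    names are assumptions, not code from the thread, and the interface does not
    exist in 0.20.1.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    // Simplified sketch: a compressed input is only treated as splittable when
    // its codec advertises split support. BZip2Codec implements this interface
    // in 0.21+, which is why forcing isSplitable() to true on 0.20.1 is not
    // enough by itself.
    public class SplitCheck {
      public static boolean canSplit(Configuration conf, Path file) {
        CompressionCodec codec =
            new CompressionCodecFactory(conf).getCodec(file);
        // Uncompressed files are always splittable; compressed files need a
        // split-aware codec.
        return codec == null || codec instanceof SplittableCompressionCodec;
      }
    }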

    When it was failing, it didn't actually say the task had failed; instead,
    the reducer kept retrying again and again. During each retry it fails in
    the shuffle phase; the message is pasted below:

    INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_201104102321_0019_m_000022_0
    java.io.IOException: Premature EOF
        at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
        at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2446)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleToDisk(ReduceTask.java:1624)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1416)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
    Does anybody have a patch to apply to Hadoop 0.20.1 to support splittable
    bzip2? That would be really helpful.

    Thanks & regards,
    - Deepak Diwakar,



