Thanks for the input.
Yeah, when I went through the Hadoop 0.20.1 and 0.21.0 code I got the same
impression. But since there are lots of changes in 0.21, I thought I'd still
use 0.20.1. To get the splittable bzip2 feature I tried extending
FileInputFormat, and it appeared to work fine for 500 MB bzip2 files but not
for ~2 GB bzip2 files with a 64 MB block size. I think there are a few more
dependencies that I have not modified.
When it was failing, it didn't actually report the task as failed; instead
the reducer kept retrying again and again. During each retry it failed in
the shuffle phase, with the message pasted below:

INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from
java.io.IOException: Premature EOF
Does anybody have a patch to apply to Hadoop 0.20.1 to support splittable
bzip2? It would be really helpful.
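
For what it's worth, the underlying reason that flipping isSplitable to true
isn't sufficient on its own: a stock decompressor cannot start reading at an
arbitrary byte offset, which is exactly where a naive split boundary lands.
Here is a minimal self-contained sketch of that failure mode (the class name
and offsets are my own illustration; it uses gzip from the JDK because bzip2
would need an external library, but the behaviour is analogous):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MidStreamDemo {
    // Returns true iff a decompressor can be opened at the given byte offset.
    static boolean canOpenAt(byte[] compressed, int offset) {
        try {
            new GZIPInputStream(new ByteArrayInputStream(
                    compressed, offset, compressed.length - offset)).close();
            return true;
        } catch (IOException e) {
            // The constructor validates the stream header immediately and
            // throws when it does not start at a valid stream boundary.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small compressed payload in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write("hello splittable world".getBytes("UTF-8"));
        gz.close();
        byte[] compressed = buf.toByteArray();

        // A reader starting at byte 0 sees a valid stream header.
        System.out.println("offset 0: " + canOpenAt(compressed, 0)); // true
        // A reader starting at an arbitrary mid-file offset, which is what a
        // naive split boundary produces, fails immediately.
        System.out.println("offset 5: " + canOpenAt(compressed, 5)); // false
    }
}
```

As I understand it, the 0.21.0 bzip2 splitting support works around this by
scanning forward from the split offset to the next bzip2 block marker before
decompressing, rather than decoding from the raw offset.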
Thanks & regards,
- Deepak Diwakar,
On 20 April 2011 00:37, Harsh J wrote:
On Tue, Apr 19, 2011 at 9:33 PM, Deepak Diwakar wrote:
I am using hadoop-0.20.1
But when I use my own InputFormat, say SafeInputFormat (extends
FileInputFormat), and allow isSplitable to return true, it executes multiple
mappers but fails when the reducer reaches 33% for large (on the order of 2 GB)
BZip2 splitting support was added in the Apache Hadoop 0.21.0 release and
isn't available in Apache Hadoop 0.20.x. Was the 0.20.1 version a
Also, what reason/trace does the reducer throw up when it fails?