SequenceFile.Reader can't read a gzip-compressed SequenceFile produced by a MapReduce job without the native compression library
--------------------------------------------------------------------------------------------------------------------------------

Key: HADOOP-6817
URL: https://issues.apache.org/jira/browse/HADOOP-6817
Project: Hadoop Common
Issue Type: Bug
Components: io
Affects Versions: 0.20.2
Environment: Cluster: CentOS 5, jdk1.6.0_20
Client: Mac Snow Leopard, jdk1.6.0_20
Reporter: Wenjun Huang


A Hadoop job outputs a gzip-compressed SequenceFile (whether record-compressed or block-compressed). When the client program uses SequenceFile.Reader to read this file, it shows the following exception:

2090 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2091 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
Exception in thread "main" java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
    at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:101)
    at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
    at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:180)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(HtmlContentSeqOutputView.java:28)

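For context, the client-side read of such a file looks roughly like the sketch below. This is a minimal illustration only; the input path and the key/value instantiation are placeholders, not code taken from the report.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Minimal sketch: read a gzip-compressed SequenceFile written by a MapReduce job.
// On 0.20.x the Reader constructor calls init(), which is where the EOFException
// above is thrown when only the builtin-java gzip codec is available.
public class SeqFileReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path input = new Path("/user/example/output/part-00000"); // placeholder path

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, input, conf);
    try {
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}
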
I studied the code in the org.apache.hadoop.io.SequenceFile.Reader.init method and found:
// Initialize... *not* if this we are constructing a temporary Reader
if (!tempReader) {
  valBuffer = new DataInputBuffer();
  if (decompress) {
    valDecompressor = CodecPool.getDecompressor(codec);
    valInFilter = codec.createInputStream(valBuffer, valDecompressor);
    valIn = new DataInputStream(valInFilter);
  } else {
    valIn = valBuffer;
  }
The problem seems to be caused by "valBuffer = new DataInputBuffer();". GzipCodec.createInputStream creates an instance of GzipInputStream, whose constructor creates an instance of the ResetableGZIPInputStream class. When ResetableGZIPInputStream's constructor calls its base class constructor, java.util.zip.GZIPInputStream, the base constructor tries to read from the still-empty "valBuffer = new DataInputBuffer();", gets no content, and throws an EOFException.
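
This is consistent with how java.util.zip.GZIPInputStream behaves on its own: its constructor eagerly reads and validates the gzip header, so wrapping an empty stream fails immediately. A small, Hadoop-free illustration of that behaviour (my own example, not from the report):

import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.util.zip.GZIPInputStream;

public class GzipHeaderDemo {
  public static void main(String[] args) throws Exception {
    try {
      // The GZIPInputStream constructor reads the gzip header up front,
      // just as it does when wrapping the empty valBuffer in Reader.init().
      new GZIPInputStream(new ByteArrayInputStream(new byte[0]));
    } catch (EOFException e) {
      System.out.println("EOFException while reading gzip header: " + e);
    }
  }
}

This also matches the NativeCodeLoader warning in the log: when the native zlib library can be loaded, GzipCodec does not go through java.util.zip.GZIPInputStream at all, which is presumably why the failure only shows up without the native compression library.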
