SequenceFileLoader problem with compressed values

Derek Brown wrote on Feb 19, 2010 at 10:45 pm:

I'm having a problem getting the SequenceFileLoader from the Piggybank to
read sequence files whose values are block-compressed (gzip'd). I'm using
Pig 0.4.99.0+10 and Hadoop 0.20.1+152, via Cloudera.

Did the following:

* Copied the SequenceFileLoader class into my own project

* Removed

public LoadFunc.RequiredFieldResponse
fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList)

because LoadFunc.RequiredFieldList isn't resolvable, and added

public void fieldsToRead(Schema schema)

* Jarred up the .class file

* Programmatically created a trivial sequence file of a few lines, with
IntWritable keys and Text values, using the basic code from an example in
Hadoop: The Definitive Guide (a sketch of that approach appears after the
log excerpt below)

* That file is successfully read and its keys/values displayed, both with
"hadoop fs -text" and with Pig, using the following:

grunt> register sequencefileloader.jar;
grunt> r = load '/path/to/sequence_file' using com.foobar.SequenceFileLoader();
grunt> dump r;

* The sequence file with the compressed values is also successfully read with
"hadoop fs -text"

* When doing the load step in pig with that file, the following results:

--
2010-02-19 16:59:14,489 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2010-02-19 16:59:14,490 [main] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2010-02-19 16:59:14,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1018: Problem determining schema during load
Details at logfile: /path/to/pig_1266616744562.log
--

That log file contains the following:

--
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during
parsing. Problem determining schema during load
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:363)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem
determining schema during load
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
        ... 8 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018:
Problem determining schema during load
        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
        ... 10 more
Caused by: java.io.EOFException
        at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
        at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at com.media6.SequenceFileLoader.inferReader(SequenceFileLoader.java:140)
        at com.media6.SequenceFileLoader.determineSchema(SequenceFileLoader.java:106)
        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
        ... 11 more
--
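
For reference, the compressed file was written along the lines of the book's
example, with BLOCK compression and the gzip codec requested when creating the
writer. The sketch below is only an approximation of that code, not the exact
program I ran; the class name, output path, and record count are placeholders.

--
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;

// Writes a small sequence file with IntWritable keys and Text values,
// block-compressed with gzip (placeholder path and record count).
public class WriteCompressedSequenceFile {
  public static void main(String[] args) throws Exception {
    String uri = "/path/to/sequence_file";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass(),
          SequenceFile.CompressionType.BLOCK, new GzipCodec());
      for (int i = 0; i < 10; i++) {
        key.set(i);
        value.set("value " + i);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
--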

Maybe something needs to be added to SequenceFileLoader to account for the
compressed values, which Hadoop's "fs -text" handles fine.
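
In case it's relevant, my understanding is that "hadoop fs -text" just hands a
sequence file to a plain SequenceFile.Reader, which picks the codec up from
the file header and decompresses the values itself. A minimal sketch of that
kind of read (the class name and path are placeholders) would be:

--
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Dumps a sequence file as key/value text, letting SequenceFile.Reader
// handle whatever compression the file header declares.
public class DumpSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/path/to/sequence_file");

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Writable key = (Writable)
          ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable)
          ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}
--
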
Thanks for any ideas/pointers.

Derek


  • Dmitriy Ryaboy at Feb 19, 2010 at 10:51 pm
    Derek, please open a ticket in Jira and I'll check it out. It's probably
    some trickiness with file bytes vs. bytes read; I never tested with
    compressed input files.

    -D
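
    Purely as an illustration of that "file bytes vs bytes read" distinction:
    the filesystem reports the compressed on-disk size of the file, while
    tallying the decompressed key/value bytes as they are read generally gives
    a different (larger) number. A rough sketch, assuming the IntWritable/Text
    test file from this thread and a placeholder class name and path:

    --
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Compares the on-disk (compressed) length of a sequence file with a
    // rough tally of the decompressed key/value bytes read from it.
    public class CompareLengths {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/path/to/sequence_file");

        long fileBytes = fs.getFileStatus(path).getLen();  // compressed size

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        IntWritable key = new IntWritable();
        Text value = new Text();
        long bytesRead = 0;
        while (reader.next(key, value)) {
          bytesRead += 4 + value.getLength();  // approximate decompressed bytes
        }
        reader.close();

        System.out.println("file bytes (on disk): " + fileBytes);
        System.out.println("approx. bytes read:   " + bytesRead);
      }
    }
    --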

  • Derek Brown at Feb 22, 2010 at 3:09 pm
    Thanks, Dmitriy. I opened the issue on Friday:
    https://issues.apache.org/jira/browse/PIG-1246

    Derek

