Grokbase Groups Pig user July 2011
I'm loading sequence files in which each row's 'value' is a tab-delimited set of columns. I'm
splitting the values out so that I can work with the columns separately, but Pig's parser is giving
me a hard time.

-----------------------------------------------------------------
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
logs = FOREACH logs GENERATE
    $0,
    FLATTEN(STRSPLIT($1, '\t'));

opens = FILTER logs BY $3 == 'open';
-----------------------------------------------------------------

gets me a syntax error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Out of bound access.
Trying to access non-existent column: 16. Schema {bytearray,bytearray} has 2 column(s).

which makes sense, because if I do a describe:
grunt> describe logs;
logs: {bytearray,bytearray}

But... I KNOW that $3 exists, because I dumped the data while debugging and the split /
flatten are working as expected... How do I tell Pig that there are more columns?
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.


  • Thejas Nair at Jul 19, 2011 at 6:03 am
    This has been fixed in Pig 0.9, which should be released in a few days.

    You can also build it from svn:
    svn co http://svn.apache.org/repos/asf/pig/branches/branch-0.9; cd branch-0.9; ant

    -Thejas

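
For readers who cannot move to Pig 0.9 yet, one possible way to tell Pig about the extra columns is to declare a schema on the flattened tuple with an AS clause. The following is only a sketch and has not been verified against Pig 0.8: the field names (key, col1, col2, action) are hypothetical, and it assumes every value splits into exactly three tab-separated columns, which the original post does not state.

```pig
-- Sketch only: field names are hypothetical, and the column count (three) is an assumption.
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
logs = FOREACH logs GENERATE
    $0 AS key,
    FLATTEN(STRSPLIT($1, '\t')) AS (col1:chararray, col2:chararray, action:chararray);

-- With a declared schema, the field can be referenced by name (or by position, $3).
opens = FILTER logs BY action == 'open';
```

If this works as intended, describe logs should list four fields instead of {bytearray,bytearray}, so the filter on the fourth field passes the parser's bounds check.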

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jul 18, '11 at 7:32p
active: Jul 19, '11 at 6:03a
posts: 2
users: 2
website: pig.apache.org
