FAQ
Hi,

I've recently gotten stumped by a problem where my attempts to dump the
relations produced by a GROUP command give the following error (though
illustrating the same relation works fine):

java.io.IOException: Type mismatch in key from map: expected
org.apache.pig.impl.io.NullableBytesWritable, received org.apache.pig.impl.io.NullableText
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
.
.
.

For a little background, the failing relation is called y5, and it is
produced by the following sequence of commands (in grunt):

y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;

To get an idea of what sort of data is involved, ILLUSTRATE y4 yields:

-----------------------------------------------------------------------------------------------------
y1 | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)}) |
-----------------------------------------------------------------------------------------------------
1265950806 | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)} |
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
y2 | timestamp: int | argMap: map |
-----------------------------------------------------------------------------------------------
1265950806 | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313} |
-----------------------------------------------------------------------------------------------
--------------------------------------------
y3 | uid: bytearray | timestamp: int |
--------------------------------------------
1381688313 | 1265950806 |
--------------------------------------------
--------------------------------------------
y4 | uid: bytearray | timestamp: int |
--------------------------------------------
1381688313 | 1265950806 |
--------------------------------------------

The same problem was also produced when the FILTER command was omitted,
and the relevant chunk of code in myudfs.httpArgParse is:

StringTokenizer tok = new StringTokenizer((String)pair, "=", false);
if (tok.hasMoreTokens()) {
    String oKey = tok.nextToken();
    if (tok.hasMoreTokens()) {
        Object oValue = tok.nextToken();
        output.put(oKey, oValue);
    } else {
        output.put(oKey, null);
    }
}
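As an aside, the parsing loop above can be exercised outside Pig. Below is a self-contained sketch of the same key=value logic; the class and method names (`ArgParseSketch`, `parseArgs`) are illustrative only and are not from the original UDF:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class ArgParseSketch {
    // Standalone version of the key=value parsing loop from the thread:
    // each "key=value" token becomes a map entry; a bare key with no
    // value is mapped to null.
    static Map<String, Object> parseArgs(Iterable<String> pairs) {
        Map<String, Object> output = new HashMap<String, Object>();
        for (String pair : pairs) {
            StringTokenizer tok = new StringTokenizer(pair, "=", false);
            if (tok.hasMoreTokens()) {
                String oKey = tok.nextToken();
                output.put(oKey, tok.hasMoreTokens() ? tok.nextToken() : null);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Object> m =
            parseArgs(java.util.Arrays.asList("s=1381688313", "flag"));
        System.out.println(m.get("s"));       // 1381688313
        System.out.println(m.get("flag"));    // null
    }
}
```

Note that the values put into the map here are Java Strings, which is relevant to the type mismatch discussed below.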

If anyone has any insight into how I could get this to work, that'd
really help me out.

Thanks,
Kris

P.S. For those who remember my earlier post about getting httpArgParse
to compile, I took the advice to ditch the InternalMap in favour of a
HashMap<String,Object>.

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3


  • Dmitriy Ryaboy at Dec 8, 2010 at 10:04 pm
    Try explicitly casting argMap#'s' to a chararray?
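    A sketch of the suggested change, using the relation names from the
    original script. The likely cause: a value looked up from a map with
    `#` has no declared type, so Pig treats the GROUP key as a bytearray,
    while the UDF actually stored String values; the explicit cast makes
    the declared and runtime types agree.

    ```pig
    -- Sketch of the suggested fix: cast the map lookup before grouping.
    y3 = foreach y2 generate (chararray)argMap#'s' as uid, timestamp as timestamp;
    y4 = FILTER y3 BY (uid is not null);
    y5 = GROUP y4 BY uid;
    ```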

  • Kris Coward at Dec 8, 2010 at 11:20 pm
    That looks to have worked. Thanks.
    On Wed, Dec 08, 2010 at 02:04:07PM -0800, Dmitriy Ryaboy wrote:
    Try explicitly casting argMap#'s' to a chararray?


Discussion Overview
group: user @
categories: pig, hadoop
posted: Dec 8, '10 at 9:53p
active: Dec 8, '10 at 11:20p
posts: 3
users: 2 (Kris Coward: 2 posts, Dmitriy Ryaboy: 1 post)
website: pig.apache.org

People

Translate

site design / logo © 2021 Grokbase