Grokbase Groups Pig user June 2011
Hello All,

I'm having an issue where I get a 'ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String' when passing in something of type chararray to REGEX_EXTRACT.

e.g.
A = load '/path/to/some/data' ....
where A has a schema of something like ( f1:chararray, .... )

B = foreach A generate REGEX_EXTRACT( f1, <the regex>, 1 ) as regex_extract;

This gives me the above error.

Now, the kicker is that if f1 is of type bytearray (i.e. the schema is ( f1:bytearray, ..... )), this works as expected.


What gives? Am I using REGEX_EXTRACT wrong? Is this a bug?
My understanding is that chararray is supposed to be used for things that are Strings, which is why I find the 'cannot be cast to java.lang.String' exception a bit funky. I've looked through the REGEX_EXTRACT source and the JavaDocs pertaining to DataTypes without being able to crack this.

Any help and information is appreciated!
Thanks for your time,

Michael


  • Dmitriy Ryaboy at Jun 23, 2011 at 8:36 am
    Which version of pig? Are you using a special loader?
    I just tried with 0.8.1:

    n = load 'tmp/numbers.txt' as (num:chararray);
    f = foreach n generate REGEX_EXTRACT($0, '(\\d)', 1);
    dump f;
    (1)
    (2)
    (3)
    (4)
    (5)


    -D
  • Michael May at Jun 23, 2011 at 8:37 pm
    I'm on Pig 0.8.0.

    I am using a custom loader that extends LoadFunc and implements LoadMetadata. I think my custom loader is essentially attempting to do what PigStorageSchema does in PiggyBank.
    After reading through the PigStorageSchema source, it was pretty obvious that I had overlooked several things in my implementation. I'm going to go ahead and try to use PigStorageSchema instead.

    Thanks for the help,
    Michael
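
A likely reading of the symptom above, offered as an assumption rather than something confirmed in the thread: the custom loader declares f1 as chararray through its schema but still hands Pig a raw byte wrapper at runtime. Because the declared type already matches what REGEX_EXTRACT expects, no conversion is applied, and the function's cast to String fails. Declaring bytearray would instead prompt a conversion before the function runs, which would explain why that variant works. The sketch below reproduces the mismatch in plain Java with no Pig dependency. `RawBytes`, `regexExtract`, and `toCharArray` are hypothetical stand-ins for illustration, not Pig's actual classes or methods.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoaderTypeMismatchSketch {

    // Hypothetical stand-in for a raw byte wrapper such as Pig's DataByteArray.
    static final class RawBytes {
        final byte[] data;
        RawBytes(byte[] data) { this.data = data; }
    }

    // Mimics a function that trusts the declared schema and casts its
    // argument straight to String before applying the regex.
    static String regexExtract(Object field, String regex, int group) {
        String s = (String) field; // ClassCastException if field is RawBytes
        Matcher m = Pattern.compile(regex).matcher(s);
        return (m.find() && m.groupCount() >= group) ? m.group(group) : null;
    }

    // The conversion a loader would need to perform itself before emitting
    // a field that its schema declares as a string type.
    static String toCharArray(RawBytes raw) {
        return new String(raw.data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RawBytes raw = new RawBytes("id=7".getBytes(StandardCharsets.UTF_8));

        // Declared as a string, delivered as bytes: the cast fails.
        try {
            regexExtract(raw, "(\\d)", 1);
        } catch (ClassCastException e) {
            System.out.println("mismatch: " + e.getClass().getSimpleName());
        }

        // Converted first, so the runtime value matches the declared type.
        System.out.println(regexExtract(toCharArray(raw), "(\\d)", 1));
    }
}
```

If this reading is right, the fix belongs in the loader: either emit values whose runtime types match the schema it reports, or report bytearray and let the conversion happen downstream.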

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 22, '11 at 10:59p
active: Jun 23, '11 at 8:37p
posts: 3
users: 2
website: pig.apache.org

2 users in discussion

Michael May: 2 posts Dmitriy Ryaboy: 1 post
