Pig user mailing list, April 2011: Pig filter against flatten column
Hi,

I am trying to filter on a column that is the result of a FLATTEN operation, but the FILTER clause throws an exception: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String. The exception points at the line with the MATCHES filter. If I change MATCHES to eq, the exception goes away, but I get no results even though there are rows whose ColumnName is 'Page'.

I suspect the datatype of ColumnName (which is the result of the flatten) in the relation VisitPages is still bytearray. I have tried casting it to chararray, but I still get the same exception. However, DESCRIBE on VisitDetails shows it as chararray. Any suggestions?

Below is the Pig script:

Visits = LOAD 'cassandra://test/Visits' USING CassandraStorage() AS (Id,
    DetailsBag:bag{DetailsTuple:tuple(VisitTimestamp:chararray,
    VisitDetails:bag{VisitColumns:tuple(ColumnName:chararray,
    ColumnValue:chararray)})});

VisitsFlattened = FOREACH Visits GENERATE Id,
    FLATTEN(DetailsBag.VisitDetails);

VisitDetailsFlattened = FOREACH VisitsFlattened GENERATE Id,
    FLATTEN(VisitDetails);

VisitDetails = FOREACH VisitDetailsFlattened GENERATE (chararray)ColumnName,
    (chararray)ColumnValue, (chararray)Id;

DESCRIBE VisitDetails;

VisitPages = FILTER VisitDetails BY (ColumnName MATCHES 'Page');

DUMP VisitPages;

......



Thanks,

badri



Disclaimer: This message (including any attachments) is being sent from Fifth Generation Technologies India (P) Ltd. (5G) and may contain information that is proprietary, confidential and privileged. If you are not the intended recipient, please inform the sender immediately by reply e-mail and delete this message and attachments from your system, without retaining a copy. Any unauthorized use or dissemination of this message in whole or in part is strictly prohibited. 5G shall not be liable for the improper or incomplete transmission of the information contained in this communication nor for any delay in its receipt or damage to your system. 5G does not guarantee that the integrity of this communication has been maintained nor that this communication is free of viruses, interceptions or interference.
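
A side note on the two operators mentioned in the question. MATCHES takes a Java regular expression and must match the whole string, so 'Page' and an equality test select the same rows once the column really is a chararray. The sketch below shows the difference between an exact match, a full-string regex match, and a prefix match. The file name columns.tsv and the relation names are hypothetical and only serve as illustration; this does not address the ClassCastException itself.

```pig
-- Hypothetical two-column, tab-separated input: ColumnName <TAB> ColumnValue
kv = LOAD 'columns.tsv' USING PigStorage('\t')
     AS (ColumnName:chararray, ColumnValue:chararray);

-- Exact comparison (== in current Pig Latin)
exact = FILTER kv BY ColumnName == 'Page';

-- MATCHES must match the entire string, so this keeps only rows
-- whose ColumnName is exactly 'Page'
full_match = FILTER kv BY ColumnName MATCHES 'Page';

-- To keep every name that starts with 'Page', the pattern needs a wildcard
prefix_match = FILTER kv BY ColumnName MATCHES 'Page.*';

DUMP exact;
DUMP full_match;
DUMP prefix_match;
```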


  • Daniel Dai at Apr 7, 2011 at 10:23 pm
    Which version of Pig are you using? Earlier versions of Pig have trouble
    casting nested types. Can you try the latest trunk?

    Daniel
  • Badrinarayanan S at Apr 8, 2011 at 6:58 am
    Hi Daniel,

    I was using Pig 0.8.0. I also ran the script against the latest trunk and still see the same issue.

    Thanks,
    badri

  • Daniel Dai at Apr 8, 2011 at 6:33 pm
    I tried PigStorage and it seems to work. I suspect the issue is specific to
    CassandraStorage. Where can I find its source code?
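
    A check along these lines could look like the sketch below. It reuses the schema and statements from the original script but loads from a text file through PigStorage instead of CassandraStorage. The file name visits.txt and the sample line are hypothetical, and the sketch assumes PigStorage's text notation for bags and tuples; it is not necessarily the exact script that was run.

```pig
-- visits.txt (hypothetical), one line per visit, fields separated by a tab:
-- v1<TAB>{(2011-04-07,{(Page,home),(Referrer,search)})}
Visits = LOAD 'visits.txt' USING PigStorage('\t') AS (Id,
    DetailsBag:bag{DetailsTuple:tuple(VisitTimestamp:chararray,
    VisitDetails:bag{VisitColumns:tuple(ColumnName:chararray,
    ColumnValue:chararray)})});

-- Same statements as in the original script from here on
VisitsFlattened = FOREACH Visits GENERATE Id,
    FLATTEN(DetailsBag.VisitDetails);
VisitDetailsFlattened = FOREACH VisitsFlattened GENERATE Id,
    FLATTEN(VisitDetails);
VisitDetails = FOREACH VisitDetailsFlattened GENERATE (chararray)ColumnName,
    (chararray)ColumnValue, (chararray)Id;

DESCRIBE VisitDetails;

VisitPages = FILTER VisitDetails BY (ColumnName MATCHES 'Page');
DUMP VisitPages;
```

    If this runs cleanly while the CassandraStorage version fails, that points at the values the loader hands back rather than at FLATTEN or FILTER.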

  • Jeremy Hanna at Apr 8, 2011 at 6:50 pm
    The 0.7.4 version is here:
    http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.4/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

    The latest from the 0.7 branch, though, contains a way to get the Cassandra schema for the column family it is querying:
    http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

    The contrib/pig directory has the build script. If you download the source, either from
    http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.4/apache-cassandra-0.7.4-src.tar.gz
    or from
    http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/
    everything is in the contrib/pig directory.

    I'm in #hadoop-pig on freenode (jeromatron), as is Brandon (driftx), if you have any questions about it.

Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 7, '11 at 12:28p
active: Apr 8, '11 at 6:50p
posts: 5
users: 3
website: pig.apache.org
