Grokbase Groups Pig user January 2011
Hi,

I have a Python UDF that is used by a Pig script.

I get a parsing error for some reason.

------------

REGISTER '/path/to/udf.py' USING jython AS udf;
records = LOAD 'path/to/data' AS (input_line:chararray);
schema_records = FOREACH records GENERATE udf.split_into_words(input_line);
projected_records = FOREACH schema_records GENERATE field1, field2;
DUMP schema_records;

----------

Here's the Python UDF:

@outputSchema("t:(field1:chararray, field2:chararray)")
def split_into_words(input_line):
    # Split the line on whitespace and return the first two words as a tuple.
    line = input_line.strip()
    words = line.split()
    return (words[0], words[1])

--------------

The error I get is:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: field1 in {t: (field1: chararray, field2: chararray)

What am I doing wrong?

Please do not print this email unless it is absolutely necessary.

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

  • Jonathan Coveney at Jan 7, 2011 at 6:36 pm
    The result of your UDF is a tuple, so field1 and field2 don't exist as
    top-level columns. Try GENERATE FLATTEN(udf.split_into_words(input_line));
    and then run DESCRIBE on schema_records to see what the columns are called.

  • Jonathan Coveney at Jan 7, 2011 at 6:43 pm
    It also looks like you can refer to the fields of the tuple directly. So
    you have two options:

    schema_records = FOREACH records GENERATE FLATTEN(udf.split_into_words(input_line));

    OR

    projected_records = FOREACH schema_records GENERATE t.field1, t.field2;

    where t is the name of the tuple in the schema of your Python UDF.

  • Deepak N85 at Jan 8, 2011 at 1:47 pm
    Thanks! That worked!
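
The first reply's point, that the UDF returns one tuple rather than two columns, can be checked by running the function body in plain Python. This is only an illustrative sketch: the `outputSchema` stand-in below is a hypothetical stub written for this example (under Pig's Jython engine the decorator is supplied for you), and all it does here is record the schema string.

```python
# Hypothetical stand-in for the decorator that Pig supplies under Jython.
# It only attaches the schema string so the function can run outside Pig.
def outputSchema(schema):
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("t:(field1:chararray, field2:chararray)")
def split_into_words(input_line):
    # Same logic as the UDF in the question.
    words = input_line.strip().split()
    return (words[0], words[1])


result = split_into_words("  hello world extra ")
# The function returns ONE value, a 2-tuple, so each output record carries a
# single column (named t by the schema) that holds both words.
print(type(result).__name__, len(result), result)
```

Because the return value is a single tuple, a script that asks for field1 at the top level has nothing to resolve it against, which matches the "Invalid alias" error in the question.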

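
The two options from the second reply (FLATTEN the tuple, or address its fields as t.field1) can be pictured with a small toy model of alias lookup. This is a plain-Python sketch under stated assumptions, not how Pig is implemented: `resolve`, `flatten`, and the list-of-pairs schema are hypothetical names made up for illustration. Real Pig may also prefix flattened column names, which is why the first reply suggests running DESCRIBE.

```python
# Toy model of a relation's schema: a list of (name, type) pairs, where a
# tuple-typed column carries its own nested list of fields.
nested = [("t", [("field1", "chararray"), ("field2", "chararray")])]


def resolve(schema, path):
    """Look up a dotted alias such as 't.field1' in the toy schema."""
    fields = schema
    found = None
    for part in path.split("."):
        if not isinstance(fields, list):
            raise KeyError("Invalid alias: %s" % path)
        matches = [ftype for name, ftype in fields if name == part]
        if not matches:
            raise KeyError("Invalid alias: %s" % path)
        found = matches[0]
        fields = found
    return found


def flatten(schema):
    """Promote the fields of every tuple-typed column to the top level."""
    out = []
    for name, ftype in schema:
        if isinstance(ftype, list):
            out.extend(ftype)
        else:
            out.append((name, ftype))
    return out


# Option 2 from the thread: address the field through the tuple name.
print(resolve(nested, "t.field1"))
# Option 1 from the thread: flatten first, then use the bare field name.
print(resolve(flatten(nested), "field1"))
# The original script: a bare field name against the nested schema fails.
try:
    resolve(nested, "field1")
except KeyError as err:
    print(err)
```

In this model the bare name fails only because the sole top-level column is t; either flattening the schema or qualifying the name with t makes the lookup succeed.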

Discussion overview: posted Jan 7, 2011 at 6:31 pm; last active Jan 8, 2011 at 1:47 pm; 4 posts by 2 users (Jonathan Coveney: 2 posts, Deepak N85: 2 posts).