Why am I getting tuple objects in my Python UDFs? This isn't how the
examples work.

Error:

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing function
    at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:106)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:275)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:320)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: Traceback (most recent call last):
  File "udfs.py", line 27, in hour
    return tuple_time.tm_hour
AttributeError: 'tuple' object has no attribute 'tm_hour'


udfs.py:

#!/usr/bin/python

import time

def hour(iso_string):
    tuple_time = time.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
    return str(tuple_time.tm_hour)


my.pig:

register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/pig/contrib/piggybank/java/piggybank.jar
register /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
register /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
define substr org.apache.pig.piggybank.evaluation.string.SUBSTRING();

register 'udfs.py' using jython as agiledata;

rmf /tmp/sent_distribution.txt

/* Get email address pairs for each type of connection, and union them
together */
emails = load '/me/tmp/test_inbox' using AvroStorage();

/* Filter emails according to existence of header pairs, from and [to, cc,
bcc]
project the pairs (may be more than one to/cc/bcc), then emit them,
lowercased. */
filtered = FILTER emails BY (from is not null) and (to is not null) and (date is not null);
flat = FOREACH filtered GENERATE flatten(from) as from,
                                 flatten(to) as to,
                                 agiledata.hour(date) as date;
a = limit flat 10;
dump a



--
Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com


  • Aniket Mokashi at Feb 5, 2012 at 8:45 am
    Looks like this is a Jython bug.

    Btw, afaik, the return type of this function will be a bytearray if
    no decorator is specified.

    Thanks,
    Aniket
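
To illustrate the point about the decorator, here is a minimal sketch of the UDF with a declared return type. It assumes Pig makes an outputSchema decorator available to scripts registered "using jython"; the stand-in defined in the except branch is hypothetical and exists only so the file can be imported outside Pig. The field name "hour" in the schema string is also just an illustrative choice.

```python
import time

try:
    outputSchema  # expected to be provided by Pig for Jython-registered scripts
except NameError:
    # Hypothetical stand-in for use outside Pig: it only records the
    # schema string on the function and returns the function unchanged.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate

@outputSchema("hour:chararray")
def hour(iso_string):
    """Return the hour of a timestamp like 2012-02-04T21:39:00 as a string."""
    tuple_time = time.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
    # Position 3 of the time tuple is the hour; positional access avoids
    # relying on the tm_hour attribute that fails in the trace above.
    return str(tuple_time[3])
```

With the schema declared, the result should reach Pig as a chararray rather than a bytearray, so no cast is needed in the FOREACH.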
    --
    "...:::Aniket:::... Quetzalco@tl"
  • Daniel Dai at Feb 6, 2012 at 5:04 am

    Seems like a bug in Jython:

    >>> import time
    >>> tuple_time = time.strptime('2006-10-16T08:19:39', "%Y-%m-%dT%H:%M:%S")
    >>> tuple_time.tm_hour
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'tuple' object has no attribute 'tm_hour'
    >>> tuple_time[3]
    8

    Changing return str(tuple_time.tm_hour) to return str(tuple_time[3])
    seems to fix the issue.

    Daniel
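
For reference, the suggested change applied to udfs.py might look like the sketch below. The HOUR_INDEX constant is a name introduced here for readability and is not part of the original script.

```python
import time

# Position of the hour in the 9-item time tuple returned by time.strptime:
# (year, month, day, hour, minute, second, weekday, yearday, dst flag).
# HOUR_INDEX is a hypothetical helper name, not from the original udfs.py.
HOUR_INDEX = 3

def hour(iso_string):
    """Return the hour of an ISO-style timestamp as a string."""
    tuple_time = time.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
    # Positional access works whether strptime returns a plain tuple
    # (as in the Jython session above) or an object with named fields.
    return str(tuple_time[HOUR_INDEX])
```

Because the lookup is positional, the same file should behave identically under Jython and CPython, which makes it easy to test the function locally before registering it in Pig.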

Discussion Overview
group: user @
categories: pig, hadoop
posted: Feb 5, '12 at 5:40a
active: Feb 6, '12 at 5:04a
posts: 3
users: 3
website: pig.apache.org
