Grokbase Groups Pig user March 2012
Hi all.

I'm trying to write a simple filter function (to be used with the FILTER operator) in Python, but I can't seem to find the right way to specify its schema. I'm using Pig 0.9.2.

The filter's code is (trivially):
def trivial_filter(s):
    return True
What's the right way of annotating it so that Pig understands it returns a boolean?

I've tried:
- @outputSchema("b:boolean"), but this causes:
  ERROR 1200: <line 1, column 2> Syntax error, unexpected symbol at or near 'boolean
- @outputSchema("b:int"), which is also rejected (as expected):
  ERROR 1058: <file pdns-long-nxdomains.pig, line 9, column 17> Filter's condition must evaluate to boolean. Found: int

Thanks,
Marco


  • Jonathan Coveney at Mar 16, 2012 at 12:34 am
    I don't know if you can write a filter func per se, but a hack would be to
    return an int (1 if true, 0 otherwise) and filter by
    yourudf(input) == 1.
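A minimal sketch of the int-returning workaround as a Jython UDF file. The file name `filters.py`, the function `keep_nonempty`, and its "non-empty string" test are hypothetical stand-ins for a real predicate. The `outputSchema` fallback exists only so the file can be imported and tested under plain Python; my understanding is that Pig's Jython script engine provides its own `outputSchema` decorator when the script is registered, so the fallback is skipped there.

```python
# filters.py: hypothetical Jython UDF file for Pig 0.9.x.
# Assumption: Pig's script engine injects the outputSchema decorator.
# This fallback only lets the module run under plain Python for testing.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema  # record the declared schema string
            return func
        return decorator


@outputSchema("keep:int")
def keep_nonempty(s):
    # Pig 0.9.2 cannot parse a boolean output schema, so encode the
    # predicate as an int: 1 means keep the record, 0 means drop it.
    if s is not None and len(s) > 0:
        return 1
    return 0
```

In the Pig script, the file would then be registered with something like `REGISTER 'filters.py' USING jython AS filters;` and used as `good = FILTER records BY filters.keep_nonempty(domain) == 1;` (the relation and field names are hypothetical). The explicit `== 1` comparison is what turns the int result into the boolean condition FILTER requires.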

  • Marco Cova at Mar 16, 2012 at 7:08 am
    Jonathan,

    Thanks: this will do.

    Marco
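For comparison, a sketch of the direct approach Marco first tried, assuming a later Pig release whose schema parser accepts `boolean` (as far as I know the boolean type arrived in Pig 0.10, but check the documentation for the version in use). The function name `is_nonempty` and its predicate are hypothetical, and the `outputSchema` fallback again only exists for testing outside Pig.

```python
# Hypothetical Jython UDF for a Pig version that supports boolean schemas.
# Assumption: Pig injects outputSchema; the fallback is for plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema  # record the declared schema string
            return func
        return decorator


@outputSchema("b:boolean")
def is_nonempty(s):
    # Return a real boolean so FILTER can use the result without "== 1".
    return s is not None and len(s) > 0
```

With such a version, the filter could presumably be written as `FILTER records BY filters.is_nonempty(domain);` with no integer comparison.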

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 15, '12 at 11:03p
active: Mar 16, '12 at 7:08a
posts: 3
users: 2
website: pig.apache.org

2 users in discussion

Marco Cova: 2 posts, Jonathan Coveney: 1 post
