Grokbase Groups Pig user January 2011
Hi,
I want to write a Python UDF that splits a string into a bag:

------------------------------------------------------------
#!/usr/bin/python

import re

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content,regex):
    return re.compile(regex).split(content)
------------------------------------------------------------

It gave an error saying "could not instantiate
'org.apache.pig.scripting.jython.JythonFunction' with arguments
'[/home/.../mypyudfs.py, strsplittobag]'". I have other Python
UDFs working, so it shouldn't be a configuration problem. I am new to
Python; did I miss anything?

Thanks!

Shawn

  • Daniel Dai at Jan 25, 2011 at 12:29 am
    Put build/ivy/lib/Pig/jython-2.5.0.jar in your classpath (if it is not
    there, run ant first). This is a bug we need to fix.

    Daniel

  • Xiaomeng Wan at Jan 25, 2011 at 8:50 pm
    Hi Daniel,

    I did put jython.jar in the classpath. Comparing my other Python UDFs
    with this one, I find that the ones that work do not import anything.
    Could that be the cause? Do I need to do anything extra to import a
    module in my UDF?

    Thanks!

    Shawn
  • Richard Ding at Jan 26, 2011 at 1:47 am
    You're right. There are two issues here. First, the Jython script needs
    to locate the modules in its search path (e.g. python.path). If the right
    environment variable is set, the Jython script should be able to find and
    import the module. Second, Pig currently doesn't automatically ship the
    module file to the backend, so even if you set the search path in the
    frontend, the backend still cannot locate the module.

    Finally, there is an incompatibility between Python modules and Jython
    modules. You need to use the Jython modules that come with the Jython
    installation (in the Lib directory).

    We're looking into these issues and hope to provide a solution in the
    next release.

    Thanks,
    -Richard
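
The first point can be checked with a small diagnostic run under the same interpreter and settings that load the UDF file. This is a minimal sketch and not part of Pig: `can_import` is a hypothetical helper name, and it only reports what the local (frontend) interpreter sees, so it says nothing about whether the backend tasks can find the module.

```python
import sys


def can_import(module_name):
    """Return True if module_name imports with the current search path."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


# Show every search-path entry the interpreter will consult.
for entry in sys.path:
    print(entry)

# "re" is the module the original UDF tried to import.
print(can_import("re"))
```

If this prints False for `re`, the search path does not include the Lib directory of the Jython installation mentioned above.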


  • Julien Le Dem at Jan 26, 2011 at 6:04 pm
    As a workaround, in Jython you can also use the Java classes.
    Something like this (not tested):

    from java.util.regex import *
    from java.lang import *

    @outputSchema("y:bag{t:tuple(word:chararray)}")
    def strsplittobag(content,regex):
        return Pattern.compile(regex).split(content)

    Julien

  • Xiaomeng Wan at Jan 26, 2011 at 9:42 pm
    It works! I only needed to explicitly convert the result into a bag of
    tuples.


    from java.util.regex import *
    from java.lang import *

    @outputSchema("y:bag{t:tuple(word:chararray)}")
    def strsplittobag(content,regex):
        toks = Pattern.compile(regex).split(content)
        outBag = []
        for tok in toks:
            tup = tok,  # trailing comma makes a one-element tuple
            outBag.append(tup)
        return outBag

    Thank you all!

    Shawn
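
The conversion step in the working version can be tried on its own in plain Python, without Pig or Jython. This is a minimal sketch under the assumption, suggested by the code above, that the declared schema `bag{t:tuple(word:chararray)}` expects a list whose elements are one-element tuples; `to_bag` is a hypothetical helper name, not a Pig function.

```python
def to_bag(tokens):
    """Wrap each token in a one-element tuple; the list of tuples is the bag."""
    return [(tok,) for tok in tokens]


# Plain-Python check: split on a fixed delimiter, then build the bag.
print(to_bag("a,b,c".split(",")))
```

Inside the UDF, the same helper could wrap the array returned by `Pattern.compile(regex).split(content)` in place of the explicit loop.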

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 24, '11 at 11:54p
active: Jan 26, '11 at 9:42p
posts: 6
users: 4
website: pig.apache.org
