Grokbase Groups Hive user July 2009
I'm trying to register a UDF to parse my log file format. Where can I find
documentation for creating and registering a UDF?

My attempts failed with this error:

hive> create temporary function process_line as 'LogProcessor';
FAILED: Unknown exception : Registering UDF Class class LogProcessor which
does not extends class org.apache.hadoop.hive.ql.exec.UDF

Specific questions:

1. Do I need to define a particular function in the class? For example,
run()
2. What arguments should that function accept?
3. What should be the return type of that function?
4. What if the function needs to return multiple values? Each value mapping
to a column in the table?

Saurabh.


  • Ashish Thusoo at Jul 14, 2009 at 6:45 pm
    Not sure if you got an answer for this.

    You can look at the following test case in the source tree to guide you on how to build a udf. Will put this on the wiki.

    create_genericudf.q
    udf_testlength.q

    The udf has to implement either the UDF interface or the GenericUDF interface. The latter handles cases for UDFs that can take complex objects as arguments, have variable-length arguments, or return complex objects. The UDF interface is easier to program to, but is more limited than the GenericUDF interface.

    There are some nuances that you need to be aware of about the function resolution logic in case the UDF has polymorphism in the evaluate functions. I can go into more details if that is the case for you.
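A minimal UDF along these lines might look like the sketch below (assuming the hive-exec jar of this era is on the classpath; the field-extraction logic is illustrative, not from this thread, though the class name matches the one in the original mail):

```java
import org.apache.hadoop.hive.ql.exec.UDF;

// A minimal Hive UDF: extend UDF and define one or more evaluate() methods.
// Hive resolves the call by matching the argument types against the
// evaluate() signatures, so the method must be named evaluate (not run()).
public class LogProcessor extends UDF {
    // Takes one log line, returns its first whitespace-delimited field.
    public String evaluate(String line) {
        if (line == null) {
            return null; // NULL in, NULL out
        }
        int space = line.indexOf(' ');
        return space < 0 ? line : line.substring(0, space);
    }
}
```

Once the class extends UDF, the registration from the original mail ("create temporary function process_line as 'LogProcessor';") should succeed instead of raising the "does not extends" error.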

    Ashish


  • Min Zhou at Jul 15, 2009 at 1:50 am
    Hi Saurabh,

    Ashish is right. Your UDF must inherit UDF or GenericUDF. If you build that
    UDF class into a separate jar, the "add jar" command should be run first:

    hive> add jar jar_path;
    hive> create temporary function udf_name as 'UdfClass';

    Hope that helps.


    Min
  • Saurabh Nanda at Jul 15, 2009 at 11:56 am


    You can look at the following test case in the source tree to guide you on
    how to build a udf. Will put this on the wiki.

    create_genericudf.q
    udf_testlength.q
    Hi Ashish,

    I found the udf_testlength.q script and the class it refers to. However, I
    couldn't find the create_genericudf.q file. I grepped the entire release,
    but could not find the string 'genericudf' anywhere. This is the release in
    which I'm looking --
    http://apache.mirrors.tds.net/hadoop/hive/hive-0.3.0/hive-0.3.0-hadoop-0.18.0-dev.tar.gz

    In fact, the file ./src/ql/src/java/org/apache/hadoop/hive/ql/exec/UDF.java
    itself does not refer to GenericUDF.

    Saurabh.
  • Raghu Murthy at Jul 15, 2009 at 12:06 pm
    0.3 is quite old. You should look at trunk
    http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
    branch soon.
  • Saurabh Nanda at Jul 15, 2009 at 12:11 pm
    0.3 is quite old. You should look at trunk
    http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
    branch soon.

    Is just the file missing from 0.3, or is the feature itself not available?
    Does this mean I need to compile Hive from source to create a UDF?

    Saurabh.
  • Saurabh Nanda at Jul 15, 2009 at 12:21 pm

    0.3 is quite old. You should look at trunk
    http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
    branch soon.
    http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTestTranslate.java
    looks so intimidating! Also, this is not exactly a UDF that returns
    multiple values, is it?

    Have you compared this with the approach Cloudbase is taking to UDFs? It's
    a breeze. Why is Hive putting so much complexity into this?

    Saurabh.
  • Zheng Shao at Jul 16, 2009 at 7:06 am
    Hi Saurabh,

    Hive supports both UDF and GenericUDF.

    UDFs are much easier to write, but they are currently limited to working
    with primitive types (including String).

    GenericUDF supports advanced features including complex-type
    parameters/return values, short-circuit computation, complete object
    reuse (no new objects need to be created for each call), etc.
    Some of these features are not yet provided in other systems,
    so GenericUDF looks more complicated.
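For question 4 in the original mail (returning multiple values), GenericUDF is also the route that can declare a complex return type. A skeleton against the trunk-era API might look like this (a sketch; the class name is illustrative, and exact method signatures may differ between versions):

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Skeleton GenericUDF. initialize() checks argument types and declares the
// return type; evaluate() runs once per row on lazily computed arguments.
public class GenericLogField extends GenericUDF {

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentException("generic_log_field takes one argument");
        }
        // To return multiple values, this could instead build a struct
        // ObjectInspector, whose fields would then map to columns.
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object arg = arguments[0].get(); // deferred: computed only when asked for
        return arg == null ? null : arg.toString();
    }

    @Override
    public String getDisplayString(String[] children) {
        return "generic_log_field(" + children[0] + ")";
    }
}
```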

    I guess you just need a normal UDF for now. Please take a look at
    UDF*.java. Those are very easy to understand and write.


    Zheng



    --
    Yours,
    Zheng
  • Saurabh Nanda at Jul 16, 2009 at 11:09 am
    I've added the JAR (containing my UDF class) in the session. I've issued the
    CREATE TEMPORARY FUNCTION command. However, all my map tasks fail with a
    ClassNotFoundException when I try to run a query with the UDF:

    select ct_ip_address(line) from raw limit 10;

    (ct_ip_address is the UDF I have registered against my class)

    What am I doing wrong? Does a class extending UDF need to be in a particular
    package?

    Saurabh.
  • He Yongqiang at Jul 16, 2009 at 11:38 am
    Did you run ‘add jar path_to_the-jar-including-your-udfclass’ before you
    issued ‘CREATE TEMPORARY FUNCTION’?
    The actual mappers and reducers run in the hadoop cluster, and hive’s “add
    jar” command will distribute your jar file to the worker nodes using
    hadoop’s distributed cache, so the mappers and reducers running on those
    machines can find the class.
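The full sequence on a build where "add jar" exists would look roughly like this (the jar path is illustrative; the function, class, and table names are the ones from this thread):

```sql
-- Ship the jar to the cluster via the distributed cache, then register the UDF.
add jar /path/to/logprocessor.jar;
create temporary function ct_ip_address as 'LogProcessor';

-- The worker nodes can now load the class when the query runs.
select ct_ip_address(line) from raw limit 10;
```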

  • Saurabh Nanda at Jul 16, 2009 at 11:43 am

    Did you run ‘add jar path_to_the-jar-including-your-udfclass’ first
    before you issue ‘CREATE TEMPORARY FUNCTION’;

    hive> add jar myjar.jar;
    Usage: add [FILE] <value> [<value>]*
    hive> add file myjar.jar;
    hive>

    Apparently, ADD JAR doesn't work for me. I am however using ADD FILE before
    CREATE TEMPORARY FUNCTION.

    I'm on the hive-0.3.0-hadoop-0.18.0-bin release.

    Saurabh.
  • Min Zhou at Jul 17, 2009 at 1:34 am
    The release is quite old; we implemented "add jar" after this release.
  • Saurabh Nanda at Jul 17, 2009 at 6:31 am
    the release is quite old, we implemented "add jar" after this release.



    Should I just compile Hive directly from
    http://svn.apache.org/repos/asf/hadoop/hive/trunk/ ? Is it stable enough?

    Saurabh.
  • Ashish Thusoo at Jul 17, 2009 at 7:23 pm
    You should try it. Eva, though, mentioned that there was something wrong with group by and joins in the trunk, but we should be able to figure that out soon if it is a problem. We have already deployed the trunk to our ad-hoc users within FB, so it should be stable enough.

    Ashish

  • Saurabh Nanda at Jul 18, 2009 at 1:09 pm
    Any chance of making a binary Hive release with the latest features?

    Saurabh.

Discussion Overview
group: user
categories: hive, hadoop
posted: Jul 14, '09 at 8:09a
active: Jul 18, '09 at 1:09p
posts: 15
users: 6
website: hive.apache.org
