Grokbase Groups Hive user March 2011
Hey all,

We use Amazon Elastic MapReduce and Hive 0.7 to run analytics queries, and
I'm having trouble dynamically adding functions for use through the Thrift
server.

I want to add a jar, add a function, then execute a query.

Using Ruby as an example, I've tried:

Hive.connect(@url, @port) do |connection|
  connection.execute(<ADD JAR and FUNCTION>)  # register the jar and the UDF
  results = connection.fetch(query)           # run the query that uses the UDF
end

but the function is not available across calls.

So I tried prepending the function-creation statements to the query, but then
I get no data back from Hive (just an empty array).

Could someone point me to the best way to add functions for Thrift queries?
Honestly, I'd rather add them permanently at startup, but I can't find a way
to do that either.


  • Matthew Rathbone at Mar 28, 2011 at 2:54 pm
    Hey guys,

    I could really use some expert Hive help with my issue; my Hive expertise
    is not all that great.

    I'm using Hive 0.7 with Hadoop 0.20.

    A simple way to describe my problem is this:

    Using Thrift, if you execute the following sequence:

        thrift.execute("ADD JAR /udf.jar");
        thrift.execute("create temporary function function1 as 'org.apache.test.Function'");

    then the second execute doesn't see the jar.

    But if I try to string them together:

        thrift.execute("ADD JAR /udf.jar ; create temporary function function1 as 'org.apache.test.Function1'");

    then Hive throws errors:

        11/03/28 14:51:07 INFO SessionState: Added resource: /mnt/var/lib/hive_07/downloaded_resources/udf.jar
        ; does not exist
        11/03/28 14:51:07 ERROR SessionState: ; does not exist
        create does not exist
        11/03/28 14:51:07 ERROR SessionState: create does not exist
        temporary does not exist
        11/03/28 14:51:07 ERROR SessionState: temporary does not exist
        function does not exist
        11/03/28 14:51:07 ERROR SessionState: function does not exist
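
The log output is consistent with the ADD command treating everything after `ADD JAR` as a whitespace-separated list of resource paths, so `;`, `create`, `temporary` and the rest are each looked up as files. The minimal Ruby sketch below only mimics that assumed tokenization to show why those particular "does not exist" lines appear; `add_resource_tokens` is a hypothetical name, not Hive code.

```ruby
# Illustration only (assumed behavior, not Hive's source): the ADD command
# appears to split its input on whitespace and treat each token after
# "ADD JAR" as a resource path to load.
def add_resource_tokens(command)
  tokens = command.strip.split(/\s+/)
  tokens.drop(2) # skip "ADD" and "JAR"; every remaining token is treated as a path
end

cmd = "ADD JAR /udf.jar ; create temporary function function1 as " \
      "'org.apache.test.Function1'"

# Prints one "path" per line: /udf.jar, then ;, create, temporary, function, ...
add_resource_tokens(cmd).each { |path| puts path }
```

Only the first token names a real file; the remaining tokens line up with the errors in the log.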

    Does anyone have a suggestion on how to string these together (along with a
    SELECT statement afterwards)?
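
If each execute call is handled as a single command, one thing to try is splitting the script on the client and sending one statement per call over the same connection. A minimal Ruby sketch, assuming only a connection object that responds to `execute(statement)` as in the Ruby example from the first post; `split_statements`, `run_script` and `RecordingConnection` are hypothetical names, and the splitter is naive (it honors single and double quotes but not comments or escape sequences). This would avoid the parsing errors above; it does not by itself address the separate problem of a later CREATE statement not seeing the jar.

```ruby
# Hypothetical helper: split a HiveQL script on semicolons that are not inside
# single- or double-quoted strings. Naive: no support for comments or escapes.
def split_statements(script)
  statements = []
  current = +""
  quote = nil
  script.each_char do |ch|
    if quote
      quote = nil if ch == quote   # matching quote closes the string literal
      current << ch
    elsif ch == "'" || ch == '"'
      quote = ch                   # opening quote starts a string literal
      current << ch
    elsif ch == ";"
      statements << current.strip  # statement boundary outside any quotes
      current = +""
    else
      current << ch
    end
  end
  statements << current.strip
  statements.reject(&:empty?)      # drop empty pieces, e.g. after a trailing ";"
end

# Send each statement as its own execute call on the same connection.
# `connection` is assumed to respond to execute(statement).
def run_script(connection, script)
  split_statements(script).each { |stmt| connection.execute(stmt) }
end

# Stand-in connection that only records the statements it receives.
class RecordingConnection
  attr_reader :executed

  def initialize
    @executed = []
  end

  def execute(statement)
    @executed << statement
  end
end

conn = RecordingConnection.new
run_script(conn, "ADD JAR /udf.jar ; create temporary function function1 as " \
                 "'org.apache.test.Function1'")
conn.executed.each { |s| puts s }
```

With a real client, the SELECT would then be fetched on that same connection after `run_script` returns.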

    Thanks for any help,

    Matthew


    --
    Matthew Rathbone
    Foursquare | Software Engineer | Server Engineering Team
    matthew@foursquare.com | @rathboma <http://twitter.com/rathboma> |
    4sq <http://foursquare.com/rathboma>
  • Edward Capriolo at Mar 28, 2011 at 2:58 pm

    Traditionally, ADD JAR looks for the jar file on the Thrift server's
    local file system, not the client's. I believe there is a 0.7.0 patch
    to load UDF jars from HDFS, so this might help.
  • Matthew Rathbone at Mar 28, 2011 at 3:18 pm
    Hey, thanks for the response.

    I have the jar on the Thrift server's local file system (it's the same
    machine that is running Hive), and that is the path I pass to the ADD JAR
    command.
    If I tail the logs, I can see that the ADD JAR command succeeds (when
    loading from the local file system), but the subsequent CREATE FUNCTION
    statement still doesn't see the class:

    Added /mnt/var/lib/hive_07/downloaded_resources/udf.jar to class path
    11/03/28 15:14:10 INFO exec.FunctionTask: create function:
    java.lang.ClassNotFoundException: com.example.udf.Function1

    Do you know if the state gets reset between execute calls?



Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 24, '11 at 8:36p
active: Mar 28, '11 at 3:18p
posts: 4
users: 2
website: hive.apache.org


site design / logo © 2021 Grokbase