Hi,

I get a "Could not find symbol" error when I import a UDA function in
Impala 1.2.1 (Cloudera Manager 4.8.0). I use the functions
from https://github.com/cloudera/impala-udf-samples and the (excellent) doc
at
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_udf.html?scroll=udf_building_unique_1.

[localhost.localdomain:21000] > create aggregate function average(double)
returns double location '/user/hive/udfs/libudasample.so'
init_fn='AvgInit' update_fn='AvgUpdate' merge_fn='AvgMerge'
finalize_fn='AvgFinalize';
Query: create aggregate function average(double) returns double
location '/user/hive/udfs/libudasample.so' init_fn='AvgInit'
update_fn='AvgUpdate' merge_fn='AvgMerge' finalize_fn='AvgFinalize'
ERROR: AnalysisException: Could not find symbol 'AvgUpdate' in:
/user/hive/udfs/libudasample.so
Could not find symbol.

I don't think it's a configuration problem: I had no problem compiling and
importing a UDF function.

[localhost.localdomain:21000] > create function has_vowels (string) returns
boolean location '/user/hive/udfs/libudfsample.so' symbol='HasVowels';
Query: create function has_vowels (string) returns boolean location
'/user/hive/udfs/libudfsample.so' symbol='HasVowels'
[localhost.localdomain:21000] > select has_vowels(food) from
food_data_parquet;
Query: select has_vowels(food) from food_data_parquet
Query finished, fetching results ...
+------------------------------+
| food_colors.has_vowels(food) |
+------------------------------+
| true                         |
| true                         |
| true                         |
| true                         |
| true                         |
| true                         |
+------------------------------+
Returned 6 row(s) in 0.63s

Has anyone else run into the same problem?

-- Nicolas

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.


  • Skye Wanderman-Milne at Dec 12, 2013 at 2:28 am
    Hi Nicolas,

    Unfortunately the average UDA is not yet supported, and won't be until
    Impala 2.0. The reason is that the UDA requires a separate intermediate
    type to store the sum and count of the input values, which are divided at
    the end to compute the average. Currently we only support UDAs where the
    intermediate type is the same as the return type, such as Count or
    StringConcat. If you look at
    https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.h,
    you can see that the Avg functions use a BufferVal intermediate type but
    return a DoubleVal.

    You're getting that error message because Impala is trying to find a symbol
    for an AvgUpdate function that has a DoubleVal intermediate type to match
    the UDA return type, which of course doesn't exist. We've updated the error
    message in Impala 1.2.2 to give more information about exactly what
    function Impala is looking for, so you can better determine what's going
    wrong in these situations.

    I apologize that this is not clearly documented. We're in the process of
    cleaning up our UDA examples so they'll better reflect what's actually
    available.

    Skye
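
To make the point about the intermediate type concrete, here is a minimal, self-contained sketch of the four phases of an averaging aggregate. It does not use the Impala UDF headers: `AvgState`, the simplified signatures, and the `AverageOverPartitions` driver are hypothetical stand-ins for illustration only. What it tries to show is that the running state (a sum and a count) cannot be carried in a single double, which is why the intermediate type differs from the return type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical intermediate state; this is NOT Impala's BufferVal.
struct AvgState {
    double sum;
    int64_t count;
};

// Init: reset the state before any rows are seen.
void AvgInit(AvgState* state) {
    state->sum = 0.0;
    state->count = 0;
}

// Update: fold one input value into the state.
void AvgUpdate(double input, AvgState* state) {
    state->sum += input;
    ++state->count;
}

// Merge: combine two partial states (e.g. from two nodes).
void AvgMerge(const AvgState& src, AvgState* dst) {
    dst->sum += src.sum;
    dst->count += src.count;
}

// Finalize: only here does the state collapse to the return type.
// Assumption: the sketch returns 0.0 for empty input; a real
// aggregate would presumably return NULL instead.
double AvgFinalize(const AvgState& state) {
    if (state.count == 0) return 0.0;
    return state.sum / static_cast<double>(state.count);
}

// Hypothetical driver: aggregate each partition on its own, then merge.
double AverageOverPartitions(
        const std::vector<std::vector<double>>& partitions) {
    AvgState total;
    AvgInit(&total);
    for (const auto& rows : partitions) {
        AvgState local;
        AvgInit(&local);
        for (double value : rows) AvgUpdate(value, &local);
        AvgMerge(local, &total);
    }
    return AvgFinalize(total);
}
```

Merging already-divided averages would give the wrong answer whenever the partitions have different sizes, so the merge step has to see the sum and the count separately.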

  • Nicolas Fouché at Dec 12, 2013 at 10:58 am
    Hi Skye,

    OK, got it, thanks for the clear answer. So while waiting for Impala 2.0,
    could I use a StringVal containing a "serialized" struct? Serializing
    and deserializing the string each time would certainly be a bit
    expensive, but when Impala 2.0 is ready we would just have to remove
    the (de)serialization step.

    -- Nicolas

  • Skye Wanderman-Milne at Dec 13, 2013 at 1:04 am
    Yes, that's the workaround I would recommend for now. I'm going to update
    the sample UDAs to serialize to StringVals, which can then be deserialized
    using a cast (I'll send an email to the list after updating the repo).

    The serialization/deserialization shouldn't be too expensive since it's
    only done for the finalize function, i.e. once per output row. So unless
    you have a large number of output rows, e.g. if you're doing a group by
    over a large table and each group contains only a few input rows, the
    serialization cost should be negligible.

    Skye
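
A rough sketch of the workaround discussed above, assuming the state is packed into a byte string so that the intermediate type and the return type are both string-like. Nothing here is Impala's API: `Bytes` is a `std::string` standing in for StringVal, and `PackedState` and the `PackedAvg*` functions are hypothetical names. The sketch uses `memcpy` rather than a pointer cast to avoid making alignment assumptions about the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for StringVal: a plain byte buffer.
using Bytes = std::string;

// Hypothetical state that has to travel inside the byte buffer.
struct PackedState {
    double sum;
    int64_t count;
};

// Copy the state into a buffer of exactly sizeof(PackedState) bytes.
Bytes Pack(const PackedState& state) {
    Bytes buffer(sizeof(PackedState), '\0');
    std::memcpy(&buffer[0], &state, sizeof(PackedState));
    return buffer;
}

// Copy the state back out; a buffer of the wrong size yields empty state.
PackedState Unpack(const Bytes& buffer) {
    PackedState state{0.0, 0};
    if (buffer.size() == sizeof(PackedState)) {
        std::memcpy(&state, buffer.data(), sizeof(PackedState));
    }
    return state;
}

void PackedAvgInit(Bytes* dst) {
    *dst = Pack(PackedState{0.0, 0});
}

void PackedAvgUpdate(double input, Bytes* dst) {
    PackedState state = Unpack(*dst);
    state.sum += input;
    ++state.count;
    *dst = Pack(state);
}

void PackedAvgMerge(const Bytes& src, Bytes* dst) {
    PackedState a = Unpack(src);
    PackedState b = Unpack(*dst);
    *dst = Pack(PackedState{a.sum + b.sum, a.count + b.count});
}

// Finalize: format the average as text, once per output row.
// Assumption: empty input yields an empty string in this sketch.
Bytes PackedAvgFinalize(const Bytes& src) {
    PackedState state = Unpack(src);
    if (state.count == 0) return Bytes();
    return std::to_string(state.sum / static_cast<double>(state.count));
}
```

Because the result comes back as text, the caller would convert it to a number afterwards, for example with `std::stod` in this sketch or a cast on the query side. The text formatting happens only in the finalize step, which matches the point that the cost is paid once per output row.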


Discussion Overview
group: impala-user @
categories: hadoop
posted: Dec 11, '13 at 10:41p
active: Dec 13, '13 at 1:04a
posts: 4
users: 2
website: cloudera.com
irc: #hadoop
