Grokbase Groups Hive user May 2011
FAQ
I would like to implement some kind of assert functionality in Hive QL.

Here is how I do it in MySQL. I can assert that a given query returns no
(bad) rows by creating a table with one row containing '1' and a unique
index. Then, I try to insert into that table select 1 from (query). If the
query returns something, I have an assert failure. I've found such asserts
very helpful in catching data bugs early.

Can I do something similar in Hive? I suppose I could implement a UDF that
fails if it gets executed. This way I can ensure there is no records
matching a given query.
That doesn't sound too bad but perhaps there is a cleaner solution.

Thanks!

Search Discussions

  • Alex Kozlov at May 26, 2011 at 9:19 pm
    1) Would `select count(1) from (query)` do the same thing? I am a bit
    confused what is the semantic of assert: is it just no rows or some kind of
    syntax error check?
    2) Hive is not an OLTP and is not optimized for single row inserts (or
    updates for this matter). In a trivial implementation one would just do
    copy-on-write, i.e. overwrite the whole data file, or add a small file
    containing one or a few rows. I do not think indices have been implemented
    yet either.
    3) You can probably do what you want with HBase (if you explain what you
    want to do in more detail)

    On Thu, May 26, 2011 at 2:00 PM, Igor Tatarinov wrote:

    I would like to implement some kind of assert functionality in Hive QL.

    Here is how I do it in MySQL. I can assert that a given query returns no
    (bad) rows by creating a table with one row containing '1' and a unique
    index. Then, I try to insert into that table select 1 from (query). If the
    query returns something, I have an assert failure. I've found such asserts
    very helpful in catching data bugs early.

    Can I do something similar in Hive? I suppose I could implement a UDF that
    fails if it gets executed. This way I can ensure there is no records
    matching a given query.
    That doesn't sound too bad but perhaps there is a cleaner solution.

    Thanks!



  • Igor Tatarinov at May 26, 2011 at 9:46 pm
    Here is one example. I want to make sure I don't have negative prices in my
    data. I would like to write something like:

    assert_empty(select * from Prices where price <= 0);

    as part of my Hive script. I expect my job to fail if there are negative
    prices in the data.

    I understand there is high cost to doing this but some of our tables are not
    that big and any kind of data checks would help.

    On Thu, May 26, 2011 at 2:18 PM, Alex Kozlov wrote:

    1) Would `select count(1) from (query)` do the same thing? I am a bit
    confused what is the semantic of assert: is it just no rows or some kind of
    syntax error check?
    2) Hive is not an OLTP and is not optimized for single row inserts (or
    updates for this matter). In a trivial implementation one would just do
    copy-on-write, i.e. overwrite the whole data file, or add a small file
    containing one or a few rows. I do not think indices have been implemented
    yet either.
    3) You can probably do what you want with HBase (if you explain what you
    want to do in more detail)

    On Thu, May 26, 2011 at 2:00 PM, Igor Tatarinov wrote:

    I would like to implement some kind of assert functionality in Hive QL.

    Here is how I do it in MySQL. I can assert that a given query returns no
    (bad) rows by creating a table with one row containing '1' and a unique
    index. Then, I try to insert into that table select 1 from (query). If the
    query returns something, I have an assert failure. I've found such asserts
    very helpful in catching data bugs early.

    Can I do something similar in Hive? I suppose I could implement a UDF that
    fails if it gets executed. This way I can ensure there is no records
    matching a given query.
    That doesn't sound too bad but perhaps there is a cleaner solution.

    Thanks!



  • Edward Capriolo at May 26, 2011 at 9:56 pm
    You can write a UDF. If it throws an exception from the UDF that will end
    your hive job.
    On Thu, May 26, 2011 at 5:46 PM, Igor Tatarinov wrote:

    Here is one example. I want to make sure I don't have negative prices in my
    data. I would like to write something like:

    assert_empty(select * from Prices where price <= 0);

    as part of my Hive script. I expect my job to fail if there are negative
    prices in the data.

    I understand there is high cost to doing this but some of our tables are
    not that big and any kind of data checks would help.

    On Thu, May 26, 2011 at 2:18 PM, Alex Kozlov wrote:

    1) Would `select count(1) from (query)` do the same thing? I am a bit
    confused what is the semantic of assert: is it just no rows or some kind of
    syntax error check?
    2) Hive is not an OLTP and is not optimized for single row inserts (or
    updates for this matter). In a trivial implementation one would just do
    copy-on-write, i.e. overwrite the whole data file, or add a small file
    containing one or a few rows. I do not think indices have been implemented
    yet either.
    3) You can probably do what you want with HBase (if you explain what you
    want to do in more detail)

    On Thu, May 26, 2011 at 2:00 PM, Igor Tatarinov wrote:

    I would like to implement some kind of assert functionality in Hive QL.

    Here is how I do it in MySQL. I can assert that a given query returns no
    (bad) rows by creating a table with one row containing '1' and a unique
    index. Then, I try to insert into that table select 1 from (query). If the
    query returns something, I have an assert failure. I've found such asserts
    very helpful in catching data bugs early.

    Can I do something similar in Hive? I suppose I could implement a UDF
    that fails if it gets executed. This way I can ensure there is no records
    matching a given query.
    That doesn't sound too bad but perhaps there is a cleaner solution.

    Thanks!



Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedMay 26, '11 at 9:00p
activeMay 26, '11 at 9:56p
posts4
users3
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase