FAQ
Hello,

I'm rather new to Hive and have been playing with it for the last couple of
weeks to see if it is appropriate for a particular project where I work. My
essential question is how to maintain data integrity inside the tables so
that we don't accidentally load duplicate data. Normally we rely on indexes
or unique keys to enforce this. Is there a general strategy for this in
Hive?

As a second question: I haven't seen anything like it in the docs, but is
there any equivalent to CASE, DECODE, or IF-THEN-ELSE allowed in a query?

Thanks!

-Shane P. Brady


  • Zheng Shao at Jan 29, 2009 at 6:04 pm
    IF was just added last evening.

    I will add it to the wiki today.

    We don't have CASE, DECODE, etc. yet.


    Zheng
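
    A minimal sketch of how the IF function mentioned above can stand in
    for a simple CASE expression. The table `orders` and its columns
    `order_id` and `amount` are hypothetical, used only for illustration;
    nesting IF calls is one way to cover more than two branches.

    ```sql
    -- Hypothetical table: orders(order_id, amount).
    -- IF(condition, value_if_true, value_if_false) returns one of two values.
    SELECT
      order_id,
      IF(amount >= 1000, 'large',
         IF(amount >= 100, 'medium', 'small')) AS size_bucket  -- nested IF emulates a three-way CASE
    FROM orders;
    ```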


  • Jeff Hammerbacher at Jan 29, 2009 at 7:50 pm
    Hey Shane,

    One possibility would be to run a MapReduce/Hive job after the load that
    checks that your integrity constraints are met.

    Regards,
    Jeff
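
    A post-load check along these lines could look like the sketch below.
    The table `events` and the key column `event_id` are hypothetical; the
    query is written with a subquery and a WHERE filter rather than HAVING,
    on the assumption that HAVING may not be available in the Hive version
    in use.

    ```sql
    -- Hypothetical table: events(event_id, ...), where event_id should be unique.
    -- Lists every key that appears more than once, with its number of copies.
    SELECT t.event_id, t.copies
    FROM (
      SELECT event_id, COUNT(1) AS copies
      FROM events
      GROUP BY event_id
    ) t
    WHERE t.copies > 1;
    ```

    An empty result means the uniqueness constraint holds for that load.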
  • Prasad Chakka at Jan 29, 2009 at 8:04 pm
    Normally in Hive, a table or partition is loaded by a single job or process in one go. Once it is loaded, you can't append or insert any more data into that table (unless you do it manually by moving data into that directory). So it is most probably easier to enforce the constraints in that loading process. This solution is not as nifty as an RDBMS, but the probability of inserting duplicates is much higher in an RDBMS in the first place.
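
    One way to enforce uniqueness inside the loading step is sketched
    below. All names are hypothetical: a staging table `events_staging`, a
    target table `events` partitioned by `ds`, and a key column `event_id`.
    Grouping by the key keeps exactly one row per key; MAX(payload) is just
    a simple rule for choosing which duplicate survives.

    ```sql
    -- Hypothetical tables: events_staging(event_id, payload, ds) and
    -- events(event_id, payload) partitioned by ds.
    -- Rewrites the whole partition in a single job, one row per event_id.
    INSERT OVERWRITE TABLE events PARTITION (ds = '2009-01-29')
    SELECT event_id, MAX(payload) AS payload
    FROM events_staging
    WHERE ds = '2009-01-29'
    GROUP BY event_id;
    ```

    Because INSERT OVERWRITE replaces the partition contents, rerunning the
    same load does not add a second copy of the data.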



Discussion Overview
group: user @
categories: hive, hadoop
posted: Jan 29, '09 at 5:35p
active: Jan 29, '09 at 8:04p
posts: 4
users: 4
website: hive.apache.org
