Grokbase Groups Hive user March 2011
FAQ
Hi Experts
I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with
SQL NOT IN functionality. My SQL query would like this

INSERT INTO Table1
(T1_field1,T1_field2_ID,T1_field3,T1_field4)
SELECT field1,field2,field3,field4
FROM Table2 JOIN Table3 T3
WHERE (field1,field2,field3,field4 ) NOT IN
(SELECT field1,field2,field3,field4 FROM Table4)
AND field4 BETWEEN T3.START_DATE AND T3.END_DATE
AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;


I'm comfortable with the rest of the query apart from the NOT IN part. I'm using
Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in my
hive and it worked like a charm but NOT IN doesn't work out here. Has it been
already included as a new patch in latest trunk?
Can some one help me out how i can re frame the query avoiding the NOT IN part
to get it working on Hive?(I'm not really from a db background)
Also is there any JIRA tickets open to have this functionality supported in
upcoming versions of hive?

Regards
Bejoy.K.S

Search Discussions

  • Rekha Joshi at Mar 8, 2011 at 9:40 am
    @Bejoy - AFAIK, NOT IN is not directly supported in current hive. Workaround, you can write an outer join instead of antijoin. HIVE-1740, mentions another workaround - NOT(x LIKE p).
    Thanks,
    Rekha.

    On 3/8/11 2:12 PM, "Bejoy Ks" wrote:

    Hi Experts
    I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with SQL NOT IN functionality. My SQL query would like this

    INSERT INTO Table1
    (T1_field1,T1_field2_ID,T1_field3,T1_field4)
    SELECT field1,field2,field3,field4
    FROM Table2 JOIN Table3 T3
    WHERE (field1,field2,field3,field4 ) NOT IN
    (SELECT field1,field2,field3,field4 FROM Table4)
    AND field4 BETWEEN T3.START_DATE AND T3.END_DATE
    AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;


    I'm comfortable with the rest of the query apart from the NOT IN part. I'm using Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in my hive and it worked like a charm but NOT IN doesn't work out here. Has it been already included as a new patch in latest trunk?
    Can some one help me out how i can re frame the query avoiding the NOT IN part to get it working on Hive?(I'm not really from a db background)
    Also is there any JIRA tickets open to have this functionality supported in upcoming versions of hive?

    Regards
    Bejoy.K.S
  • Bejoy Ks at Mar 8, 2011 at 10:58 am
    Thanks Rekha. I went with your first option 'LEFT OUTER JOIN' and it worked like
    a charm.
    The second one was not fitting for my case as it was popping out parse errors
    due to multiple columns separated by comma coming under the same NOT IN clause
    {(field1,field2,field3,field4 )NOT IN (SELECT field1,field2,field3,field4 ...}
    Guess hive doesn't support queries that way
    Now there is minor hurdle still, The SQL BETWEEN. How can we get this BETWEEN
    AND supported in Hive QL?
    Any thoughts?

    My new transformed query would look like this
    INSERT OVERWRITE TABLE Table1
    SELECT T2.field1,T2.field2,T2.field3,T2.field4
    FROM Table2 T2 JOIN Table3 RJC ON (RJC.field3 = 'xyz') LEFT OUTER JOIN Table4 T4
    ON

    (T2.field1 = T4.field1 AND T2.field2 = T4.field2 AND T2.field3 = T4.field3 AND
    T2.field4 = T4.field4)
    WHERE field4 BETWEEN RJC.START_DATE AND RJC.END_DATE
    GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;



    Regards
    Bejoy.K.S






    ________________________________
    From: Rekha Joshi <rekhajos@yahoo-inc.com>
    To: "user@hive.apache.org" <user@hive.apache.org>
    Sent: Tue, March 8, 2011 3:08:08 PM
    Subject: Re: How to support SQL NOT IN function in Hive QL

    Re: How to support SQL NOT IN function in Hive QL @Bejoy – AFAIK, NOT IN is not
    directly supported in current hive. Workaround, you can write an outer join
    instead of antijoin. HIVE-1740, mentions another workaround - NOT(x LIKE p).
    Thanks,
    Rekha.

    On 3/8/11 2:12 PM, "Bejoy Ks" wrote:


    Hi Experts
    I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with
    SQL NOT IN functionality. My SQL query would like this

    INSERT INTO Table1
    (T1_field1,T1_field2_ID,T1_field3,T1_field4)
    SELECT field1,field2,field3,field4
    FROM Table2 JOIN Table3 T3
    WHERE (field1,field2,field3,field4 ) NOT IN
    (SELECT field1,field2,field3,field4 FROM Table4)
    AND field4 BETWEEN T3.START_DATE AND T3.END_DATE
    AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;


    I'm comfortable with the rest of the query apart from the NOT IN part. I'm using
    Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in my
    hive and it worked like a charm but NOT IN doesn't work out here. Has it been
    already included as a new patch in latest trunk?
    Can some one help me out how i can re frame the query avoiding the NOT IN part
    to get it working on Hive?(I'm not really from a db background)
    Also is there any JIRA tickets open to have this functionality supported in
    upcoming versions of hive?

    Regards
    Bejoy.K.S


  • Rekha Joshi at Mar 8, 2011 at 11:06 am
    Bejoy - If timestamp of the dates is not of much importance to you, you can alternatively use >= 'start_date' and <= 'end_date'.
    Thanks,
    Rekha.


    On 3/8/11 4:27 PM, "Bejoy Ks" wrote:

    Thanks Rekha. I went with your first option 'LEFT OUTER JOIN' and it worked like a charm.
    The second one was not fitting for my case as it was popping out parse errors due to multiple columns separated by comma coming under the same NOT IN clause {(field1,field2,field3,field4 )NOT IN (SELECT field1,field2,field3,field4 ...}
    Guess hive doesn't support queries that way
    Now there is minor hurdle still, The SQL BETWEEN. How can we get this BETWEEN AND supported in Hive QL?
    Any thoughts?

    My new transformed query would look like this
    INSERT OVERWRITE TABLE Table1
    SELECT T2.field1,T2.field2,T2.field3,T2.field4
    FROM Table2 T2 JOIN Table3 RJC ON (RJC.field3 = 'xyz') LEFT OUTER JOIN Table4 T4 ON
    (T2.field1 = T4.field1 AND T2.field2 = T4.field2 AND T2.field3 = T4.field3 AND T2.field4 = T4.field4)
    WHERE field4 BETWEEN RJC.START_DATE AND RJC.END_DATE
    GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;



    Regards
    Bejoy.K.S



    ________________________________
    From: Rekha Joshi <rekhajos@yahoo-inc.com>
    To: "user@hive.apache.org" <user@hive.apache.org>
    Sent: Tue, March 8, 2011 3:08:08 PM
    Subject: Re: How to support SQL NOT IN function in Hive QL

    Re: How to support SQL NOT IN function in Hive QL @Bejoy - AFAIK, NOT IN is not directly supported in current hive. Workaround, you can write an outer join instead of antijoin. HIVE-1740, mentions another workaround - NOT(x LIKE p).
    Thanks,
    Rekha.

    On 3/8/11 2:12 PM, "Bejoy Ks" wrote:

    Hi Experts
    I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with SQL NOT IN functionality. My SQL query would like this

    INSERT INTO Table1
    (T1_field1,T1_field2_ID,T1_field3,T1_field4)
    SELECT field1,field2,field3,field4
    FROM Table2 JOIN Table3 T3
    WHERE (field1,field2,field3,field4 ) NOT IN
    (SELECT field1,field2,field3,field4 FROM Table4)
    AND field4 BETWEEN T3.START_DATE AND T3.END_DATE
    AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;


    I'm comfortable with the rest of the query apart from the NOT IN part. I'm using Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in my hive and it worked like a charm but NOT IN doesn't work out here. Has it been already included as a new patch in latest trunk?
    Can some one help me out how i can re frame the query avoiding the NOT IN part to get it working on Hive?(I'm not really from a db background)
    Also is there any JIRA tickets open to have this functionality supported in upcoming versions of hive?

    Regards
    Bejoy.K.S
  • Bejoy Ks at Mar 8, 2011 at 11:53 am
    Thanks Rekha for such a quick response. A few more doubts out here
    If I use the comparison operators on the dates directly would they give a
    desired result ?,as the dates are stored in Hive tables as String
    Also in the comparison of dates if we use the unix_timestamp() it would consider
    the time stamp as well along with date for comparison right?

    unix_timestamp(field4) >= unix_timestamp(RJC.START_DATE) AND
    unix_timestamp(field4) <= unix_timestamp(RJC.END_DATE)


    Regards
    Bejoy.K.S





    ________________________________
    From: Rekha Joshi <rekhajos@yahoo-inc.com>
    To: "user@hive.apache.org" <user@hive.apache.org>
    Sent: Tue, March 8, 2011 4:36:08 PM
    Subject: Re: How to support SQL NOT IN function in Hive QL

    Re: How to support SQL NOT IN function in Hive QL Bejoy – If timestamp of the
    dates is not of much importance to you, you can alternatively use >=
    ‘start_date’ and <= ‘end_date’.
    Thanks,
    Rekha.


    On 3/8/11 4:27 PM, "Bejoy Ks" wrote:


    Thanks Rekha. I went with your first option 'LEFT OUTER JOIN' and it worked like
    a charm.
    The second one was not fitting for my case as it was popping out parse errors
    due to multiple columns separated by comma coming under the same NOT IN clause
    {(field1,field2,field3,field4 )NOT IN (SELECT field1,field2,field3,field4 ...}
    Guess hive doesn't support queries that way
    Now there is minor hurdle still, The SQL BETWEEN. How can we get this BETWEEN
    AND supported in Hive QL?
    Any thoughts?

    My new transformed query would look like this
    INSERT OVERWRITE TABLE Table1
    SELECT T2.field1,T2.field2,T2.field3,T2.field4
    FROM Table2 T2 JOIN Table3 RJC ON (RJC.field3 = 'xyz') LEFT OUTER JOIN Table4 T4
    ON

    (T2.field1 = T4.field1 AND T2.field2 = T4.field2 AND T2.field3 = T4.field3 AND
    T2.field4 = T4.field4)
    WHERE field4 BETWEEN RJC.START_DATE AND RJC.END_DATE
    GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;



    Regards
    Bejoy.K.S


    ________________________________
    From:Rekha Joshi <rekhajos@yahoo-inc.com>
    To: "user@hive.apache.org" <user@hive.apache.org>
    Sent: Tue, March 8, 2011 3:08:08 PM
    Subject: Re: How to support SQL NOT IN function in Hive QL

    Re: How to support SQL NOT IN function in Hive QL @Bejoy – AFAIK, NOT IN is not
    directly supported in current hive. Workaround, you can write an outer join
    instead of antijoin. HIVE-1740, mentions another workaround - NOT(x LIKE p).
    Thanks,
    Rekha.

    On 3/8/11 2:12 PM, "Bejoy Ks" wrote:


    Hi Experts
    I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with
    SQL NOT IN functionality. My SQL query would like this

    INSERT INTO Table1
    (T1_field1,T1_field2_ID,T1_field3,T1_field4)
    SELECT field1,field2,field3,field4
    FROM Table2 JOIN Table3 T3
    WHERE (field1,field2,field3,field4 ) NOT IN
    (SELECT field1,field2,field3,field4 FROM Table4)
    AND field4 BETWEEN T3.START_DATE AND T3.END_DATE
    AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;


    I'm comfortable with the rest of the query apart from the NOT IN part. I'm using
    Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in my
    hive and it worked like a charm but NOT IN doesn't work out here. Has it been
    already included as a new patch in latest trunk?
    Can some one help me out how i can re frame the query avoiding the NOT IN part
    to get it working on Hive?(I'm not really from a db background)
    Also is there any JIRA tickets open to have this functionality supported in
    upcoming versions of hive?

    Regards
    Bejoy.K.S


Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedMar 8, '11 at 8:43a
activeMar 8, '11 at 11:53a
posts5
users2
websitehive.apache.org

2 users in discussion

Bejoy Ks: 3 posts Rekha Joshi: 2 posts

People

Translate

site design / logo © 2022 Grokbase