Grokbase Groups Hive user August 2010
FAQ
Hi - I've a following SQL query. What is the way to convert it into HIVE
runnable format -

Select a.id, count(b.url)
from a, b
where a.id = b.id
and (b.url like '%google.com%' or b.url like '%h.google.com%' or b.url like
'%bing%')
and a.exec_date = date(now()) ; ----- get today's data
group by a.id


*thanks, ronak*
*
*
*
*

Search Discussions

  • Jacob R Rideout at Aug 20, 2010 at 3:31 pm

    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt wrote:
    Hi - I've a following SQL query. What is the way to convert it into HIVE
    runnable format -
    Select a.id, count(b.url)
    from a, b
    where a.id = b.id
    and (b.url  like '%google.com%' or b.url like '%h.google.com%' or b.url like
    '%bing%')
    and a.exec_date = date(now()) ;  ----- get today's data
    group by a.id

    thanks, ronak
    I think something like:

    SELECT a.id, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id = b.id
    WHERE
    (b.url LIKE '%google.com%' OR b.url LIKE '%h.google.com%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout
  • Ronak Bhatt at Aug 20, 2010 at 3:38 pm
    *I read somewhere that hive does NOT support OR condition....that's what I
    was wondering as to how others handle the situation.*
    *
    *


    On Fri, Aug 20, 2010 at 8:30 AM, Jacob R Rideout wrote:
    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt wrote:
    Hi - I've a following SQL query. What is the way to convert it into HIVE
    runnable format -
    Select a.id, count(b.url)
    from a, b
    where a.id = b.id
    and (b.url like '%google.com%' or b.url like '%h.google.com%' or b.url like
    '%bing%')
    and a.exec_date = date(now()) ; ----- get today's data
    group by a.id

    thanks, ronak
    I think something like:

    SELECT a.id, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id = b.id
    WHERE
    (b.url LIKE '%google.com%' OR b.url LIKE '%h.google.com%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout
  • Alexey Kovyrin at Aug 20, 2010 at 3:51 pm
    It definitely does support it.
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Logical_Operators
    On Fri, Aug 20, 2010 at 11:37 AM, Ronak Bhatt wrote:
    I read somewhere that hive does NOT support OR condition....that's what I
    was wondering as to how others handle the situation.


    On Fri, Aug 20, 2010 at 8:30 AM, Jacob R Rideout wrote:
    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt wrote:
    Hi - I've a following SQL query. What is the way to convert it into HIVE
    runnable format -
    Select a.id, count(b.url)
    from a, b
    where a.id = b.id
    and (b.url  like '%google.com%' or b.url like '%h.google.com%' or b.url
    like
    '%bing%')
    and a.exec_date = date(now()) ;  ----- get today's data
    group by a.id

    thanks, ronak
    I think something like:

    SELECT a.id, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id = b.id
    WHERE
    (b.url  LIKE '%google.com%' OR b.url LIKE '%h.google.com%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout


    --
    Alexey Kovyrin
    http://kovyrin.net/
  • Ronak Bhatt at Aug 20, 2010 at 3:58 pm
    This OR and AND are for use in select or condition..for example,

    select (a OR B) as end_result_trur_or_false
    from table
    where id > 20;

    What I'm looking for is,

    select sum(amount)
    from table
    where id > 20 and
    (a > 5 OR b < 10)

    as you can see, I want to use OR in where condition...I could possibly use
    UNION ALL to simulate the OR condition, but it is lot of writing
    (duplicating the code) and also not efficient from performance perspective
    as the table will be scanned multiple times for each select of the union...

    hope this helps to clarify what kind of OR I'm looking for....



    *thanks, ronak*
    *
    *
    *408 504 4847*
    *My Blog : http://ronakbaps.posterous.com*
    *
    *
    *
    *


    On Fri, Aug 20, 2010 at 8:50 AM, Alexey Kovyrin wrote:

    It definitely does support it.
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Logical_Operators
    On Fri, Aug 20, 2010 at 11:37 AM, Ronak Bhatt wrote:
    I read somewhere that hive does NOT support OR condition....that's what I
    was wondering as to how others handle the situation.



    On Fri, Aug 20, 2010 at 8:30 AM, Jacob R Rideout <
    apache@jacobrideout.net>
    wrote:
    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt wrote:
    Hi - I've a following SQL query. What is the way to convert it into
    HIVE
    runnable format -
    Select a.id, count(b.url)
    from a, b
    where a.id = b.id
    and (b.url like '%google.com%' or b.url like '%h.google.com%' or
    b.url
    like
    '%bing%')
    and a.exec_date = date(now()) ; ----- get today's data
    group by a.id

    thanks, ronak
    I think something like:

    SELECT a.id, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id = b.id
    WHERE
    (b.url LIKE '%google.com%' OR b.url LIKE '%h.google.com%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout


    --
    Alexey Kovyrin
    http://kovyrin.net/
  • Alexey Kovyrin at Aug 20, 2010 at 4:47 pm
    Who told you those can't be used in where clauses? Here is a sample
    query I did on one of our servers right now:

    ------------------------------------------------------------------------------------
    # hive
    Hive history file=/tmp/root/hive_job_log_root_201008201144_357004789.txt
    hive> select * from doc_this_week_views where object_id = 1 OR
    object_id = 33054283;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201007222311_3438, Tracking URL =
    http://dfs01.local:50030/jobdetails.jsp?jobid=job_201007222311_3438
    Kill Command = /usr/lib/hadoop/bin/hadoop job
    -Dmapred.job.tracker=dfs01.local:9001 -kill job_201007222311_3438
    2010-08-20 11:44:51,874 Stage-1 map = 0%, reduce = 0%
    2010-08-20 11:45:03,962 Stage-1 map = 100%, reduce = 0%
    2010-08-20 11:45:06,990 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201007222311_3438
    OK

    33054283 10
    1 87609

    Time taken: 18.468 seconds
    ------------------------------------------------------------------------------------

    Apparently it works as expected.
    On Fri, Aug 20, 2010 at 11:58 AM, Ronak Bhatt wrote:
    This OR and AND are for use in select or condition..for example,
    select (a OR B) as end_result_trur_or_false
    from table
    where id > 20;
    What I'm looking for is,
    select sum(amount)
    from table
    where id > 20 and
    (a > 5  OR b < 10)
    as you can see, I want to use OR in where condition...I could possibly use
    UNION ALL to simulate the OR condition, but it is lot of writing
    (duplicating the code) and also not efficient from performance perspective
    as the table will be scanned multiple times for each select of the union...
    hope this helps to clarify what kind of OR I'm looking for....


    thanks, ronak
    408 504 4847
    My Blog : http://ronakbaps.posterous.com



    On Fri, Aug 20, 2010 at 8:50 AM, Alexey Kovyrin wrote:

    It definitely does support it.
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Logical_Operators
    On Fri, Aug 20, 2010 at 11:37 AM, Ronak Bhatt wrote:
    I read somewhere that hive does NOT support OR condition....that's what
    I
    was wondering as to how others handle the situation.



    On Fri, Aug 20, 2010 at 8:30 AM, Jacob R Rideout
    <apache@jacobrideout.net>
    wrote:
    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt <ronakbaps@gmail.com>
    wrote:
    Hi - I've a following SQL query. What is the way to convert it into
    HIVE
    runnable format -
    Select a.id, count(b.url)
    from a, b
    where a.id = b.id
    and (b.url  like '%google.com%' or b.url like '%h.google.com%' or
    b.url
    like
    '%bing%')
    and a.exec_date = date(now()) ;  ----- get today's data
    group by a.id

    thanks, ronak
    I think something like:

    SELECT a.id, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id = b.id
    WHERE
    (b.url  LIKE '%google.com%' OR b.url LIKE '%h.google.com%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout


    --
    Alexey Kovyrin
    http://kovyrin.net/


    --
    Alexey Kovyrin
    http://kovyrin.net/
  • Ronak Bhatt at Aug 20, 2010 at 5:03 pm
    Here is an example of what error I'm referring to.... I'm running HIVE in
    local mode by saying "SET mapred.job.tracker=local;"

    any pointers to address the problem would be greatly appreciated....


    hive> select substr(CB.EXEC_DATE,1,10), count(CB.ID)
    from callbacks CB JOIN pages p ON
    (
    CB.page_id = p.id
    and (p.page like '%google.com/search%'
    or p.page like '%google.com/custom%'
    or p.page like '%google.com/#hl%'
    or p.page like '%google.com/cse%'
    or p.page like '%search.yahoo.com/search%'
    or p.page like '%bing.com/search%'
    or p.page like '%google.com/product%' )
    )
    group by substr(CB.EXEC_DATE,1,10);
    *FAILED: Error in semantic analysis: line 5:5 OR not supported in Join
    currently '%google.com/product%'*



    *thanks, ronak*
    *
    *
    *408 504 4847*
    *My Blog : http://ronakbaps.posterous.com*
    *
    *
    *
    *


    On Fri, Aug 20, 2010 at 9:46 AM, Alexey Kovyrin wrote:

    Who told you those can't be used in where clauses? Here is a sample
    query I did on one of our servers right now:


    ------------------------------------------------------------------------------------
    # hive
    Hive history file=/tmp/root/hive_job_log_root_201008201144_357004789.txt
    hive> select * from doc_this_week_views where object_id = 1 OR
    object_id = 33054283;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201007222311_3438, Tracking URL =
    http://dfs01.local:50030/jobdetails.jsp?jobid=job_201007222311_3438
    Kill Command = /usr/lib/hadoop/bin/hadoop job
    -Dmapred.job.tracker=dfs01.local:9001 -kill job_201007222311_3438
    2010-08-20 11:44:51,874 Stage-1 map = 0%, reduce = 0%
    2010-08-20 11:45:03,962 Stage-1 map = 100%, reduce = 0%
    2010-08-20 11:45:06,990 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201007222311_3438
    OK

    33054283 10
    1 87609

    Time taken: 18.468 seconds

    ------------------------------------------------------------------------------------

    Apparently it works as expected.
    On Fri, Aug 20, 2010 at 11:58 AM, Ronak Bhatt wrote:
    This OR and AND are for use in select or condition..for example,
    select (a OR B) as end_result_trur_or_false
    from table
    where id > 20;
    What I'm looking for is,
    select sum(amount)
    from table
    where id > 20 and
    (a > 5 OR b < 10)
    as you can see, I want to use OR in where condition...I could possibly use
    UNION ALL to simulate the OR condition, but it is lot of writing
    (duplicating the code) and also not efficient from performance
    perspective
    as the table will be scanned multiple times for each select of the union...
    hope this helps to clarify what kind of OR I'm looking for....


    thanks, ronak
    408 504 4847
    My Blog : http://ronakbaps.posterous.com



    On Fri, Aug 20, 2010 at 8:50 AM, Alexey Kovyrin wrote:

    It definitely does support it.
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Logical_Operators
    On Fri, Aug 20, 2010 at 11:37 AM, Ronak Bhatt wrote:
    I read somewhere that hive does NOT support OR condition....that's
    what
    I
    was wondering as to how others handle the situation.



    On Fri, Aug 20, 2010 at 8:30 AM, Jacob R Rideout
    <apache@jacobrideout.net>
    wrote:
    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt <ronakbaps@gmail.com>
    wrote:
    Hi - I've a following SQL query. What is the way to convert it into
    HIVE
    runnable format -
    Select a.id, count(b.url)
    from a, b
    where a.id = b.id
    and (b.url like '%google.com%' or b.url like '%h.google.com%' or
    b.url
    like
    '%bing%')
    and a.exec_date = date(now()) ; ----- get today's data
    group by a.id

    thanks, ronak
    I think something like:

    SELECT a.id, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id = b.id
    WHERE
    (b.url LIKE '%google.com%' OR b.url LIKE '%h.google.com%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout


    --
    Alexey Kovyrin
    http://kovyrin.net/


    --
    Alexey Kovyrin
    http://kovyrin.net/
  • Namit Jain at Aug 20, 2010 at 5:29 pm
    Currently, only equality joins are supported.
    But, you can rewrite your query as:


    select substr(CB.EXEC_DATE,1,10), count(CB.ID<http://CB.ID>)
    from callbacks CB JOIN
    (select * from pages p where
    p.page like '%google.com/search%<http://google.com/search%>'
    or p.page like '%google.com/custom%<http://google.com/custom%>'
    or p.page like '%google.com/#hl%<http://google.com/#hl%>'
    or p.page like '%google.com/cse%<http://google.com/cse%>'
    or p.page like '%search.yahoo.com/search%<http://search.yahoo.com/search%>'
    or p.page like '%bing.com/search%<http://bing.com/search%>'
    or p.page like '%google.com/product%<http://google.com/product%>' ) s
    ON s.id
    group by substr(CB.EXEC_DATE,1,10);


    ________________________________________
    From: Ronak Bhatt [ronakbaps@gmail.com]
    Sent: Friday, August 20, 2010 10:02 AM
    To: hive-user@hadoop.apache.org
    Subject: Re: question - how to handle OR with HIVe

    Here is an example of what error I'm referring to.... I'm running HIVE in local mode by saying "SET mapred.job.tracker=local;"

    any pointers to address the problem would be greatly appreciated....


    hive> select substr(CB.EXEC_DATE,1,10), count(CB.ID<http://CB.ID>)
    from callbacks CB JOIN pages p ON
    (
    CB.page_id = p.id<http://p.id>
    and (p.page like '%google.com/search%<http://google.com/search%>'
    or p.page like '%google.com/custom%<http://google.com/custom%>'
    or p.page like '%google.com/#hl%<http://google.com/#hl%>'
    or p.page like '%google.com/cse%<http://google.com/cse%>'
    or p.page like '%search.yahoo.com/search%<http://search.yahoo.com/search%>'
    or p.page like '%bing.com/search%<http://bing.com/search%>'
    or p.page like '%google.com/product%<http://google.com/product%>' )
    )
    group by substr(CB.EXEC_DATE,1,10);
    FAILED: Error in semantic analysis: line 5:5 OR not supported in Join currently '%google.com/product%<http://google.com/product%>'



    thanks, ronak

    408 504 4847
    My Blog : http://ronakbaps.posterous.com





    On Fri, Aug 20, 2010 at 9:46 AM, Alexey Kovyrin wrote:
    Who told you those can't be used in where clauses? Here is a sample
    query I did on one of our servers right now:

    ------------------------------------------------------------------------------------
    # hive
    Hive history file=/tmp/root/hive_job_log_root_201008201144_357004789.txt
    hive> select * from doc_this_week_views where object_id = 1 OR
    object_id = 33054283;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201007222311_3438, Tracking URL =
    http://dfs01.local:50030/jobdetails.jsp?jobid=job_201007222311_3438
    Kill Command = /usr/lib/hadoop/bin/hadoop job
    -Dmapred.job.tracker=dfs01.local:9001 -kill job_201007222311_3438
    2010-08-20 11:44:51,874 Stage-1 map = 0%, reduce = 0%
    2010-08-20 11:45:03,962 Stage-1 map = 100%, reduce = 0%
    2010-08-20 11:45:06,990 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201007222311_3438
    OK

    33054283 10
    1 87609

    Time taken: 18.468 seconds
    ------------------------------------------------------------------------------------

    Apparently it works as expected.
    On Fri, Aug 20, 2010 at 11:58 AM, Ronak Bhatt wrote:
    This OR and AND are for use in select or condition..for example,
    select (a OR B) as end_result_trur_or_false
    from table
    where id > 20;
    What I'm looking for is,
    select sum(amount)
    from table
    where id > 20 and
    (a > 5 OR b < 10)
    as you can see, I want to use OR in where condition...I could possibly use
    UNION ALL to simulate the OR condition, but it is lot of writing
    (duplicating the code) and also not efficient from performance perspective
    as the table will be scanned multiple times for each select of the union...
    hope this helps to clarify what kind of OR I'm looking for....


    thanks, ronak
    408 504 4847
    My Blog : http://ronakbaps.posterous.com



    On Fri, Aug 20, 2010 at 8:50 AM, Alexey Kovyrin wrote:

    It definitely does support it.
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Logical_Operators
    On Fri, Aug 20, 2010 at 11:37 AM, Ronak Bhatt wrote:
    I read somewhere that hive does NOT support OR condition....that's what
    I
    was wondering as to how others handle the situation.



    On Fri, Aug 20, 2010 at 8:30 AM, Jacob R Rideout
    <apache@jacobrideout.net > > wrote:
    On Fri, Aug 20, 2010 at 8:37 AM, Ronak Bhatt <ronakbaps@gmail.com > >> wrote:
    Hi - I've a following SQL query. What is the way to convert it into
    HIVE
    runnable format -
    Select a.id<http://a.id>, count(b.url)
    from a, b
    where a.id<http://a.id> = b.id<http://b.id>
    and (b.url like '%google.com<http://google.com>%' or b.url like '%h.google.com<http://h.google.com>%' or
    b.url
    like
    '%bing%')
    and a.exec_date = date(now()) ; ----- get today's data
    group by a.id<http://a.id>

    thanks, ronak
    I think something like:

    SELECT a.id<http://a.id>, COUNT(distinct b.url)
    FROM a
    JOIN b on a.id<http://a.id> = b.id<http://b.id>
    WHERE
    (b.url LIKE '%google.com<http://google.com>%' OR b.url LIKE '%h.google.com<http://h.google.com>%' OR
    b.url LIKE '%bing%')
    AND a.exec_date = from_unixtime(unix_timestamp())
    GROUP BY a.id<http://a.id>;

    should work, but I didn't test it. You will probably need to format
    the data string to match your data - hive doesn't really have a date
    type as such - but instead has date manipulation functions for
    strings.

    Take a look at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions


    Jacob Rideout


    --
    Alexey Kovyrin
    http://kovyrin.net/


    --
    Alexey Kovyrin
    http://kovyrin.net/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedAug 20, '10 at 2:38p
activeAug 20, '10 at 5:29p
posts8
users4
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase