I have two tables:
pages( title, domain, url )
top_domains(domain)

top_domains was created from a group by domain operation on the pages table.

Because the pages table is very large, I only want to sample 5 rows
for each domain in top_domains.

In a traditional programming language, I could just use a for loop to
iterate over the domain field and run a SELECT with a LIMIT 5
clause for each value.

Is there a way to express this query in Hive?


-
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com

  • Guru Prasad at Sep 28, 2010 at 6:14 am
    Hi,
    Please see the attachment; it might help you.
    It helped me solve a similar kind of problem.


    Thanks & Regards
    ~guru prasad
  • Tommy Chheng at Oct 1, 2010 at 6:21 pm
    Thanks. I ended up writing a Scala program that uses the Hive JDBC
    connector. Performance was still reasonable.
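
The program itself was not posted, so the following is only a rough sketch of the loop described in the question: read the domains from top_domains, then run one LIMIT query per domain against pages. It is written in Python against the standard DB-API, with an in-memory SQLite database standing in for Hive so that it runs on its own. The names sample_per_domain and build_demo_db and the demo data are hypothetical; a real Hive client may use a different placeholder style than `?`.

```python
import sqlite3


def sample_per_domain(conn, limit=5):
    """Return up to `limit` rows of pages for each domain in top_domains.

    `conn` is any DB-API connection; the `?` placeholders assume the
    sqlite3 driver and may need changing for another driver.
    """
    cur = conn.cursor()
    cur.execute("SELECT domain FROM top_domains")
    domains = [row[0] for row in cur.fetchall()]

    samples = {}
    for domain in domains:
        # One small query per domain, as in the for-loop idea above.
        cur.execute(
            "SELECT title, domain, url FROM pages WHERE domain = ? LIMIT ?",
            (domain, limit),
        )
        samples[domain] = cur.fetchall()
    return samples


def build_demo_db():
    """Create a tiny stand-in database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, domain TEXT, url TEXT)")
    conn.execute("CREATE TABLE top_domains (domain TEXT)")
    rows = [
        (f"page {i}", domain, f"http://{domain}/{i}")
        for domain, count in (("a.example", 8), ("b.example", 3))
        for i in range(count)
    ]
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", rows)
    # Mirrors how top_domains was derived: a GROUP BY on pages.domain.
    conn.execute(
        "INSERT INTO top_domains SELECT domain FROM pages GROUP BY domain"
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for domain, rows in sample_per_domain(demo).items():
        print(domain, len(rows))
```

LIMIT without an ORDER BY returns whichever rows the engine reads first, so this gives an arbitrary sample rather than a random one. Hive releases with windowing support (0.11 and later) can also express the same thing in a single query by filtering on ROW_NUMBER() OVER (PARTITION BY domain ...), which avoids issuing one query per domain.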



Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 28, '10 at 12:51a
active: Oct 1, '10 at 6:21p
posts: 3
users: 2
website: hive.apache.org
