Grokbase Groups Hive user April 2010
I have a search solution that is downstream of some Netezza data marts that
I'm replacing with a Hive solution. We already partition the data for the
search solution 32 ways, and I would like to take advantage of the data
clustering in Hive (buckets) so that I don't have to do any
post-processing. Is there documentation that describes how the data is hashed or
how it's organized across the buckets? Or could someone point me to a class
that implements it? Thanks!

Aaron


  • Zheng Shao at Apr 11, 2010 at 9:17 pm
    It's as simple as taking the hash code of the key, modulo the number of
    reducers. To get started, try any of the .q files in the clientpositive
    directory.

    On the code side, HiveKey.java has the implementation.



  • Aaron McCurry at Apr 11, 2010 at 9:21 pm
    Thanks a lot! I figured it was that simple.

    Aaron
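
The rule described in the reply above (hash code of the key, modulo the number of reducers) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BucketSketch` and `bucketFor` are hypothetical names, not Hive classes or APIs, and Java's `String.hashCode()` stands in for whatever hash Hive actually computes for a given column type. The sign-bit mask is also an assumption of this sketch, added so that negative hash codes still yield a valid index. To reproduce Hive's bucket assignment exactly, check HiveKey.java and the code around it for the Hive version in use.

```java
public class BucketSketch {

    // Hypothetical helper (not a Hive API): map a key's hash code to a bucket index.
    static int bucketFor(int hashCode, int numBuckets) {
        // Clear the sign bit so a negative hash code still gives an index in [0, numBuckets).
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 32; // matches the 32-way partitioning mentioned in the question
        String[] keys = {"alpha", "beta", "gamma"};
        for (String key : keys) {
            // String.hashCode() is a stand-in; Hive's own hash for a column may differ.
            System.out.println(key + " -> bucket " + bucketFor(key.hashCode(), numBuckets));
        }
    }
}
```

If the downstream search system needs to locate a key's bucket without asking Hive, it would have to apply the same hash function and the same bucket count that Hive used when writing the table.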

Discussion Overview
group: user @
categories: hive, hadoop
posted: Apr 11, '10 at 7:49p
active: Apr 11, '10 at 9:21p
posts: 3
users: 2
website: hive.apache.org
